Compare commits

...

621 Commits

Author SHA1 Message Date
abb04342fc Linux 4.11.6 2017-06-17 06:47:27 +02:00
6a9a78ba17 drm/i915: Disable decoupled MMIO
commit 4c4c565513 upstream.

The decoupled MMIO feature doesn't work as intended by HW team. Enabling
it with forcewake will only make debugging efforts more difficult, so
let's disable it.

Fixes: 85ee17ebee ("drm/i915/bxt: Broxton decoupled MMIO")
Cc: Zhe Wang <zhe1.wang@intel.com>
Cc: Praveen Paneri <praveen.paneri@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Signed-off-by: Kai Chen <kai.chen@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170523215812.18328-2-kai.chen@intel.com
(cherry picked from commit 0051c10aca)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
b094fd12b9 drm/i915: Always recompute watermarks when distrust_bios_wm is set, v2.
commit 4e3aed8445 upstream.

On some systems there can be a race condition in which no crtc state is
added to the first atomic commit. This results in all crtc's having a
null DDB allocation, causing a FIFO underrun on any update until the
first modeset.

Changes since v1:
- Do not take the connection_mutex, this is already done below.

Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Inspired-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 98d39494d3 ("drm/i915/gen9: Compute DDB allocation at atomic
check time (v4)")
Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170531154236.27180-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

(cherry picked from commit 367d73d280)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-06-17 06:44:18 +02:00
dcdac1c29f drm/i915: Guard against i915_ggtt_disable_guc() being invoked unconditionally
commit d90c98905a upstream.

Commit 7c3f86b6dc ("drm/i915: Invalidate the guc ggtt TLB upon
insertion") added the restoration of the invalidation routine after the
GuC was disabled, but missed that the GuC was unconditionally disabled
when not used. This then overwrites the invalidate routine for the older
chipsets, causing havoc and breaking resume as the most obvious victim.

We place the guard inside i915_ggtt_disable_guc() to be backport
friendly (the bug was introduced into v4.11) but it would be preferred
to be in more control over when this was guard (i.e. do not try and
teardown the data structures before we have enabled them). That should
be true with the reorganisation of the guc loaders.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 7c3f86b6dc ("drm/i915: Invalidate the guc ggtt TLB upon insertion")
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170531190514.3691-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
(cherry picked from commit cb60606d83)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
4d2c473f9f drm/i915: Workaround VLV/CHV DSI scanline counter hardware fail
commit 8f4d38099b upstream.

The scanline counter is bonkers on VLV/CHV DSI. The scanline counter
increment is not lined up with the start of vblank like it is on
every other platform and output type. This causes problems for
both the vblank timestamping and atomic update vblank evasion.

On my FFRD8 machine at least, the scanline counter increment
happens about 1/3 of a scanline ahead of the start of vblank (which
is where all register latching happens still). That means we can't
trust the scanline counter to tell us whether we're in vblank or not
while we're on that particular line. In order to keep vblank
timestamping in working condition when called from the vblank irq,
we'll leave scanline_offset at one, which means that the entire
line containing the start of vblank is considered to be inside
the vblank.

For the vblank evasion we'll need to consider that entire line
to be bad, since we can't tell whether the registers already
got latched or not. And we can't actually use the start of vblank
interrupt to get us past that line as the interrupt would fire
too soon, and then we'd up waiting for the next start of vblank
instead. One way around that would using the frame start
interrupt instead since that wouldn't fire until the next
scanline, but that would require some bigger changes in the
interrupt code. So for simplicity we'll just poll until we get
past the bad line.

v2: Adjust the comments a bit

Cc: Jonas Aaberg <cja@gmx.net>
Tested-by: Jonas Aaberg <cja@gmx.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99086
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161215174734.28779-1-ville.syrjala@linux.intel.com
Tested-by: Mika Kahola <mika.kahola@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
(cherry picked from commit ec1b4ee283)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
3b981b2388 drm/i915: Fix 90/270 rotated coordinates for FBC
commit 1065467ed8 upstream.

The clipped src coordinates have already been rotated by 270 degrees for
when the plane rotation is 90/270 degrees, hence the FBC code should no
longer swap the width and height.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Fixes: b63a16f6cd ("drm/i915: Compute display surface offset in the plane check hook for SKL+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170331180056.14086-4-ville.syrjala@linux.intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
(cherry picked from commit 73714c05df)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
5b31ae00ac Revert "drm/i915: Restore lost "Initialized i915" welcome message"
commit d38162e4b5 upstream.

This reverts commit bc5ca47c0a.

Gabriel put this back into generic code with

commit 75f6dfe3e6
Author: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Date:   Wed Dec 28 12:32:11 2016 -0200

    drm: Deduplicate driver initialization message

but somehow he missed Chris' patch to add the message meanwhile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101025
Fixes: 75f6dfe3e6 ("drm: Deduplicate driver initialization message")
Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517131557.7836-1-daniel.vetter@ffwll.ch
(cherry picked from commit 6bdba81979)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
c094b10c3c s390/kvm: do not rely on the ILC on kvm host protection fauls
commit c0e7bb38c0 upstream.

For most cases a protection exception in the host (e.g. copy
on write or dirty tracking) on the sie instruction will indicate
an instruction length of 4. Turns out that there are some corner
cases (e.g. runtime instrumentation) where this is not necessarily
true and the ILC is unpredictable.

Let's replace our 4 byte rewind_pad with 3 byte nops to prepare for
all possible ILCs.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
2273302258 xtensa: don't use linux IRQ #0
commit e5c86679d5 upstream.

Linux IRQ #0 is reserved for error reporting and may not be used.
Increase NR_IRQS for one additional slot and increase
irq_domain_add_legacy parameter first_irq value to 1, so that linux
IRQ #0 is not associated with hardware IRQ #0 in legacy IRQ domains.
Introduce macro XTENSA_PIC_LINUX_IRQ for static translation of xtensa
PIC hardware IRQ # to linux IRQ #. Use this macro in XTFPGA platform
data definitions.

This fixes inability to use hardware IRQ #0 in configurations that don't
use device tree and allows for non-identity mapping between linux IRQ #
and hardware IRQ #.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
2f3271eb10 efi: Fix boot panic because of invalid BGRT image address
commit 792ef14df5 upstream.

Maniaxx reported a kernel boot crash in the EFI code, which I emulated
by using same invalid phys addr in code:

  BUG: unable to handle kernel paging request at ffffffffff280001
  IP: efi_bgrt_init+0xfb/0x153
  ...
  Call Trace:
   ? bgrt_init+0xbc/0xbc
   acpi_parse_bgrt+0xe/0x12
   acpi_table_parse+0x89/0xb8
   acpi_boot_init+0x445/0x4e2
   ? acpi_parse_x2apic+0x79/0x79
   ? dmi_ignore_irq0_timer_override+0x33/0x33
   setup_arch+0xb63/0xc82
   ? early_idt_handler_array+0x120/0x120
   start_kernel+0xb7/0x443
   ? early_idt_handler_array+0x120/0x120
   x86_64_start_reservations+0x29/0x2b
   x86_64_start_kernel+0x154/0x177
   secondary_startup_64+0x9f/0x9f

There is also a similar bug filed in bugzilla.kernel.org:

  https://bugzilla.kernel.org/show_bug.cgi?id=195633

The crash is caused by this commit:

  7b0a911478 efi/x86: Move the EFI BGRT init code to early init code

The root cause is the firmware on those machines provides invalid BGRT
image addresses.

In a kernel before above commit BGRT initializes late and uses ioremap()
to map the image address. Ioremap validates the address, if it is not a
valid physical address ioremap() just fails and returns. However in current
kernel EFI BGRT initializes early and uses early_memremap() which does not
validate the image address, and kernel panic happens.

According to ACPI spec the BGRT image address should fall into
EFI_BOOT_SERVICES_DATA, see the section 5.2.22.4 of below document:

  http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf

Fix this issue by validating the image address in efi_bgrt_init(). If the
image address does not fall into any EFI_BOOT_SERVICES_DATA areas we just
bail out with a warning message.

Reported-by: Maniaxx <tripleshiftone@gmail.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 7b0a911478 ("efi/x86: Move the EFI BGRT init code to early init code")
Link: http://lkml.kernel.org/r/20170609084558.26766-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:18 +02:00
041d665eb1 partitions/msdos: FreeBSD UFS2 file systems are not recognized
commit 223220356d upstream.

The code in block/partitions/msdos.c recognizes FreeBSD, OpenBSD
and NetBSD partitions and does a reasonable job picking out OpenBSD
and NetBSD UFS subpartitions.

But for FreeBSD the subpartitions are always "bad".

    Kernel: <bsd:bad subpartition - ignored

Though all 3 of these BSD systems use UFS as a file system, only
FreeBSD uses relative start addresses in the subpartition
declarations.

The following patch fixes this for FreeBSD partitions and leaves
the code for OpenBSD and NetBSD intact:

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:17 +02:00
99f5ba009e drm/i915: Prevent the system suspend complete optimization
commit 6ab92afc95 upstream.

Since

commit bac2a909a0
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Wed Jan 21 02:17:42 2015 +0100

    PCI / PM: Avoid resuming PCI devices during system suspend

PCI devices will default to allowing the system suspend complete
optimization where devices are not woken up during system suspend if
they were already runtime suspended. This however breaks the i915/HDA
drivers for two reasons:

- The i915 driver has system suspend specific steps that it needs to
  run, that bring the device to a different state than its runtime
  suspended state.

- The HDA driver's suspend handler requires power that it will request
  from the i915 driver's power domain handler. This in turn requires the
  i915 driver to runtime resume itself, but this won't be possible if the
  suspend complete optimization is in effect: in this case the i915
  runtime PM is disabled and trying to get an RPM reference returns
  -EACCESS.

Solve this by requiring the PCI/PM core to resume the device during
system suspend which in effect disables the suspend complete optimization.

Regardless of the above commit the optimization stayed disabled for DRM
devices until

commit d14d2a8453
Author: Lukas Wunner <lukas@wunner.de>
Date:   Wed Jun 8 12:49:29 2016 +0200

    drm: Remove dev_pm_ops from drm_class

so this patch is in practice a fix for this commit. Another reason for
the bug staying hidden for so long is that the optimization for a device
is disabled if it's disabled for any of its children devices. i915 may
have a backlight device as its child which doesn't support runtime PM
and so doesn't allow the optimization either.  So if this backlight
device got registered the bug stayed hidden.

Credits to Marta, Tomi and David who enabled pstore logging,
that caught one instance of this issue across a suspend/
resume-to-ram and Ville who rememberd that the optimization was enabled
for some devices at one point.

The first WARN triggered by the problem:

[ 6250.746445] WARNING: CPU: 2 PID: 17384 at drivers/gpu/drm/i915/intel_runtime_pm.c:2846 intel_runtime_pm_get+0x6b/0xd0 [i915]
[ 6250.746448] pm_runtime_get_sync() failed: -13
[ 6250.746451] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul
snd_hda_codec_realtek snd_hda_codec_generic ghash_clmulni_intel e1000e snd_hda_codec snd_hwdep snd_hda_core ptp mei_me pps_core snd_pcm lpc_ich mei prime_
numbers i2c_hid i2c_designware_platform i2c_designware_core [last unloaded: i915]
[ 6250.746512] CPU: 2 PID: 17384 Comm: kworker/u8:0 Tainted: G     U  W       4.11.0-rc5-CI-CI_DRM_334+ #1
[ 6250.746515] Hardware name:                  /NUC5i5RYB, BIOS RYBDWi35.86A.0362.2017.0118.0940 01/18/2017
[ 6250.746521] Workqueue: events_unbound async_run_entry_fn
[ 6250.746525] Call Trace:
[ 6250.746530]  dump_stack+0x67/0x92
[ 6250.746536]  __warn+0xc6/0xe0
[ 6250.746542]  ? pci_restore_standard_config+0x40/0x40
[ 6250.746546]  warn_slowpath_fmt+0x46/0x50
[ 6250.746553]  ? __pm_runtime_resume+0x56/0x80
[ 6250.746584]  intel_runtime_pm_get+0x6b/0xd0 [i915]
[ 6250.746610]  intel_display_power_get+0x1b/0x40 [i915]
[ 6250.746646]  i915_audio_component_get_power+0x15/0x20 [i915]
[ 6250.746654]  snd_hdac_display_power+0xc8/0x110 [snd_hda_core]
[ 6250.746661]  azx_runtime_resume+0x218/0x280 [snd_hda_intel]
[ 6250.746667]  pci_pm_runtime_resume+0x76/0xa0
[ 6250.746672]  __rpm_callback+0xb4/0x1f0
[ 6250.746677]  ? pci_restore_standard_config+0x40/0x40
[ 6250.746682]  rpm_callback+0x1f/0x80
[ 6250.746686]  ? pci_restore_standard_config+0x40/0x40
[ 6250.746690]  rpm_resume+0x4ba/0x740
[ 6250.746698]  __pm_runtime_resume+0x49/0x80
[ 6250.746703]  pci_pm_suspend+0x57/0x140
[ 6250.746709]  dpm_run_callback+0x6f/0x330
[ 6250.746713]  ? pci_pm_freeze+0xe0/0xe0
[ 6250.746718]  __device_suspend+0xf9/0x370
[ 6250.746724]  ? dpm_watchdog_set+0x60/0x60
[ 6250.746730]  async_suspend+0x1a/0x90
[ 6250.746735]  async_run_entry_fn+0x34/0x160
[ 6250.746741]  process_one_work+0x1f2/0x6d0
[ 6250.746749]  worker_thread+0x49/0x4a0
[ 6250.746755]  kthread+0x107/0x140
[ 6250.746759]  ? process_one_work+0x6d0/0x6d0
[ 6250.746763]  ? kthread_create_on_node+0x40/0x40
[ 6250.746768]  ret_from_fork+0x2e/0x40
[ 6250.746778] ---[ end trace 102a62fd2160f5e6 ]---

v2:
- Use the new pci_dev->needs_resume flag, to avoid any overhead during
  the ->pm_prepare hook. (Rafael)

v3:
- Update commit message to reference the actual regressing commit.
  (Lukas)

v4:
- Rebase on v4 of patch 1/2.

Fixes: d14d2a8453 ("drm: Remove dev_pm_ops from drm_class")
References: https://bugs.freedesktop.org/show_bug.cgi?id=100378
References: https://bugs.freedesktop.org/show_bug.cgi?id=100770
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Tomi Sarvela <tomi.p.sarvela@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1493726649-32094-2-git-send-email-imre.deak@intel.com
(cherry picked from commit adfdf85d79)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:17 +02:00
d7c0a50b21 PCI/PM: Add needs_resume flag to avoid suspend complete optimization
commit 4d071c3238 upstream.

Some drivers - like i915 - may not support the system suspend direct
complete optimization due to differences in their runtime and system
suspend sequence.  Add a flag that when set resumes the device before
calling the driver's system suspend handlers which effectively disables
the optimization.

Needed by a future patch fixing suspend/resume on i915.

Suggested by Rafael.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: stable@vger.kernel.org
(rebased on v4.8, added kernel version to commit message stable tag)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:17 +02:00
92220696d5 drm/i915: Do not drop pagetables when empty
This is the minimal backport for stable of the upstream commit:

commit dd19674bac
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Feb 15 08:43:46 2017 +0000

    drm/i915: Remove bitmap tracking for used-ptes

Due to a race with the shrinker, when we try to allocate a pagetable, we
may end up shrinking it instead. This comes as a nasty surprise as we
try to dereference it to fill in the pagetable entries for the object.

In linus/master this is fixed by pinning the pagetables prior to
allocation, but that backport is roughly
 drivers/gpu/drm/i915/i915_gem_gtt.c |   10 ----------
 1 file changed, 10 deletions(-)
i.e. unsuitable for stable. Instead we neuter the code that tried to
free the pagetables.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99295
Fixes: 2ce5179fe8 ("drm/i915/gtt: Free unused lower-level page tables")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.10+
Tested-by: Maël Lavault <mael.lavault@protonmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:44:17 +02:00
be8335f6b5 Linux 4.11.5 2017-06-14 15:08:04 +02:00
2c2c503792 kthread: fix boot hang (regression) on MIPS/OpenRISC
commit b0f5a8f32e upstream.

This fixes a regression in commit 4d6501dce0 where I didn't notice
that MIPS and OpenRISC were reinitialising p->{set,clear}_child_tid to
NULL after our initialisation in copy_process().

We can simply get rid of the arch-specific initialisation here since it
is now always done in copy_process() before hitting copy_thread{,_tls}().

Review notes:

 - As far as I can tell, copy_process() is the only user of
   copy_thread_tls(), which is the only caller of copy_thread() for
   architectures that don't implement copy_thread_tls().

 - After this patch, there is no arch-specific code touching
   p->set_child_tid or p->clear_child_tid whatsoever.

 - It may look like MIPS/OpenRISC wanted to always have these fields be
   NULL, but that's not true, as copy_process() would unconditionally
   set them again _after_ calling copy_thread_tls() before commit
   4d6501dce0.

Fixes: 4d6501dce0 ("kthread: Fix use-after-free if kthread fork fails")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net> # MIPS only
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Jamie Iles <jamie.iles@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
1a422e547a netfilter: nft_set_rbtree: handle element re-addition after deletion
commit d2df92e98a upstream.

The existing code selects no next branch to be inspected when
re-inserting an inactive element into the rb-tree, looping endlessly.
This patch restricts the check for active elements to the EEXIST case
only.

Fixes: e701001e7c ("netfilter: nft_rbtree: allow adjacent intervals with dynamic updates")
Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
6b183d1b84 drm/i915/vbt: split out defaults that are set when there is no VBT
commit bb1d132935 upstream.

The main thing are the DDI ports. If there's a VBT that says there are
no outputs, we should trust that, and not have semi-random
defaults. Unfortunately, the defaults have resulted in some Chromebooks
without VBT to rely on this behaviour, so we split out the defaults for
the missing VBT case.

Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/95c26079ff640d43f53b944f17e9fc356b36daec.1489152288.git.jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
07f0aa40fc drm/i915/vbt: don't propagate errors from intel_bios_init()
commit 665788572c upstream.

We don't use the error return for anything other than reporting and
logging that there is no VBT. We can pull the logging in the function,
and remove the error status return. Moreover, if we needed the
information for something later on, we'd probably be better off storing
the bit in dev_priv, and using it where it's needed, instead of using
the error return.

While at it, improve the comments.

Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/438ebbb0d5f0d321c625065b9cc78532a1dab24f.1489152288.git.jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
1df30f8264 audit: fix the RCU locking for the auditd_connection structure
commit 48d0e023af upstream.

Cong Wang correctly pointed out that the RCU read locking of the
auditd_connection struct was wrong, this patch correct this by
adopting a more traditional, and correct RCU locking model.

This patch is heavily based on an earlier prototype by Cong Wang.

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
f17b15c896 hwmon: (coretemp) Handle frozen hotplug state correctly
commit 90b4f30b6d upstream.

The recent conversion to the hotplug state machine missed that the original
hotplug notifiers did not execute in the frozen state, which is used on
suspend on resume.

This does not matter on single socket machines, but on multi socket systems
this breaks when the device for a non-boot socket is removed when the last
CPU of that socket is brought offline. The device removal locks up the
machine hard w/o any debug output.

Prevent executing the hotplug callbacks when cpuhp_tasks_frozen is true.

Thanks to Tommi for providing debug information patiently while I failed to
spot the obvious.

Fixes: e00ca5df37 ("hwmon: (coretemp) Convert to hotplug state machine")
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: "Chen, Yu C" <yu.c.chen@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:51 +02:00
dcaa3c1ec9 iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size
commit a008c31c7e upstream.

On a ppc64 machine executing overlayfs/019 with xfs as the lower and
upper filesystem causes the following call trace,

WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
Modules linked in:
CPU: 2 PID: 8034 Comm: fsstress Tainted: G             L  4.11.0-rc5-next-20170405 #100
task: c000000631314880 task.stack: c0000003915d4000
NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
REGS: c0000003915d7570 TRAP: 0700   Tainted: G             L   (4.11.0-rc5-next-20170405)
MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
  CR: 24004284  XER: 00000000
CFAR: c0000000006f7190 SOFTE: 1
GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
Call Trace:
[c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
[c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
[c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
[c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
[c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
[c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
[c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
[c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
[c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
Instruction dump:
78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378

The above problem can also be recreated on a regular xfs filesystem
using the command,

$ fsstress -d /mnt -l 1000 -n 1000 -p 1000

The reason for the call trace is,
1. When 'reserving' blocks for delayed allocation , XFS reserves more
   blocks (i.e. past file's current EOF) than required. This is done
   because XFS assumes that userspace might write more data and hence
   'reserving' more blocks might lead to the file's new data being
   stored contiguously on disk.
2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
   then cover the prealloc-ed EOF blocks in addition to the regular blocks.
3. When flushing the dirty blocks to disk, we only flush data till the
   file's EOF. But before writing out the dirty data, we allocate blocks
   on the disk for holding the file's new data. This allocation includes
   the blocks that are part of the 'prealloc EOF blocks'.
4. Later, when the last reference to the inode is being closed, XFS frees the
   unused 'prealloc EOF blocks' in xfs_inactive().

In step 3 above, When allocating space on disk for the delayed allocation
range, the space allocator might sometimes allocate less blocks than
required. If such an allocation ends right at the current EOF of the
file, We will not be able to clear the "delayed allocation" flag for the
'prealloc EOF blocks', since we won't have dirty buffer heads associated
with that range of the file.

In such a situation if a Direct I/O read operation is performed on file
range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
range [X, Y] and invalidate page cache for that range (Refer to
iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
the extent items (which are still cached in memory) for the file
range. When doing so we are not supposed to get an extent item with
IOMAP_DELALLOC flag set, since the previous "flush" operation should
have converted any delayed allocation data in the range [X, Y]. Hence we
end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().

This commit fixes the bug by preventing the read operation from going
beyond iomap_dio->i_size.

Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
695b2bd64b cgroup: mark cgroup_get() with __maybe_unused
commit 310b4816a5 upstream.

a590b90d47 ("cgroup: fix spurious warnings on cgroup_is_dead() from
cgroup_sk_alloc()") converted most cgroup_get() usages to
cgroup_get_live() leaving cgroup_sk_alloc() the sole user of
cgroup_get().  When !CONFIG_SOCK_CGROUP_DATA, this ends up triggering
unused warning for cgroup_get().

Silence the warning by adding __maybe_unused to cgroup_get().

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/20170501145340.17e8ef86@canb.auug.org.au
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
ba301861f6 pinctrl: cherryview: Add terminate entry for dmi_system_id tables
commit a9de080bbc upstream.

Make sure dmi_system_id tables are NULL terminated.

Fixes: 7036502783 ("pinctrl: cherryview: Add a quirk to make Acer
Chromebook keyboard work again")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
944601cf16 serial: sh-sci: Fix panic when serial console and DMA are enabled
commit 3c9101766b upstream.

This patch fixes an issue that kernel panic happens when DMA is enabled
and we press enter key while the kernel booting on the serial console.

* An interrupt may occur after sci_request_irq().
* DMA transfer area is initialized by setup_timer() in sci_request_dma()
  and used in interrupt.

If an interrupt occurred between sci_request_irq() and setup_timer() in
sci_request_dma(), DMA transfer area has not been initialized yet.
So, this patch changes the order of sci_request_irq() and
sci_request_dma().

Fixes: 73a19e4c03 ("serial: sh-sci: Add DMA support.")
Signed-off-by: Takatoshi Akiyama <takatoshi.akiyama.kj@ps.hitachi-solutions.com>
[Shimoda changes the commit log]
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
4e979ac972 drm/i915/skl: Add missing SKL ID
commit ca7a45ba6f upstream.

Used by production device:
    Intel(R) Iris(TM) Graphics P555

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227112256.20060-1-michal.winiarski@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
fa8a1ce792 drm/i915: Fix runtime PM for LPE audio
commit 668e3b014a upstream.

Not calling pm_runtime_enable() means that runtime PM can't be
enabled at all via sysfs. So we definitely need to call it
from somewhere.

Calling it from the driver seems like a bad idea because it
would have to be paired with a pm_runtime_disable() at driver
unload time, otherwise the core gets upset. Also if there's
no LPE audio driver loaded then we couldn't runtime suspend
i915 either.

So it looks like a better plan is to call it from i915 when
we register the platform device. That seems to match how
pci generally does things. I cargo culted the
pm_runtime_forbid() and pm_runtime_set_active() calls from
pci as well.

The exposed runtime PM API is massive an thorougly misleading, so
I don't actually know if this is how you're supposed to use the API
or not. But it seems to work. I can now runtime suspend i915 again
with or without the LPE audio driver loaded, and reloading the
LPE audio driver also seems to work.

Note that powertop won't auto-tune runtime PM for platform devices,
which is a little annoying. So I'm not sure that leaving runtime
PM in "on" mode by default is the best choice here. But I've left
it like that for now at least.

Also remove the comment about there not being much benefit from
LPE audio runtime PM. Not allowing runtime PM blocks i915 runtime
PM, which will also block s0ix, and that could have a measurable
impact on power consumption.

Cc: Takashi Iwai <tiwai@suse.de>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Fixes: 0b6b524f39 ("ALSA: x86: Don't enable runtime PM as default")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170427160231.13337-2-ville.syrjala@linux.intel.com
Reviewed-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 183c00350c)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
fd6c3d0a3b drivers: char: mem: Fix wraparound check to allow mappings up to the end
commit 32829da54d upstream.

A recent fix to /dev/mem prevents mappings from wrapping around the end
of physical address space. However, the check was written in a way that
also prevents a mapping reaching just up to the end of physical address
space, which may be a valid use case (especially on 32-bit systems).
This patch fixes it by checking the last mapped address (instead of the
first address behind that) for overflow.

Fixes: b299cde245 ("drivers: char: mem: Check for address space wraparound with mmap()")
Reported-by: Nico Huber <nico.h@gmx.de>
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
fc337d0e00 cpu/hotplug: Drop the device lock on error
commit 40da1b11f0 upstream.

If a custom CPU target is specified and that one is not available _or_
can't be interrupted then the code returns to userland without dropping a
lock as notices by lockdep:

|echo 133 > /sys/devices/system/cpu/cpu7/hotplug/target
| ================================================
| [ BUG: lock held when returning to user space! ]
| ------------------------------------------------
| bash/503 is leaving the kernel with locks still held!
| 1 lock held by bash/503:
|  #0:  (device_hotplug_lock){+.+...}, at: [<ffffffff815b5650>] lock_device_hotplug_sysfs+0x10/0x40

So release the lock then.

Fixes: 757c989b99 ("cpu/hotplug: Make target state writeable")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170602142714.3ogo25f2wbq6fjpj@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:50 +02:00
4fa867a37a ASoC: Fix use-after-free at card unregistration
commit 4efda5f213 upstream.

soc_cleanup_card_resources() call snd_card_free() at the last of its
procedure.  This turned out to lead to a use-after-free.
PCM runtimes have been already removed via soc_remove_pcm_runtimes(),
while it's dereferenced later in soc_pcm_free() called via
snd_card_free().

The fix is simple: just move the snd_card_free() call to the beginning
of the whole procedure.  This also gives another benefit: it
guarantees that all operations have been shut down before actually
releasing the resources, which was racy until now.

Reported-and-tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
6d4bee6600 ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
commit ba3021b2c7 upstream.

snd_timer_user_tselect() reallocates the queue buffer dynamically, but
it forgot to reset its indices.  Since the read may happen
concurrently with ioctl and snd_timer_user_tselect() allocates the
buffer via kmalloc(), this may lead to the leak of uninitialized
kernel-space data, as spotted via KMSAN:

  BUG: KMSAN: use of unitialized memory in snd_timer_user_read+0x6c4/0xa10
  CPU: 0 PID: 1037 Comm: probe Not tainted 4.11.0-rc5+ #2739
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x143/0x1b0 lib/dump_stack.c:52
   kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
   kmsan_check_memory+0xc2/0x140 mm/kmsan/kmsan.c:1086
   copy_to_user ./arch/x86/include/asm/uaccess.h:725
   snd_timer_user_read+0x6c4/0xa10 sound/core/timer.c:2004
   do_loop_readv_writev fs/read_write.c:716
   __do_readv_writev+0x94c/0x1380 fs/read_write.c:864
   do_readv_writev fs/read_write.c:894
   vfs_readv fs/read_write.c:908
   do_readv+0x52a/0x5d0 fs/read_write.c:934
   SYSC_readv+0xb6/0xd0 fs/read_write.c:1021
   SyS_readv+0x87/0xb0 fs/read_write.c:1018

This patch adds the missing reset of queue indices.  Together with the
previous fix for the ioctl/read race, we cover the whole problem.

Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
9018818b24 ALSA: timer: Fix race between read and ioctl
commit d11662f4f7 upstream.

The read from ALSA timer device, the function snd_timer_user_tread(),
may access to an uninitialized struct snd_timer_user fields when the
read is concurrently performed while the ioctl like
snd_timer_user_tselect() is invoked.  We have already fixed the races
among ioctls via a mutex, but we seem to have forgotten the race
between read vs ioctl.

This patch simply applies (more exactly extends the already applied
range of) tu->ioctl_lock in snd_timer_user_tread() for closing the
race window.

Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
a8282f018b drm/nouveau/tmr: fully separate alarm execution/pending lists
commit b4e382ca75 upstream.

Reusing the list_head for both is a bad idea.  Callback execution is done
with the lock dropped so that alarms can be rescheduled from the callback,
which means that with some unfortunate timing, lists can get corrupted.

The execution list should not require its own locking, the single function
that uses it can only be called from a single context.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
a7e7ade1b4 x86/microcode/intel: Clear patch pointer before jettisoning the initrd
commit 5b0bc9ac2c upstream.

During early boot, load_ucode_intel_ap() uses __load_ucode_intel()
to obtain a pointer to the relevant microcode patch (embedded in the
initrd), and stores this value in 'intel_ucode_patch' to speed up the
microcode patch application for subsequent CPUs.

On resuming from suspend-to-RAM, however, load_ucode_ap() calls
load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is
long gone so the pointer stored in 'intel_ucode_patch' no longer points to
a valid microcode patch.

Clear that pointer so that we effectively fall back to the CPU hotplug
notifier callbacks to update the microcode.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
[ Edit and massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
3bc7a4a564 drm/vmwgfx: Make sure backup_handle is always valid
commit 07678eca2c upstream.

When vmw_gb_surface_define_ioctl() is called with an existing buffer,
we end up returning an uninitialized variable in the backup_handle.

The fix is to first initialize backup_handle to 0 just to be sure, and
second, when a user-provided buffer is found, we will use the
req->buffer_handle as the backup_handle.

Reported-by: Murray McAllister <murray.mcallister@insomniasec.com>
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
6a6a485719 drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
commit ee9c4e681e upstream.

The 'req->mip_levels' parameter in vmw_gb_surface_define_ioctl() is
a user-controlled 'uint32_t' value which is used as a loop count limit.
This can lead to a kernel lockup and DoS. Add check for 'req->mip_levels'.

References:
https://bugzilla.redhat.com/show_bug.cgi?id=1437431

Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
8c704276e2 drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
commit f0c62e9878 upstream.

If vmalloc() fails then we need to a bit of cleanup before returning.

Fixes: fb1d9738ca ("drm/vmwgfx: Add DRM driver for VMware Virtual GPU")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:49 +02:00
3e857aaa8f net: qcom/emac: do not use hardware mdio automatic polling
commit 246096690b upstream.

Use software polling (PHY_POLL) to check for link state changes instead
of relying on the EMAC's hardware polling feature.  Some PHY drivers
are unable to get a functioning link because the HW polling is not
robust enough.

The EMAC is able to poll the PHY on the MDIO bus looking for link state
changes (via the Link Status bit in the Status Register at address 0x1).
When the link state changes, the EMAC triggers an interrupt and tells the
driver what the new state is.  The feature eliminates the need for
software to poll the MDIO bus.

Unfortunately, this feature is incompatible with phylib, because it
ignores everything that the PHY core and PHY drivers are trying to do.
In particular:

1. It assumes a compatible register set, so PHYs with different registers
   may not work.

2. It doesn't allow for hardware errata that have work-arounds implemented
   in the PHY driver.

3. It doesn't support multiple register pages. If the PHY core switches
   the register set to another page, the EMAC won't know the page has
   changed and will still attempt to read the same PHY register.

4. It only checks the copper side of the link, not the SGMII side.  Some
   PHY drivers (e.g. at803x) may also check the SGMII side, and
   report the link as not ready during autonegotiation if the SGMII link
   is still down.  Phylib then waits for another interrupt to query
   the PHY again, but the EMAC won't send another interrupt because it
   thinks the link is up.

Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
cadb844501 srcu: Allow use of Classic SRCU from both process and interrupt context
commit 1123a60416 upstream.

Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a3d ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Fixes: 719d93cd5f ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: Linu Cherian <linuc.decode@gmail.com>
Suggested-by: Linu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
574f76fc58 perf/core: Drop kernel samples even though :u is specified
commit cc1582c231 upstream.

When doing sampling, for example:

  perf record -e cycles:u ...

On workloads that do a lot of kernel entry/exits we see kernel
samples, even though :u is specified. This is due to skid existing.

This might be a security issue because it can leak kernel addresses even
though kernel sampling support is disabled.

The patch drops the kernel samples if exclude_kernel is specified.

For example, test on Haswell desktop:

  perf record -e cycles:u <mgen>
  perf report --stdio

Before patch applied:

    99.77%  mgen     mgen              [.] buf_read
     0.20%  mgen     mgen              [.] rand_buf_init
     0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
     0.00%  mgen     mgen              [.] last_free_elem
     0.00%  mgen     libc-2.23.so      [.] __random_r
     0.00%  mgen     libc-2.23.so      [.] _int_malloc
     0.00%  mgen     mgen              [.] rand_array_init
     0.00%  mgen     [kernel.vmlinux]  [k] page_fault
     0.00%  mgen     libc-2.23.so      [.] __random
     0.00%  mgen     libc-2.23.so      [.] __strcasestr
     0.00%  mgen     ld-2.23.so        [.] strcmp
     0.00%  mgen     ld-2.23.so        [.] _dl_start
     0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so        [.] _start

We can see kernel symbols apic_timer_interrupt and page_fault.

After patch applied:

    99.79%  mgen     mgen           [.] buf_read
     0.19%  mgen     mgen           [.] rand_buf_init
     0.00%  mgen     libc-2.23.so   [.] __random_r
     0.00%  mgen     mgen           [.] rand_array_init
     0.00%  mgen     mgen           [.] last_free_elem
     0.00%  mgen     libc-2.23.so   [.] vfprintf
     0.00%  mgen     libc-2.23.so   [.] rand
     0.00%  mgen     libc-2.23.so   [.] __random
     0.00%  mgen     libc-2.23.so   [.] _int_malloc
     0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
     0.00%  mgen     ld-2.23.so     [.] do_lookup_x
     0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
     0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
     0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
     0.00%  mgen     ld-2.23.so     [.] _start

There are only userspace symbols.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Cc: kan.liang@intel.com
Cc: mark.rutland@arm.com
Cc: will.deacon@arm.com
Cc: yao.jin@intel.com
Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
52782a1e16 Revert "ata: sata_mv: Convert to devm_ioremap_resource()"
commit 3e4240da0e upstream.

This reverts commit 368e5fbdfc.

devm_ioremap_resource() enforces that there are no overlapping
resources, where as devm_ioremap() does not. The sata phy driver needs
a subset of the sata IO address space, so maps some of the sata
address space. As a result, sata_mv now fails to probe, reporting it
cannot get its resources, and so we don't have any SATA disks.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
93dfb4e6a7 powerpc/kernel: Initialize load_tm on task creation
commit 7f22ced437 upstream.

Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.

This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).

Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
154b657008 powerpc/kernel: Fix FP and vector register restoration
commit 1195892c09 upstream.

Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).

These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.

Fixes: 70fe3d980f ("powerpc: Restore FPU/VEC/VSX if previously used")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
c3e2874653 powerpc/hotplug-mem: Fix missing endian conversion of aa_index
commit dc421b200f upstream.

When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property.  Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.

Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
840407e3da powerpc/numa: Fix percpu allocations to be NUMA aware
commit ba4a648f12 upstream.

In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:48 +02:00
2bcd25cbc6 powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs function
commit 6f553912ee upstream.

of_mm_gpiochip_add_data() generates an oops for NULL pointer dereference.

of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data, therefore ->save_regs() cannot use gpiochip_get_data()

Fixes: 937daafca7 ("powerpc: simple-gpio: use gpiochip data pointer")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
0ce1692273 scsi: qla2xxx: Fix mailbox pointer error in fwdump capture
commit 74939a0bc7 upstream.

Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
223b1549f0 scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC
commit 1d63496516 upstream.

Set bit (BIT_15) to send right ECHO payload information for Diagnostic
Echo Test command.

Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
67034eaa0d scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues
commit ce6c668b14 upstream.

Firmware dump allows for debugging customer issues. This patch fixes
start/end pointer calculation to capture T262 template entry for dump
tool.

Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
21fffaa1b7 scsi: qla2xxx: Fix NULL pointer access due to redundant fc_host_port_name call
commit 0ea88662b5 upstream.

Remove redundant fc_host_port_name calls to prevent early access of
scsi_host->shost_data buffer. This prevent null pointer access.

Following stack trace is seen:

BUG: unable to handle kernel NULL pointer dereference at 00000000000008
IP: qla24xx_report_id_acquisition+0x22d/0x3a0 [qla2xxx]

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
96e0d45552 scsi: qla2xxx: Fix crash due to mismatch mumber of Q-pair creation for Multi queue
commit b95b9452aa upstream.

when driver is loaded with Multi Queue enabled, it was noticed that
there was one less queue pair created.

Following message would indicate this:

"No resources to create additional q pair."

The result of one less queue pair means that system can crash, if the
block mq layer thinks there is an extra hardware queue available, and
the driver will use a NULL ptr qpair in that instance.

Following stack trace is seen in one of the crash:

irq_create_affinity_masks+0x98/0x530
irq_create_affinity_masks+0x98/0x530
__pci_enable_msix+0x321/0x4e0
mutex_lock+0x12/0x40
pci_alloc_irq_vectors_affinity+0xb5/0x140
qla24xx_enable_msix+0x79/0x530 [qla2xxx]
qla2x00_request_irqs+0x61/0x2d0 [qla2xxx]
qla2x00_probe_one+0xc73/0x2390 [qla2xxx]
ida_simple_get+0x98/0x100
kernfs_next_descendant_post+0x40/0x50
local_pci_probe+0x45/0xa0
pci_device_probe+0xfc/0x140
driver_probe_device+0x2c5/0x470
__driver_attach+0xdd/0xe0
driver_probe_device+0x470/0x470
bus_for_each_dev+0x6c/0xc0
driver_attach+0x1e/0x20
bus_add_driver+0x45/0x270
driver_register+0x60/0xe0
__pci_register_driver+0x4c/0x50
qla2x00_module_init+0x1ce/0x21e [qla2xxx]

Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
083dca440b scsi: qla2xxx: Fix recursive loop during target mode configuration for ISP25XX leaving system unresponsive
commit cb590700e0 upstream.

Following messages are seen into system logs

qla2xxx [0000:09:00.0]-00af:9: Performing ISP error recovery - ha=ffff98315ee30000.
qla2xxx [0000:09:00.0]-504b:9: RISC paused -- HCCR=40, Dumping firmware.
qla2xxx [0000:09:00.0]-d009:9: Firmware has been previously dumped (ffffba488c001000) -- ignoring request.
qla2xxx [0000:09:00.0]-504b:9: RISC paused -- HCCR=40, Dumping firmware.

See Bugzilla for details
https://bugzilla.kernel.org/show_bug.cgi?id=195285

Fixes: d74595278f ("scsi: qla2xxx: Add multiple queue pair functionality.")
Reported-by: Laurence Oberman <loberman@redhat.com>
Reported-by: Anthony Bloodoff <anthony.bloodoff@gmail.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Anthony Bloodoff <anthony.bloodoff@gmail.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
daa15f6f84 scsi: qla2xxx: don't disable a not previously enabled PCI device
commit ddff7ed45e upstream.

When pci_enable_device() or pci_enable_device_mem() fail in
qla2x00_probe_one() we bail out but do a call to
pci_disable_device(). This causes the dev_WARN_ON() in
pci_disable_device() to trigger, as the device wasn't enabled
previously.

So instead of taking the 'probe_out' error path we can directly return
*iff* one of the pci_enable_device() calls fails.

Additionally rename the 'probe_out' goto label's name to the more
descriptive 'disable_device'.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: e315cd28b9 ("[SCSI] qla2xxx: Code changes for qla data structure refactoring")
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:47 +02:00
2bae71aa85 KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
commit d6dbdd3c85 upstream.

Under memory pressure, we start ageing pages, which amounts to parsing
the page tables. Since we don't want to allocate any extra level,
we pass NULL for our private allocation cache. Which means that
stage2_get_pud() is allowed to fail. This results in the following
splat:

[ 1520.409577] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 1520.417741] pgd = ffff810f52fef000
[ 1520.421201] [00000008] *pgd=0000010f636c5003, *pud=0000010f56f48003, *pmd=0000000000000000
[ 1520.429546] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 1520.435156] Modules linked in:
[ 1520.438246] CPU: 15 PID: 53550 Comm: qemu-system-aar Tainted: G        W       4.12.0-rc4-00027-g1885c397eaec #7205
[ 1520.448705] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB12A 10/26/2016
[ 1520.463726] task: ffff800ac5fb4e00 task.stack: ffff800ce04e0000
[ 1520.469666] PC is at stage2_get_pmd+0x34/0x110
[ 1520.474119] LR is at kvm_age_hva_handler+0x44/0xf0
[ 1520.478917] pc : [<ffff0000080b137c>] lr : [<ffff0000080b149c>] pstate: 40000145
[ 1520.486325] sp : ffff800ce04e33d0
[ 1520.489644] x29: ffff800ce04e33d0 x28: 0000000ffff40064
[ 1520.494967] x27: 0000ffff27e00000 x26: 0000000000000000
[ 1520.500289] x25: ffff81051ba65008 x24: 0000ffff40065000
[ 1520.505618] x23: 0000ffff40064000 x22: 0000000000000000
[ 1520.510947] x21: ffff810f52b20000 x20: 0000000000000000
[ 1520.516274] x19: 0000000058264000 x18: 0000000000000000
[ 1520.521603] x17: 0000ffffa6fe7438 x16: ffff000008278b70
[ 1520.526940] x15: 000028ccd8000000 x14: 0000000000000008
[ 1520.532264] x13: ffff7e0018298000 x12: 0000000000000002
[ 1520.537582] x11: ffff000009241b93 x10: 0000000000000940
[ 1520.542908] x9 : ffff0000092ef800 x8 : 0000000000000200
[ 1520.548229] x7 : ffff800ce04e36a8 x6 : 0000000000000000
[ 1520.553552] x5 : 0000000000000001 x4 : 0000000000000000
[ 1520.558873] x3 : 0000000000000000 x2 : 0000000000000008
[ 1520.571696] x1 : ffff000008fd5000 x0 : ffff0000080b149c
[ 1520.577039] Process qemu-system-aar (pid: 53550, stack limit = 0xffff800ce04e0000)
[...]
[ 1521.510735] [<ffff0000080b137c>] stage2_get_pmd+0x34/0x110
[ 1521.516221] [<ffff0000080b149c>] kvm_age_hva_handler+0x44/0xf0
[ 1521.522054] [<ffff0000080b0610>] handle_hva_to_gpa+0xb8/0xe8
[ 1521.527716] [<ffff0000080b3434>] kvm_age_hva+0x44/0xf0
[ 1521.532854] [<ffff0000080a58b0>] kvm_mmu_notifier_clear_flush_young+0x70/0xc0
[ 1521.539992] [<ffff000008238378>] __mmu_notifier_clear_flush_young+0x88/0xd0
[ 1521.546958] [<ffff00000821eca0>] page_referenced_one+0xf0/0x188
[ 1521.552881] [<ffff00000821f36c>] rmap_walk_anon+0xec/0x250
[ 1521.558370] [<ffff000008220f78>] rmap_walk+0x78/0xa0
[ 1521.563337] [<ffff000008221104>] page_referenced+0x164/0x180
[ 1521.569002] [<ffff0000081f1af0>] shrink_active_list+0x178/0x3b8
[ 1521.574922] [<ffff0000081f2058>] shrink_node_memcg+0x328/0x600
[ 1521.580758] [<ffff0000081f23f4>] shrink_node+0xc4/0x328
[ 1521.585986] [<ffff0000081f2718>] do_try_to_free_pages+0xc0/0x340
[ 1521.592000] [<ffff0000081f2a64>] try_to_free_pages+0xcc/0x240
[...]

The trivial fix is to handle this NULL pud value early, rather than
dereferencing it blindly.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
7751d94da9 Btrfs: fix delalloc accounting leak caused by u32 overflow
commit 70e7af244f upstream.

btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication,
which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2).
For a nodesize of 16kB, this overflow happens at 16k items. Usually,
num_items is a small constant passed to btrfs_start_transaction(), but
we also use btrfs_calc_trans_metadata_size() for metadata reservations
for extent items in btrfs_delalloc_{reserve,release}_metadata().

In drop_outstanding_extents(), num_items is calculated as
inode->reserved_extents - inode->outstanding_extents. The difference
between these two counters is usually small, but if many delalloc
extents are reserved and then the outstanding extents are merged in
btrfs_merge_extent_hook(), the difference can become large enough to
overflow in btrfs_calc_trans_metadata_size().

The overflow manifests itself as a leak of a multiple of 4 GB in
delalloc_block_rsv and the metadata bytes_may_use counter. This in turn
can cause early ENOSPC errors. Additionally, these WARN_ONs in
extent-tree.c will be hit when unmounting:

    WARN_ON(fs_info->delalloc_block_rsv.size > 0);
    WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
    WARN_ON(space_info->bytes_pinned > 0 ||
            space_info->bytes_reserved > 0 ||
            space_info->bytes_may_use > 0);

Fix it by casting nodesize to a u64 so that
btrfs_calc_trans_metadata_size() does a full 64-bit multiplication.
While we're here, do the same in btrfs_calc_trunc_metadata_size(); this
can't overflow with any existing uses, but it's better to be safe here
than have another hard-to-debug problem later on.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
acccdbef2c btrfs: fix race with relocation recovery and fs_root setup
commit a9b3311ef3 upstream.

If we have to recover relocation during mount, we'll ultimately have to
evict the orphan inode.  That goes through the reservation dance, where
priority_reclaim_metadata_space and flush_space expect fs_info->fs_root
to be valid.  That's the next thing to be set up during mount, so we
crash, almost always in flush_space trying to join the transaction
but priority_reclaim_metadata_space is possible as well.  This call
path has been problematic in the past WRT whether ->fs_root is valid
yet.  Commit 957780eb27 (Btrfs: introduce ticketed enospc
infrastructure) added new users that are called in the direct path
instead of the async path that had already been worked around.

The thing is that we don't actually need the fs_root, specifically, for
anything.  We either use it to determine whether the root is the
chunk_root for use in choosing an allocation profile or as a root to pass
btrfs_join_transaction before immediately committing it.  Anything that
isn't the chunk root works in the former case and any root works in
the latter.

A simple fix is to use a root we know will always be there: the
extent_root.

Fixes: 957780eb27 (Btrfs: introduce ticketed enospc infrastructure)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
5ca9daf722 btrfs: fix memory leak in update_space_info failure path
commit 896533a7da upstream.

If we fail to add the space_info kobject, we'll leak the memory
for the percpu counter.

Fixes: 6ab0a2029c (btrfs: publish allocation data in sysfs)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
6bc3d6a633 btrfs: use correct types for page indices in btrfs_page_exists_in_range
commit cc2b702c52 upstream.

Variables start_idx and end_idx are supposed to hold a page index
derived from the file offsets. The int type is not the right one though,
offsets larger than 1 << 44 will get silently trimmed off the high bits.
(1 << 44 is 16TiB)

What can go wrong, if start is below the boundary and end gets trimmed:
- if there's a page after start, we'll find it (radix_tree_gang_lookup_slot)
- the final check "if (page->index <= end_idx)" will unexpectedly fail

The function will return false, ie. "there's no page in the range",
although there is at least one.

btrfs_page_exists_in_range is used to prevent races in:

* in hole punching, where we make sure there are not pages in the
  truncated range, otherwise we'll wait for them to finish and redo
  truncation, but we're going to replace the pages with holes anyway so
  the only problem is the intermediate state

* lock_extent_direct: we want to make sure there are no pages before we
  lock and start DIO, to prevent stale data reads

For practical occurence of the bug, there are several constaints.  The
file must be quite large, the affected range must cross the 16TiB
boundary and the internal state of the file pages and pending operations
must match.  Also, we must not have started any ordered data in the
range, otherwise we don't even reach the buggy function check.

DIO locking tries hard in several places to avoid deadlocks with
buffered IO and avoids waiting for ranges. The worst consequence seems
to be stale data read.

CC: Liu Bo <bo.li.liu@oracle.com>
Fixes: fc4adbff82 ("btrfs: Drop EXTENT_UPTODATE check in hole punching and direct locking")
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
574ab1b8fb cxl: Avoid double free_irq() for psl,slice interrupts
commit b3aa20ba2b upstream.

During an eeh call to cxl_remove can result in double free_irq of
psl,slice interrupts. This can happen if perst_reloads_same_image == 1
and call to cxl_configure_adapter() fails during slot_reset
callback. In such a case we see a kernel oops with following back-trace:

Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
  free_irq+0x88/0xd0 (unreliable)
  cxl_unmap_irq+0x20/0x40 [cxl]
  cxl_native_release_psl_irq+0x78/0xd8 [cxl]
  pci_deconfigure_afu+0xac/0x110 [cxl]
  cxl_remove+0x104/0x210 [cxl]
  pci_device_remove+0x6c/0x110
  device_release_driver_internal+0x204/0x2e0
  pci_stop_bus_device+0xa0/0xd0
  pci_stop_and_remove_bus_device+0x28/0x40
  pci_hp_remove_devices+0xb0/0x150
  pci_hp_remove_devices+0x68/0x150
  eeh_handle_normal_event+0x140/0x580
  eeh_handle_event+0x174/0x360
  eeh_event_handler+0x1e8/0x1f0

This patch fixes the issue of double free_irq by checking that
variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
not '0' before un-mapping and resetting these variables to '0' when
they are un-mapped.

Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
0c94348b23 cxl: Fix error path on bad ioctl
commit cec422c11c upstream.

Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
ioctl. We shouldn't unlock the context status mutex as it was not
locked (yet).

Fixes: 0712dc7e73 ("cxl: Fix issues when unmapping contexts")
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
4907e3bb67 excessive checks in ufs_write_failed() and ufs_evict_inode()
commit babef37dcc upstream.

As it is, short copy in write() to append-only file will fail
to truncate the excessive allocated blocks.  As the matter of
fact, all checks in ufs_truncate_blocks() are either redundant
or wrong for that caller.  As for the only other caller
(ufs_evict_inode()), we only need the file type checks there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:46 +02:00
6af5db5d39 ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
commit 006351ac8e upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
c12c0c4ff5 ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
commit 940ef1a0ed upstream.

... and it really needs splitting into "new" and "extend" cases, but that's for
later

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
728154e963 ufs: set correct ->s_maxsize
commit 6b0d144fa7 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
d426b9575f ufs: restore maintaining ->i_blocks
commit eb315d2ae6 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
386e884c85 fix ufs_isblockset()
commit 414cf7186d upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
823c065a40 ufs: restore proper tail allocation
commit 8785d84d00 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
9be0c9d62d cpuset: consider dying css as offline
commit 41c25707d2 upstream.

In most cases, a cgroup controller don't care about the liftimes of
cgroups.  For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not.  Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period.  The effects of cgroup
removals are delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending.  This gets rid of the userland visible
delays.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
48b2c7c865 Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
commit 47eb0c8b4d upstream.

The Lifebook E546 and E557 touchpad were also not functioning and
worked after running:

        echo "1" > /sys/devices/platform/i8042/serio2/crc_enabled

Add them to the list of machines that need this workaround.

Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Reviewed-by: Arjan Opmeer <arjan@opmeer.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:45 +02:00
f3c1dfa84d cgroup: Prevent kill_css() from being called more than once
commit 33c35aa481 upstream.

The kill_css() function may be called more than once under the condition
that the css was killed but not physically removed yet followed by the
removal of the cgroup that is hosting the css. This patch prevents any
harmm from being done when that happens.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
d31fff8cbc rc-core: race condition during ir_raw_event_register()
commit 963761a0b2 upstream.

A rc device can call ir_raw_event_handle() after rc_allocate_device(),
but before rc_register_device() has completed. This is racey because
rcdev->raw is set before rcdev->raw->thread has a valid value.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
d9f48c4661 ahci: Acer SA5-271 SSD Not Detected Fix
commit 8bfd174312 upstream.

(Correction in this resend: fixed function name acer_sa5_271_workaround; fixed
 the always-true condition in the function; fixed description.)

On the Acer Switch Alpha 12 (model number: SA5-271), the internal SSD may not
get detected because the port_map and CAP.nr_ports combination causes the driver
to skip the port that is actually connected to the SSD. More specifically,
either all SATA ports are identified as DUMMY, or all ports get ``link down''
and never get up again.

This problem occurs occasionally. When this problem occurs, CAP may hold a
value of 0xC734FF00 or 0xC734FF01 and port_map may hold a value of 0x00 or 0x01.
When this problem does not occur, CAP holds a value of 0xC734FF02 and port_map
may hold a value of 0x07. Overriding the CAP value to 0xC734FF02 and port_map to
0x7 significantly reduces the occurrence of this problem.

Link: https://bugzilla.kernel.org/attachment.cgi?id=253091
Signed-off-by: Sui Chen <suichen6@gmail.com>
Tested-by: Damian Ivanov <damianatorrpm@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
39d584db44 drm/msm/mdp5: use __drm_atomic_helper_plane_duplicate_state()
commit 786813c343 upstream.

Somehow the helper was never retrofitted for mdp5.  Which meant when
plane_state->fence was added, it could get copied into new state in
mdp5_plane_duplicate_state().

If an update to disable the plane (for example on rmfb) managed to sneak
in after an nonblock update had swapped state, but before it was
committed, we'd get a splat:

    WARNING: CPU: 1 PID: 69 at ../drivers/gpu/drm/drm_atomic_helper.c:1061 drm_atomic_helper_wait_for_fences+0xe0/0xf8
   Modules linked in:

   CPU: 1 PID: 69 Comm: kworker/1:1 Tainted: G        W       4.11.0-rc8+ #1187
   Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
   Workqueue: events drm_mode_rmfb_work_fn
   task: ffffffc036560d00 task.stack: ffffffc036550000
   PC is at drm_atomic_helper_wait_for_fences+0xe0/0xf8
   LR is at complete_commit.isra.1+0x44/0x1c0
   pc : [<ffffff80084f6040>] lr : [<ffffff800854176c>] pstate: 20000145
   sp : ffffffc036553b60
   x29: ffffffc036553b60 x28: ffffffc0264e6a00
   x27: ffffffc035659000 x26: 0000000000000000
   x25: ffffffc0240e8000 x24: 0000000000000038
   x23: 0000000000000000 x22: ffffff800858f200
   x21: ffffffc0240e8000 x20: ffffffc02f56a800
   x19: 0000000000000000 x18: 0000000000000000
   x17: 0000000000000000 x16: 0000000000000000
   x15: 0000000000000000 x14: ffffffc00a192700
   x13: 0000000000000004 x12: 0000000000000000
   x11: ffffff80089a1690 x10: 00000000000008f0
   x9 : ffffffc036553b20 x8 : ffffffc036561650
   x7 : ffffffc03fe6cb40 x6 : 0000000000000000
   x5 : 0000000000000001 x4 : 0000000000000002
   x3 : ffffffc035659000 x2 : ffffffc0240e8c80
   x1 : 0000000000000000 x0 : ffffffc02adbe588

   ---[ end trace 13aeec77c3fb55e2 ]---
   Call trace:
   Exception stack(0xffffffc036553990 to 0xffffffc036553ac0)
   3980:                                   0000000000000000 0000008000000000
   39a0: ffffffc036553b60 ffffff80084f6040 0000000000004ff0 0000000000000038
   39c0: ffffffc0365539d0 ffffff800857e098 ffffffc036553a00 ffffff800857e1b0
   39e0: ffffffc036553a10 ffffff800857c554 ffffffc0365e8400 ffffffc0365e8400
   3a00: ffffffc036553a20 ffffff8008103358 000000000001aad7 ffffff800851b72c
   3a20: ffffffc036553a50 ffffff80080e9228 ffffffc02adbe588 0000000000000000
   3a40: ffffffc0240e8c80 ffffffc035659000 0000000000000002 0000000000000001
   3a60: 0000000000000000 ffffffc03fe6cb40 ffffffc036561650 ffffffc036553b20
   3a80: 00000000000008f0 ffffff80089a1690 0000000000000000 0000000000000004
   3aa0: ffffffc00a192700 0000000000000000 0000000000000000 0000000000000000
   [<ffffff80084f6040>] drm_atomic_helper_wait_for_fences+0xe0/0xf8
   [<ffffff800854176c>] complete_commit.isra.1+0x44/0x1c0
   [<ffffff8008541c64>] msm_atomic_commit+0x32c/0x350
   [<ffffff8008516230>] drm_atomic_commit+0x50/0x60
   [<ffffff8008517548>] drm_atomic_remove_fb+0x158/0x250
   [<ffffff80085186d0>] drm_framebuffer_remove+0x50/0x158
   [<ffffff8008518818>] drm_mode_rmfb_work_fn+0x40/0x58
   [<ffffff80080d5668>] process_one_work+0x1d0/0x378
   [<ffffff80080d5a54>] worker_thread+0x244/0x488
   [<ffffff80080db7fc>] kthread+0xfc/0x128
   [<ffffff8008082ec0>] ret_from_fork+0x10/0x50

Fixes: 9626014 ("drm/fence: add in-fences support")
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reported-by: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
a5ab52b38f drm/msm: Expose our reservation object when exporting a dmabuf.
commit 43523eba79 upstream.

Without this, polling on the dma-buf (and presumably other devices
synchronizing against our rendering) would return immediately, even
while the BO was busy.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
0354d1d64f target: Re-add check to reject control WRITEs with overflow data
commit 4ff83daa02 upstream.

During v4.3 when the overflow/underflow check was relaxed by
commit c72c525022:

  commit c72c525022
  Author: Roland Dreier <roland@purestorage.com>
  Date:   Wed Jul 22 15:08:18 2015 -0700

       target: allow underflow/overflow for PR OUT etc. commands

to allow underflow/overflow for Windows compliance + FCP, a
consequence was to allow control CDBs to process overflow
data for iscsi-target with immediate data as well.

As per Roland's original change, continue to allow underflow
cases for control CDBs to make Windows compliance + FCP happy,
but until overflow for control CDBs is supported tree-wide,
explicitly reject all control WRITEs with overflow following
pre v4.3.y logic.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
0eedb7832e cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
commit 6c77003677 upstream.

For a driver that does not set the CPUFREQ_STICKY flag, if all of the
->init() calls fail, cpufreq_register_driver() should return an error.
This will prevent the driver from loading.

Fixes: ce1bcfe94d (cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs)
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
86f95e53ed random: invalidate batched entropy after crng init
commit b169c13de4 upstream.

It's possible that get_random_{u32,u64} is used before the crng has
initialized, in which case, its output might not be cryptographically
secure. For this problem, directly, this patch set is introducing the
*_wait variety of functions, but even with that, there's a subtle issue:
what happens to our batched entropy that was generated before
initialization. Prior to this commit, it'd stick around, supplying bad
numbers. After this commit, we force the entropy to be re-extracted
after each phase of the crng has initialized.

In order to avoid a race condition with the position counter, we
introduce a simple rwlock for this invalidation. Since it's only during
this awkward transition period, after things are all set up, we stop
using it, so that it doesn't have an impact on performance.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:44 +02:00
0524867ee2 mei: make sysfs modalias format similar as uevent modalias
commit 6f9193ec04 upstream.

modprobe is not able to resolve sysfs modalias for mei devices.

 # cat
/sys/class/watchdog/watchdog0/device/watchdog/watchdog0/device/modalias
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
 # modprobe --set-version 4.9.6-200.fc25.x86_64 -R
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
modprobe: FATAL: Module mei::05b79a6f-4628-4d7f-899d-a91514cb32ab: not
found in directory /lib/modules/4.9.6-200.fc25.x86_64
 # cat /lib/modules/4.9.6-200.fc25.x86_64/modules.alias | grep
05b79a6f-4628-4d7f-899d-a91514cb32ab
alias mei:*:05b79a6f-4628-4d7f-899d-a91514cb32ab:*:* mei_wdt

commit b26864cad1 ("mei: bus: add client protocol
version to the device alias"), however sysfs modalias
is still in formmat mei:S:uuid:*.

This patch equates format of uevent and sysfs modalias so that modprobe
is able to resolve the aliases.

Fixes: commit b26864cad1 ("mei: bus: add client protocol version to the device alias")
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
6712554818 block: Avoid that blk_exit_rl() triggers a use-after-free
commit b425e50492 upstream.

Since the introduction of .init_rq_fn() and .exit_rq_fn() it is
essential that the memory allocated for struct request_queue
stays around until all blk_exit_rl() calls have finished. Hence
make blk_init_rl() take a reference on struct request_queue.

This patch fixes the following crash:

general protection fault: 0000 [#2] SMP
CPU: 3 PID: 28 Comm: ksoftirqd/3 Tainted: G      D         4.12.0-rc2-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
task: ffff88013a108040 task.stack: ffffc9000071c000
RIP: 0010:free_request_size+0x1a/0x30
RSP: 0018:ffffc9000071fd38 EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX: ffff880067362a88 RCX: 0000000000000003
RDX: ffff880067464178 RSI: ffff880067362a88 RDI: ffff880135ea4418
RBP: ffffc9000071fd40 R08: 0000000000000000 R09: 0000000100180009
R10: ffffc9000071fd38 R11: ffffffff81110800 R12: ffff88006752d3d8
R13: ffff88006752d3d8 R14: ffff88013a108040 R15: 000000000000000a
FS:  0000000000000000(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa8ec1edb00 CR3: 0000000138ee8000 CR4: 00000000001406e0
Call Trace:
 mempool_destroy.part.10+0x21/0x40
 mempool_destroy+0xe/0x10
 blk_exit_rl+0x12/0x20
 blkg_free+0x4d/0xa0
 __blkg_release_rcu+0x59/0x170
 rcu_process_callbacks+0x260/0x4e0
 __do_softirq+0x116/0x250
 smpboot_thread_fn+0x123/0x1e0
 kthread+0x109/0x140
 ret_from_fork+0x31/0x40

Fixes: commit e9c787e65c ("scsi: allocate scsi_cmnd structures as part of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
698aa72067 iio: proximity: as3935: fix iio_trigger_poll issue
commit 9122b54f26 upstream.

Using iio_trigger_poll() can oops when multiple interrupts
happen before the first is handled.

Use iio_trigger_poll_chained() instead and use the timestamp
when processed, since it will be in theory be 2 ms max latency.

Fixes: 24ddb0e4bb ("iio: Add AS3935 lightning sensor support")
Signed-off-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
71c0950cd7 iio: proximity: as3935: fix AS3935_INT mask
commit 275292d3a3 upstream.

AS3935 interrupt mask has been incorrect so valid lightning events
would never trigger an buffer event. Also noise interrupt should be
BIT(0).

Fixes: 24ddb0e4bb ("iio: Add AS3935 lightning sensor support")
Signed-off-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
7b5d3c1a14 iio: trigger: fix NULL pointer dereference in iio_trigger_write_current()
commit 4eecbe8188 upstream.

In case oldtrig == trig == NULL (which happens when we set none
trigger, when there is already none set) there is a NULL pointer
dereference during iio_trigger_put(trig). Below is kernel output when
this occurs:

[   26.741790] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[   26.750179] pgd = cacc0000
[   26.752936] [00000000] *pgd=8adc6835, *pte=00000000, *ppte=00000000
[   26.759531] Internal error: Oops: 17 [#1] SMP ARM
[   26.764261] Modules linked in: usb_f_ncm u_ether usb_f_acm u_serial usb_f_fs libcomposite configfs evbug
[   26.773844] CPU: 0 PID: 152 Comm: synchro Not tainted 4.12.0-rc1 #2
[   26.780128] Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[   26.786329] task: cb1de200 task.stack: cac92000
[   26.790892] PC is at iio_trigger_write_current+0x188/0x1f4
[   26.796403] LR is at lock_release+0xf8/0x20c
[   26.800696] pc : [<c0736f34>]    lr : [<c016efb0>]    psr: 600d0013
[   26.800696] sp : cac93e30  ip : cac93db0  fp : cac93e5c
[   26.812193] r10: c0e64fe8  r9 : 00000000  r8 : 00000001
[   26.817436] r7 : cb190810  r6 : 00000010  r5 : 00000001  r4 : 00000000
[   26.823982] r3 : 00000000  r2 : 00000000  r1 : cb1de200  r0 : 00000000
[   26.830528] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   26.837683] Control: 10c5387d  Table: 8acc006a  DAC: 00000051
[   26.843448] Process synchro (pid: 152, stack limit = 0xcac92210)
[   26.849475] Stack: (0xcac93e30 to 0xcac94000)
[   26.853857] 3e20:                                     00000001 c0736dac c054033c cae6b680
[   26.862060] 3e40: cae6b680 00000000 00000001 cb3f8610 cac93e74 cac93e60 c054035c c0736db8
[   26.870264] 3e60: 00000001 c054033c cac93e94 cac93e78 c029bf34 c0540348 00000000 00000000
[   26.878469] 3e80: cb3f8600 cae6b680 cac93ed4 cac93e98 c029b320 c029bef0 00000000 00000000
[   26.886672] 3ea0: 00000000 cac93f78 cb2d41fc caed3280 c029b214 cac93f78 00000001 000e20f8
[   26.894874] 3ec0: 00000001 00000000 cac93f44 cac93ed8 c0221dcc c029b220 c0e1ca39 cb2d41fc
[   26.903079] 3ee0: cac93f04 cac93ef0 c0183ef0 c0183ab0 cb2d41fc 00000000 cac93f44 cac93f08
[   26.911282] 3f00: c0225eec c0183ebc 00000001 00000000 c0223728 00000000 c0245454 00000001
[   26.919485] 3f20: 00000001 caed3280 000e20f8 cac93f78 000e20f8 00000001 cac93f74 cac93f48
[   26.927690] 3f40: c0223680 c0221da4 c0246520 c0245460 caed3283 caed3280 00000000 00000000
[   26.935893] 3f60: 000e20f8 00000001 cac93fa4 cac93f78 c0224520 c02235e4 00000000 00000000
[   26.944096] 3f80: 00000001 000e20f8 00000001 00000004 c0107f84 cac92000 00000000 cac93fa8
[   26.952299] 3fa0: c0107de0 c02244e8 00000001 000e20f8 0000000e 000e20f8 00000001 fbad2484
[   26.960502] 3fc0: 00000001 000e20f8 00000001 00000004 beb6b698 00064260 0006421c beb6b4b4
[   26.968705] 3fe0: 00000000 beb6b450 b6f219a0 b6e2f268 800d0010 0000000e cac93ff4 cac93ffc
[   26.976896] Backtrace:
[   26.979388] [<c0736dac>] (iio_trigger_write_current) from [<c054035c>] (dev_attr_store+0x20/0x2c)
[   26.988289]  r10:cb3f8610 r9:00000001 r8:00000000 r7:cae6b680 r6:cae6b680 r5:c054033c
[   26.996138]  r4:c0736dac r3:00000001
[   26.999747] [<c054033c>] (dev_attr_store) from [<c029bf34>] (sysfs_kf_write+0x50/0x54)
[   27.007686]  r5:c054033c r4:00000001
[   27.011290] [<c029bee4>] (sysfs_kf_write) from [<c029b320>] (kernfs_fop_write+0x10c/0x224)
[   27.019579]  r7:cae6b680 r6:cb3f8600 r5:00000000 r4:00000000
[   27.025271] [<c029b214>] (kernfs_fop_write) from [<c0221dcc>] (__vfs_write+0x34/0x120)
[   27.033214]  r10:00000000 r9:00000001 r8:000e20f8 r7:00000001 r6:cac93f78 r5:c029b214
[   27.041059]  r4:caed3280
[   27.043622] [<c0221d98>] (__vfs_write) from [<c0223680>] (vfs_write+0xa8/0x170)
[   27.050959]  r9:00000001 r8:000e20f8 r7:cac93f78 r6:000e20f8 r5:caed3280 r4:00000001
[   27.058731] [<c02235d8>] (vfs_write) from [<c0224520>] (SyS_write+0x44/0x98)
[   27.065806]  r9:00000001 r8:000e20f8 r7:00000000 r6:00000000 r5:caed3280 r4:caed3283
[   27.073582] [<c02244dc>] (SyS_write) from [<c0107de0>] (ret_fast_syscall+0x0/0x1c)
[   27.081179]  r9:cac92000 r8:c0107f84 r7:00000004 r6:00000001 r5:000e20f8 r4:00000001
[   27.088947] Code: 1a000009 e1a04009 e3a06010 e1a05008 (e5943000)
[   27.095244] ---[ end trace 06d1dab86d6e6bab ]---

To fix that problem call iio_trigger_put(trig) only when trig is not
NULL.

Fixes: d5d24bcc0a ("iio: trigger: close race condition in acquiring trigger reference")
Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
80c8ac6b9b iio: light: ltr501 Fix interchanged als/ps register field
commit 7cc3bff4ef upstream.

The register mapping for the IIO driver for the Liteon Light and Proximity
sensor LTR501 interrupt mode is interchanged (ALS/PS).
There is a register called INTERRUPT register (address 0x8F)
Bit 0 represents PS measurement trigger.
Bit 1 represents ALS measurement trigger.
This two bit fields are interchanged within the driver.
see datasheet page 24:
http://optoelectronics.liteon.com/upload/download/DS86-2012-0006/S_110_LTR-501ALS-01_PrelimDS_ver1%5B1%5D.pdf

Signed-off-by: Franziska Naepelt <franziska.naepelt@idt.com>
Fixes: 7ac702b314 ("iio: ltr501: Add interrupt support")
Acked-by: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
1cb7bbe7b5 iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's
commit f7d86ecf83 upstream.

The third argument of devm_request_threaded_irq() is the primary
handler. It is called in hardirq context and checks whether the
interrupt is relevant to the device. If the primary handler returns
IRQ_WAKE_THREAD, the secondary handler (a.k.a. handler thread) is
scheduled to run in process context.

bcm_iproc_adc.c uses the secondary handler as the primary one
and the other way around. So this patch fixes the same, along with
re-naming the secondary handler and primary handler names properly.

Tested on the BCM9583XX iProc SoC based boards.

Fixes: 4324c97ece ("iio: Add driver for Broadcom iproc-static-adc")
Reported-by: Pavel Roskin <plroskin@gmail.com>
Signed-off-by: Raveendra Padasalagi <raveendra.padasalagi@broadcom.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
d9b0c95585 staging/lustre/lov: remove set_fs() call from lov_getstripe()
commit 0a33252e06 upstream.

lov_getstripe() calls set_fs(KERNEL_DS) so that it can handle a struct
lov_user_md pointer from user- or kernel-space.  This changes the
behavior of copy_from_user() on SPARC and may result in a misaligned
access exception which in turn oopses the kernel.  In fact the
relevant argument to lov_getstripe() is never called with a
kernel-space pointer and so changing the address limits is unnecessary
and so we remove the calls to save, set, and restore the address
limits.

Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-on: http://review.whamcloud.com/6150
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3221
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Li Wei <wei.g.li@intel.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
dd8980bb03 usb: chipidea: debug: check before accessing ci_role
commit 0340ff83cd upstream.

ci_role BUGs when the role is >= CI_ROLE_END.

Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:43 +02:00
f2967b72b6 usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
commit aa1f058d7d upstream.

Fix below NULL pointer dereference. we set ci->roles[CI_ROLE_GADGET]
too early in ci_hdrc_gadget_init(), if udc_start() fails due to some
reason, the ci->roles[CI_ROLE_GADGET] check in  ci_hdrc_gadget_destroy
can't protect us.

We fix this issue by only setting ci->roles[CI_ROLE_GADGET] if
udc_start() succeed.

[    1.398550] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
...
[    1.448600] PC is at dma_pool_free+0xb8/0xf0
[    1.453012] LR is at dma_pool_free+0x28/0xf0
[    2.113369] [<ffffff80081817d8>] dma_pool_free+0xb8/0xf0
[    2.118857] [<ffffff800841209c>] destroy_eps+0x4c/0x68
[    2.124165] [<ffffff8008413770>] ci_hdrc_gadget_destroy+0x28/0x50
[    2.130461] [<ffffff800840fa30>] ci_hdrc_probe+0x588/0x7e8
[    2.136129] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
[    2.142066] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
[    2.148270] [<ffffff800837f68c>] __device_attach_driver+0x9c/0xf8
[    2.154563] [<ffffff800837d570>] bus_for_each_drv+0x58/0x98
[    2.160317] [<ffffff800837f174>] __device_attach+0xc4/0x138
[    2.166072] [<ffffff800837f738>] device_initial_probe+0x10/0x18
[    2.172185] [<ffffff800837e58c>] bus_probe_device+0x94/0xa0
[    2.177940] [<ffffff800837c560>] device_add+0x3f0/0x560
[    2.183337] [<ffffff8008380d20>] platform_device_add+0x180/0x240
[    2.189541] [<ffffff800840f0e8>] ci_hdrc_add_device+0x440/0x4f8
[    2.195654] [<ffffff8008414194>] ci_hdrc_usb2_probe+0x13c/0x2d8
[    2.201769] [<ffffff8008380fb8>] platform_drv_probe+0x50/0xb8
[    2.207705] [<ffffff800837f494>] driver_probe_device+0x1fc/0x2a8
[    2.213910] [<ffffff800837f5ec>] __driver_attach+0xac/0xb0
[    2.219575] [<ffffff800837d4b0>] bus_for_each_dev+0x60/0xa0
[    2.225329] [<ffffff800837ec80>] driver_attach+0x20/0x28
[    2.230816] [<ffffff800837e880>] bus_add_driver+0x1d0/0x238
[    2.236571] [<ffffff800837fdb0>] driver_register+0x60/0xf8
[    2.242237] [<ffffff8008380ef4>] __platform_driver_register+0x44/0x50
[    2.248891] [<ffffff80086fd440>] ci_hdrc_usb2_driver_init+0x18/0x20
[    2.255365] [<ffffff8008082950>] do_one_initcall+0x38/0x128
[    2.261121] [<ffffff80086e0d00>] kernel_init_freeable+0x1ac/0x250
[    2.267414] [<ffffff800852f0b8>] kernel_init+0x10/0x100
[    2.272810] [<ffffff8008082680>] ret_from_fork+0x10/0x50

Fixes: 3f124d233e ("usb: chipidea: add role init and destroy APIs")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
f26ac1fc9f usb: chipidea: imx: Do not access CLKONOFF on i.MX51
commit 62b97d502b upstream.

Unlike i.MX53, i.MX51's USBOH3 register file does not implemenent
registers past offset 0x018, which includes
MX53_USB_CLKONOFF_CTRL_OFFSET and trying to access that register on
said platform results in external abort.

Fix it by enabling CLKONOFF accessing codepath only for i.MX53.

Fixes 3be3251db0 ("usb: chipidea: imx: Disable internal 60Mhz clock with ULPI PHY")
Cc: cphealy@gmail.com
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
b89de0406a usb: musb: dsps: keep VBUS on for host-only mode
commit b3addcf0d1 upstream.

Currently VBUS is turned off while a usb device is detached, and turned
on again by the polling routine. This short period VBUS loss prevents
usb modem to switch mode.

VBUS should be constantly on for host-only mode, so this changes the
driver to not turn off VBUS for host-only mode.

Fixes: 2f3fd2c5bd ("usb: musb: Prepare dsps glue layer for PM runtime support")
Reported-by: Moreno Bartalucci <moreno.bartalucci@tecnorama.it>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
c33b087c83 usb: gadget: f_mass_storage: Serialize wake and sleep execution
commit dc9217b69d upstream.

f_mass_storage has a memorry barrier issue with the sleep and wake
functions that can cause a deadlock. This results in intermittent hangs
during MSC file transfer. The host will reset the device after receiving
no response to resume the transfer. This issue is seen when dwc3 is
processing 2 transfer-in-progress events at the same time, invoking
completion handlers for CSW and CBW. Also this issue occurs depending on
the system timing and latency.

To increase the chance to hit this issue, you can force dwc3 driver to
wait and process those 2 events at once by adding a small delay (~100us)
in dwc3_check_event_buf() whenever the request is for CSW and read the
event count again. Avoid debugging with printk and ftrace as extra
delays and memory barrier will mask this issue.

Scenario which can lead to failure:
-----------------------------------
1) The main thread sleeps and waits for the next command in
   get_next_command().
2) bulk_in_complete() wakes up main thread for CSW.
3) bulk_out_complete() tries to wake up the running main thread for CBW.
4) thread_wakeup_needed is not loaded with correct value in
   sleep_thread().
5) Main thread goes to sleep again.

The pattern is shown below. Note the 2 critical variables.
 * common->thread_wakeup_needed
 * bh->state

	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
	==============================  ===============================

					bh->state = BH_STATE_FULL;
					smp_wmb();
	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
	smp_rmb();
	if (bh->state != BH_STATE_FULL)
		sleep again ...

As pointed out by Alan Stern, this is an R-pattern issue. The issue can
be seen when there are two wakeups in quick succession. The
thread_wakeup_needed can be overwritten in sleep_thread, and the read of
the bh->state maybe reordered before the write to thread_wakeup_needed.

This patch applies full memory barrier smp_mb() in both sleep_thread()
and wakeup_thread() to ensure the order which the thread_wakeup_needed
and bh->state are written and loaded.

However, a better solution in the future would be to use wait_queue
method that takes care of managing memory barrier between waker and
waiter.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
3d7ba52b5f drm: Fix oops + Xserver hang when unplugging USB drm devices
commit 75fb636324 upstream.

commit a39be606f9 ("drm: Do a full device unregister when unplugging")
causes backtraces like this one when unplugging an usb drm device while
it is in use:

usb 2-3: USB disconnect, device number 25
------------[ cut here ]------------
WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424
   drm_mode_config_cleanup+0x220/0x280 [drm]
...
RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm]
...
Call Trace:
 gm12u320_modeset_cleanup+0xe/0x10 [gm12u320]
 gm12u320_driver_unload+0x35/0x70 [gm12u320]
 drm_dev_unregister+0x3c/0xe0 [drm]
 drm_unplug_dev+0x12/0x60 [drm]
 gm12u320_usb_disconnect+0x36/0x40 [gm12u320]
 usb_unbind_interface+0x72/0x280
 device_release_driver_internal+0x158/0x210
 device_release_driver+0x12/0x20
 bus_remove_device+0x104/0x180
 device_del+0x1d2/0x350
 usb_disable_device+0x9f/0x270
 usb_disconnect+0xc6/0x260
...
[drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458
   drm_mode_config_cleanup+0x268/0x280 [drm]
...
<same Call Trace>
---[ end trace 80df975dae439ed6 ]---
general protection fault: 0000 [#1] SMP
...
Call Trace:
 ? __switch_to+0x225/0x450
 drm_mode_rmfb_work_fn+0x55/0x70 [drm]
 process_one_work+0x193/0x3c0
 worker_thread+0x4a/0x3a0
...
RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98
---[ end trace 80df975dae439ed7 ]---

After which the system is unusable this is caused by drm_dev_unregister
getting called immediately on unplug, which calls the drivers unload
function which calls drm_mode_config_cleanup which removes the framebuffer
object while userspace is still holding a reference to it.

Reverting commit a39be606f9 ("drm: Do a full device unregister
when unplugging") leads to the following oops on unplug instead,
when userspace closes the last fd referencing the drm_dev:

sysfs group 'power' not found for kobject 'card1-Unknown-1'
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237
   sysfs_remove_group+0x80/0x90
...
RIP: 0010:sysfs_remove_group+0x80/0x90
...
Call Trace:
 dpm_sysfs_remove+0x57/0x60
 device_del+0xfd/0x350
 device_unregister+0x1a/0x60
 drm_sysfs_connector_remove+0x39/0x50 [drm]
 drm_connector_unregister+0x5a/0x70 [drm]
 drm_connector_unregister_all+0x45/0xa0 [drm]
 drm_modeset_unregister_all+0x12/0x30 [drm]
 drm_dev_unregister+0xca/0xe0 [drm]
 drm_put_dev+0x32/0x60 [drm]
 drm_release+0x2f3/0x380 [drm]
 __fput+0xdf/0x1e0
...
---[ end trace ecfb91ac85688bbe ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: down_write+0x1f/0x40
...
Call Trace:
 debugfs_remove_recursive+0x55/0x1b0
 drm_debugfs_connector_remove+0x21/0x40 [drm]
 drm_connector_unregister+0x62/0x70 [drm]
 drm_connector_unregister_all+0x45/0xa0 [drm]
 drm_modeset_unregister_all+0x12/0x30 [drm]
 drm_dev_unregister+0xca/0xe0 [drm]
 drm_put_dev+0x32/0x60 [drm]
 drm_release+0x2f3/0x380 [drm]
 __fput+0xdf/0x1e0
...
---[ end trace ecfb91ac85688bbf ]---

This is caused by the revert moving back to drm_unplug_dev calling
drm_minor_unregister which does:

        device_del(minor->kdev);
        dev_set_drvdata(minor->kdev, NULL); /* safety belt */
        drm_debugfs_cleanup(minor);

Causing the sysfs entries to already be removed even though we still
have references to them in e.g. drm_connector.

Note we must call drm_minor_unregister to notify userspace of the unplug
of the device, so calling drm_dev_unregister is not completely wrong the
problem is that drm_dev_unregister does too much.

This commit fixes drm_unplug_dev by not only reverting
commit a39be606f9 ("drm: Do a full device unregister when unplugging")
but by also adding a call to drm_modeset_unregister_all before the
drm_minor_unregister calls to make sure all sysfs entries are removed
before calling device_del(minor->kdev) thereby also fixing the second
set of oopses caused by just reverting the commit.

Fixes: a39be606f9 ("drm: Do a full device unregister when unplugging")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeffy <jeffy.chen@rock-chips.com>
Cc: Marco Diego Aurélio Mesquita <marcodiegomesquita@gmail.com>
Reported-by: Marco Diego Aurélio Mesquita <marcodiegomesquita@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601115430.4113-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
de8f4aeaa1 ext4: fix fdatasync(2) after extent manipulation operations
commit 67a7d5f561 upstream.

Currently, extent manipulation operations such as hole punch, range
zeroing, or extent shifting do not record the fact that file data has
changed and thus fdatasync(2) has a work to do. As a result if we crash
e.g. after a punch hole and fdatasync, user can still possibly see the
punched out data after journal replay. Test generic/392 fails due to
these problems.

Fix the problem by properly marking that file data has changed in these
operations.

Fixes: a4bb6b64e3
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
875d084e97 ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO
commit 4f8caa60a5 upstream.

When ext4_map_blocks() is called with EXT4_GET_BLOCKS_ZERO to zero-out
allocated blocks and these blocks are actually converted from unwritten
extent the following race can happen:

CPU0					CPU1

page fault				page fault
...					...
ext4_map_blocks()
  ext4_ext_map_blocks()
    ext4_ext_handle_unwritten_extents()
      ext4_ext_convert_to_initialized()
	- zero out converted extent
	ext4_zeroout_es()
	  - inserts extent as initialized in status tree

					ext4_map_blocks()
					  ext4_es_lookup_extent()
					    - finds initialized extent
					write data
  ext4_issue_zeroout()
    - zeroes out new extent overwriting data

This problem can be reproduced by generic/340 for the fallocated case
for the last block in the file.

Fix the problem by avoiding zeroing out the area we are mapping with
ext4_map_blocks() in ext4_ext_convert_to_initialized(). It is pointless
to zero out this area in the first place as the caller asked us to
convert the area to initialized because he is just going to write data
there before the transaction finishes. To achieve this we delete the
special case of zeroing out full extent as that will be handled by the
cases below zeroing only the part of the extent that needs it. We also
instruct ext4_split_extent() that the middle of extent being split
contains data so that ext4_split_extent_at() cannot zero out full extent
in case of ENOSPC.

Fixes: 12735f8819
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
22fb074c67 ext4: keep existing extra fields when inode expands
commit 887a973061 upstream.

ext4_expand_extra_isize() should clear only space between old and new
size.

Fixes: 6dd4ee7cab # v2.6.23
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:42 +02:00
699dc1080d ext4: fix SEEK_HOLE
commit 7d95eddf31 upstream.

Currently, SEEK_HOLE implementation in ext4 may both return that there's
a hole at some offset although that offset already has data and skip
some holes during a search for the next hole. The first problem is
demostrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (2.054 GiB/sec and 538461.5385 ops/sec)
Whence	Result
HOLE	0

Where we can see that SEEK_HOLE wrongly returned offset 0 as containing
a hole although we have written data there. The second problem can be
demonstrated by:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file

wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (1.978 GiB/sec and 518518.5185 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (2 GiB/sec and 500000.0000 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offsets 56k..128k has been ignored by the
SEEK_HOLE call.

The underlying problem is in the ext4_find_unwritten_pgoff() which is
just buggy. In some cases it fails to update returned offset when it
finds a hole (when no pages are found or when the first found page has
higher index than expected), in some cases conditions for detecting hole
are just missing (we fail to detect a situation where indices of
returned pages are not contiguous).

Fix ext4_find_unwritten_pgoff() to properly detect non-contiguous page
indices and also handle all cases where we got less pages then expected
in one place and handle it properly there.

Fixes: c8c0df241c
CC: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
628ee50458 xen/privcmd: Support correctly 64KB page granularity when mapping memory
commit 753c09b565 upstream.

Commit 5995a68 "xen/privcmd: Add support for Linux 64KB page granularity" did
not go far enough to support 64KB in mmap_batch_fn.

The variable 'nr' is the number of 4KB chunk to map. However, when Linux
is using 64KB page granularity the array of pages (vma->vm_private_data)
contain one page per 64KB. Fix it by incrementing st->index correctly.

Furthermore, st->va is not correctly incremented as PAGE_SIZE !=
XEN_PAGE_SIZE.

Fixes: 5995a68 ("xen/privcmd: Add support for Linux 64KB page granularity")
Reported-by: Feng Kan <fkan@apm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
b9d013c005 mtd: nand: tango: Update ecc_stats.corrected
commit 60cf0ce14b upstream.

According to Boris, some user-space tools expect MTD drivers to
update ecc_stats.corrected, and it's better to provide a lower
bound than to provide no information at all.

Fixes: 6956e2385a ("mtd: nand: add tango NAND flash controller support")
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
6d916021a1 mtd: nand: tango: Export OF device ID table as module aliases
commit 2761b4f12b upstream.

The device table is required to load modules based on
modaliases. After adding MODULE_DEVICE_TABLE, below entries
for example will be added to module.alias:
alias:          of:N*T*Csigma,smp8758-nandC*
alias:          of:N*T*Csigma,smp8758-nand

Fixes: 6956e2385a ("mtd: nand: add tango NAND flash controller support")
Signed-off-by: Andres Galacho <andresgalacho@gmail.com>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
f0aa7a0415 reiserfs: Make flush bios explicitely sync
commit d8747d642e upstream.

Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65a
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
f2fee0c4cd cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode
commit 5be6b75610 upstream.

When adding a cfq_group into the cfq service tree, we use CFQ_IDLE_DELAY
as the delay of cfq_group's vdisktime if there have been other cfq_groups
already.

When cfq is under iops mode, commit 9a7f38c42c ("cfq-iosched: Convert
from jiffies to nanoseconds") could result in a large iops delay and
lead to an abnormal io schedule delay for the added cfq_group. To fix
it, we just need to revert to the old CFQ_IDLE_DELAY value: HZ / 5
when iops mode is enabled.

Despite having the same value, the delay of a cfq_queue in idle class
and the delay of cfq_group are different things, so I define two new
macros for the delay of a cfq_group under time-slice mode and iops mode.

Fixes: 9a7f38c42c ("cfq-iosched: Convert from jiffies to nanoseconds")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
067b131e20 dmaengine: mv_xor_v2: set DMA mask to 40 bits
commit b2d3c270f9 upstream.

The XORv2 engine on Armada 7K/8K can only access the first 40 bits of
the physical address space, so the DMA mask must be set accordingly.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
6d41a85b29 dmaengine: mv_xor_v2: remove interrupt coalescing
commit 9dd4f319ba upstream.

The current implementation of interrupt coalescing doesn't work, because
it doesn't configure the coalescing timer, which is needed to make sure
we get an interrupt at some point.

As a fix for stable, we simply remove the interrupt coalescing
functionality. It will be re-introduced properly in a future commit.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:41 +02:00
6f4c739de9 dmaengine: mv_xor_v2: fix tx_submit() implementation
commit 44d5887a8b upstream.

The mv_xor_v2_tx_submit() gets the next available HW descriptor by
calling mv_xor_v2_get_desq_write_ptr(), which reads a HW register
telling the next available HW descriptor. This was working fine when HW
descriptors were issued for processing directly in tx_submit().

However, as part of the review process of the driver, a change was
requested to move the actual kick-off of HW descriptors processing to
->issue_pending(). Due to this, reading the HW register to know the next
available HW descriptor no longer works.

So instead of using this HW register, we implemented a software index
pointing to the next available HW descriptor.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
a4634dfb7c dmaengine: mv_xor_v2: enable XOR engine after its configuration
commit ab2c5f0a77 upstream.

The engine was enabled prior to its configuration, which isn't
correct. This patch relocates the activation of the XOR engine, to be
after the configuration of the XOR engine.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Hanna Hawa <hannah@marvell.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
cba2cd6df4 dmaengine: mv_xor_v2: do not use descriptors not acked by async_tx
commit bc473da1ed upstream.

Descriptors that have not been acknowledged by the async_tx layer
should not be re-used, so this commit adjusts the implementation of
mv_xor_v2_prep_sw_desc() to skip descriptors for which
async_tx_test_ack() is false.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
2384df2bfe dmaengine: mv_xor_v2: properly handle wrapping in the array of HW descriptors
commit 2aab4e1815 upstream.

mv_xor_v2_tasklet() is looping over completed HW descriptors. Before the
loop, it initializes 'next_pending_hw_desc' to the first HW descriptor
to handle, and then the loop simply increments this point, without
taking care of wrapping when we reach the last HW descriptor. The
'pending_ptr' index was being wrapped back to 0 at the end, but it
wasn't used in each iteration of the loop to calculate
next_pending_hw_desc.

This commit fixes that, and makes next_pending_hw_desc a variable local
to the loop itself.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
fdcadb5f70 dmaengine: mv_xor_v2: handle mv_xor_v2_prep_sw_desc() error properly
commit eb8df543e4 upstream.

The mv_xor_v2_prep_sw_desc() is called from a few different places in
the driver, but we never take into account the fact that it might
return NULL. This commit fixes that, ensuring that we don't panic if
there are no more descriptors available.

Fixes: 19a340b1a8 ("dmaengine: mv_xor_v2: new driver")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
5c039176bc dmaengine: ep93xx: Don't drain the transfers in terminate_all()
commit 98f9de366f upstream.

Draining the transfers in terminate_all callback happens with IRQs disabled,
therefore induces huge latency:

 irqsoff latency trace v1.1.5 on 4.11.0
 --------------------------------------------------------------------
 latency: 39770 us, #57/57, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
    -----------------
    | task: process-129 (uid:0 nice:0 policy:2 rt_prio:50)
    -----------------
  => started at: _snd_pcm_stream_lock_irqsave
  => ended at:   snd_pcm_stream_unlock_irqrestore

                  _------=> CPU#
                 / _-----=> irqs-off
                | / _----=> need-resched
                || / _---=> hardirq/softirq
                ||| / _--=> preempt-depth
                |||| /     delay
  cmd     pid   ||||| time  |   caller
     \   /      |||||  \    |   /
process-129     0d.s.    3us : _snd_pcm_stream_lock_irqsave
process-129     0d.s1    9us : snd_pcm_stream_lock <-_snd_pcm_stream_lock_irqsave
process-129     0d.s1   15us : preempt_count_add <-snd_pcm_stream_lock
process-129     0d.s2   22us : preempt_count_add <-snd_pcm_stream_lock
process-129     0d.s3   32us : snd_pcm_update_hw_ptr0 <-snd_pcm_period_elapsed
process-129     0d.s3   41us : soc_pcm_pointer <-snd_pcm_update_hw_ptr0
process-129     0d.s3   50us : dmaengine_pcm_pointer <-soc_pcm_pointer
process-129     0d.s3   58us+: snd_dmaengine_pcm_pointer_no_residue <-dmaengine_pcm_pointer
process-129     0d.s3   96us : update_audio_tstamp <-snd_pcm_update_hw_ptr0
process-129     0d.s3  103us : snd_pcm_update_state <-snd_pcm_update_hw_ptr0
process-129     0d.s3  112us : xrun <-snd_pcm_update_state
process-129     0d.s3  119us : snd_pcm_stop <-xrun
process-129     0d.s3  126us : snd_pcm_action <-snd_pcm_stop
process-129     0d.s3  134us : snd_pcm_action_single <-snd_pcm_action
process-129     0d.s3  141us : snd_pcm_pre_stop <-snd_pcm_action_single
process-129     0d.s3  150us : snd_pcm_do_stop <-snd_pcm_action_single
process-129     0d.s3  157us : soc_pcm_trigger <-snd_pcm_do_stop
process-129     0d.s3  166us : snd_dmaengine_pcm_trigger <-soc_pcm_trigger
process-129     0d.s3  175us : ep93xx_dma_terminate_all <-snd_dmaengine_pcm_trigger
process-129     0d.s3  182us : preempt_count_add <-ep93xx_dma_terminate_all
process-129     0d.s4  189us*: m2p_hw_shutdown <-ep93xx_dma_terminate_all
process-129     0d.s4 39472us : m2p_hw_setup <-ep93xx_dma_terminate_all

 ... rest skipped...

process-129     0d.s. 40080us : <stack trace>
 => ep93xx_dma_tasklet
 => tasklet_action
 => __do_softirq
 => irq_exit
 => __handle_domain_irq
 => vic_handle_irq
 => __irq_usr
 => 0xb66c6668

Just abort the transfers and warn if the HW state is not what we expect.
Move draining into device_synchronize callback.

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
81a38a595d dmaengine: ep93xx: Always start from BASE0
commit 0037ae4781 upstream.

The current buffer is being reset to zero on device_free_chan_resources()
but not on device_terminate_all(). It could happen that HW is restarted and
expects BASE0 to be used, but the driver is not synchronized and will start
from BASE1. One solution is to reset the buffer explicitly in
m2p_hw_setup().

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
9812dc2abf dmaengine: usb-dmac: Fix DMAOR AE bit definition
commit 9a445bbb16 upstream.

This patch fixes the register definition of AE (Address Error flag) bit.

Fixes: 0c1c8ff32f ("dmaengine: usb-dmac: Add Renesas USB DMA Controller (USB-DMAC) driver")
Signed-off-by: Hiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
[Shimoda: add Fixes and Cc tags in the commit log]
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:40 +02:00
3e456059a4 KVM: async_pf: avoid async pf injection when in guest mode
commit 9bc1f09f6f upstream.

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.

This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.

This patch fixes the hang by doing async pf when executing L1 guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
37b1521501 arm: KVM: Allow unaligned accesses at HYP
commit 33b5c38852 upstream.

We currently have the HSCTLR.A bit set, trapping unaligned accesses
at HYP, but we're not really prepared to deal with it.

Since the rest of the kernel is pretty happy about that, let's follow
its example and set HSCTLR.A to zero. Modern CPUs don't really care.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
d06ca326dd arm64: KVM: Allow unaligned accesses at EL2
commit 78fd6dcf11 upstream.

We currently have the SCTLR_EL2.A bit set, trapping unaligned accesses
at EL2, but we're not really prepared to deal with it. So far, this
has been unnoticed, until GCC 7 started emitting those (in particular
64bit writes on a 32bit boundary).

Since the rest of the kernel is pretty happy about that, let's follow
its example and set SCTLR_EL2.A to zero. Modern CPUs don't really
care.

Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
ff384c499b arm64: KVM: Preserve RES1 bits in SCTLR_EL2
commit d68c1f7fd1 upstream.

__do_hyp_init has the rather bad habit of ignoring RES1 bits and
writing them back as zero. On a v8.0-8.2 CPU, this doesn't do anything
bad, but may end-up being pretty nasty on future revisions of the
architecture.

Let's preserve those bits so that we don't have to fix this later on.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
0d2c539ead KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
commit a3641631d1 upstream.

If "i" is the last element in the vcpu->arch.cpuid_entries[] array, it
potentially can be exploited the vulnerability. this will out-of-bounds
read and write.  Luckily, the effect is small:

	/* when no next entry is found, the current entry[i] is reselected */
	for (j = i + 1; ; j = (j + 1) % nent) {
		struct kvm_cpuid_entry2 *ej = &vcpu->arch.cpuid_entries[j];
		if (ej->function == e->function) {

It reads ej->maxphyaddr, which is user controlled.  However...

			ej->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;

After cpuid_entries there is

	int maxphyaddr;
	struct x86_emulate_ctxt emulate_ctxt;  /* 16-byte aligned */

So we have:

- cpuid_entries at offset 1B50 (6992)
- maxphyaddr at offset 27D0 (6992 + 3200 = 10192)
- padding at 27D4...27DF
- emulate_ctxt at 27E0

And it writes in the padding.  Pfew, writing the ops field of emulate_ctxt
would have been much worse.

This patch fixes it by modding the index to avoid the out-of-bounds
access. Worst case, i == j and ej->function == e->function,
the loop can bail out.

Reported-by: Moguofang <moguofang@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Guofang Mo <moguofang@huawei.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
406aa22b29 kvm: async_pf: fix rcu_irq_enter() with irqs enabled
commit bbaf0e2b1c upstream.

native_safe_halt enables interrupts, and you just shouldn't
call rcu_irq_enter() with interrupts enabled.  Reorder the
call with the following local_irq_disable() to respect the
invariant.

Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
f9ecf4222a efi/bgrt: Skip efi_bgrt_init() in case of non-EFI boot
commit 7425826f4f upstream.

Sabrina Dubroca reported an early panic:

  BUG: unable to handle kernel paging request at ffffffffff240001
  IP: efi_bgrt_init+0xdc/0x134

  [...]

  ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

... which was introduced by:

  7b0a911478 ("efi/x86: Move the EFI BGRT init code to early init code")

The cause is that on this machine the firmware provides the EFI ACPI BGRT
table even on legacy non-EFI bootups - which table should be EFI only.

The garbage BGRT data causes the efi_bgrt_init() panic.

Add a check to skip efi_bgrt_init() in case non-EFI bootup to work around
this firmware bug.

Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 7b0a911478 ("efi/x86: Move the EFI BGRT init code to early init code")
Link: http://lkml.kernel.org/r/20170526113652.21339-6-matt@codeblueprint.co.uk
[ Rewrote the changelog to be more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
c76a4a0771 efi: Don't issue error message when booted under Xen
commit 1ea34adb87 upstream.

When booted as Xen dom0 there won't be an EFI memmap allocated. Avoid
issuing an error message in this case:

  [    0.144079] efi: Failed to allocate new EFI memmap

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:39 +02:00
b8745dbb65 gfs2: Make flush bios explicitely sync
commit 0f0b9b63e1 upstream.

Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65a
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
836fb216da nfsd4: fix null dereference on replay
commit 9a307403d3 upstream.

if we receive a compound such that:

	- the sessionid, slot, and sequence number in the SEQUENCE op
	  match a cached succesful reply with N ops, and
	- the Nth operation of the compound is a PUTFH, PUTPUBFH,
	  PUTROOTFH, or RESTOREFH,

then nfsd4_sequence will return 0 and set cstate->status to
nfserr_replay_cache.  The current filehandle will not be set.  This will
cause us to call check_nfsd_access with first argument NULL.

To nfsd4_compound it looks like we just succesfully executed an
operation that set a filehandle, but the current filehandle is not set.

Fix this by moving the nfserr_replay_cache earlier.  There was never any
reason to have it after the encode_op label, since the only case where
he hit that is when opdesc->op_func sets it.

Note that there are two ways we could hit this case:

	- a client is resending a previously sent compound that ended
	  with one of the four PUTFH-like operations, or
	- a client is sending a *new* compound that (incorrectly) shares
	  sessionid, slot, and sequence number with a previously sent
	  compound, and the length of the previously sent compound
	  happens to match the position of a PUTFH-like operation in the
	  new compound.

The second is obviously incorrect client behavior.  The first is also
very strange--the only purpose of a PUTFH-like operation is to set the
current filehandle to be used by the following operation, so there's no
point in having it as the last in a compound.

So it's likely this requires a buggy or malicious client to reproduce.

Reported-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
8015cc81a6 drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
commit 0a646f331d upstream.

Even if the vblank period would allow it, it still seems to
be problematic on some cards.

v2: fix logic inversion (Nils)

bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
0f7ff02f7d kthread: Fix use-after-free if kthread fork fails
commit 4d6501dce0 upstream.

If a kthread forks (e.g. usermodehelper since commit 1da5c46fa9) but
fails in copy_process() between calling dup_task_struct() and setting
p->set_child_tid, then the value of p->set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().

    kthread()
     - worker_thread()
        - process_one_work()
        |  - call_usermodehelper_exec_work()
        |     - kernel_thread()
        |        - _do_fork()
        |           - copy_process()
        |              - dup_task_struct()
        |                 - arch_dup_task_struct()
        |                    - tsk->set_child_tid = current->set_child_tid // implied
        |              - ...
        |              - goto bad_fork_*
        |              - ...
        |              - free_task(tsk)
        |                 - free_kthread_struct(tsk)
        |                    - kfree(tsk->set_child_tid)
        - ...
        - schedule()
           - __schedule()
              - wq_worker_sleeping()
                 - kthread_data(task)->flags // UAF

The problem started showing up with commit 1da5c46fa9 since it reused
->set_child_tid for the kthread worker data.

A better long-term solution might be to get rid of the ->set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.

Debugged-by: Jamie Iles <jamie.iles@oracle.com>
Fixes: 1da5c46fa9 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jamie Iles <jamie.iles@oracle.com>
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
a5505a656f ovl: fix creds leak in copy up error path
commit 8137ae26d2 upstream.

Fixes: 42f269b925 ("ovl: rearrange code in ovl_copy_up_locked()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
f59fdb278e crypto: gcm - wait for crypto op not signal safe
commit f3ad587070 upstream.

crypto_gcm_setkey() was using wait_for_completion_interruptible() to
wait for completion of async crypto op but if a signal occurs it
may return before DMA ops of HW crypto provider finish, thus
corrupting the data buffer that is kfree'ed in this case.

Resolve this by using wait_for_completion() instead.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
1286652e80 crypto: drbg - wait for crypto op not signal safe
commit a5dfefb1c3 upstream.

drbg_kcapi_sym_ctr() was using wait_for_completion_interruptible() to
wait for completion of async crypto op but if a signal occurs it
may return before DMA ops of HW crypto provider finish, thus
corrupting the output buffer.

Resolve this by using wait_for_completion() instead.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
7e4c3a6da3 KEYS: encrypted: avoid encrypting/decrypting stack buffers
commit e9ff56ac35 upstream.

Since v4.9, the crypto API cannot (normally) be used to encrypt/decrypt
stack buffers because the stack may be virtually mapped.  Fix this for
the padding buffers in encrypted-keys by using ZERO_PAGE for the
encryption padding and by allocating a temporary heap buffer for the
decryption padding.

Tested with CONFIG_DEBUG_SG=y:
	keyctl new_session
	keyctl add user master "abcdefghijklmnop" @s
	keyid=$(keyctl add encrypted desc "new user:master 25" @s)
	datablob="$(keyctl pipe $keyid)"
	keyctl unlink $keyid
	keyid=$(keyctl add encrypted desc "load $datablob" @s)
	datablob2="$(keyctl pipe $keyid)"
	[ "$datablob" = "$datablob2" ] && echo "Success!"

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:38 +02:00
8a28f221a0 KEYS: fix freeing uninitialized memory in key_update()
commit 63a0b0509e upstream.

key_update() freed the key_preparsed_payload even if it was not
initialized first.  This would cause a crash if userspace called
keyctl_update() on a key with type like "asymmetric" that has a
->preparse() method but not an ->update() method.  Possibly it could
even be triggered for other key types by racing with keyctl_setperm() to
make the KEY_NEED_WRITE check fail (the permission was already checked,
so normally it wouldn't fail there).

Reproducer with key type "asymmetric", given a valid cert.der:

keyctl new_session
keyid=$(keyctl padd asymmetric desc @s < cert.der)
keyctl setperm $keyid 0x3f000000
keyctl update $keyid data

[  150.686666] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[  150.687601] IP: asymmetric_key_free_kids+0x12/0x30
[  150.688139] PGD 38a3d067
[  150.688141] PUD 3b3de067
[  150.688447] PMD 0
[  150.688745]
[  150.689160] Oops: 0000 [#1] SMP
[  150.689455] Modules linked in:
[  150.689769] CPU: 1 PID: 2478 Comm: keyctl Not tainted 4.11.0-rc4-xfstests-00187-ga9f6b6b8cd2f #742
[  150.690916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
[  150.692199] task: ffff88003b30c480 task.stack: ffffc90000350000
[  150.692952] RIP: 0010:asymmetric_key_free_kids+0x12/0x30
[  150.693556] RSP: 0018:ffffc90000353e58 EFLAGS: 00010202
[  150.694142] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
[  150.694845] RDX: ffffffff81ee3920 RSI: ffff88003d4b0700 RDI: 0000000000000001
[  150.697569] RBP: ffffc90000353e60 R08: ffff88003d5d2140 R09: 0000000000000000
[  150.702483] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  150.707393] R13: 0000000000000004 R14: ffff880038a4d2d8 R15: 000000000040411f
[  150.709720] FS:  00007fcbcee35700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[  150.711504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  150.712733] CR2: 0000000000000001 CR3: 0000000039eab000 CR4: 00000000003406e0
[  150.714487] Call Trace:
[  150.714975]  asymmetric_key_free_preparse+0x2f/0x40
[  150.715907]  key_update+0xf7/0x140
[  150.716560]  ? key_default_cmp+0x20/0x20
[  150.717319]  keyctl_update_key+0xb0/0xe0
[  150.718066]  SyS_keyctl+0x109/0x130
[  150.718663]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  150.719440] RIP: 0033:0x7fcbce75ff19
[  150.719926] RSP: 002b:00007ffd5d167088 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
[  150.720918] RAX: ffffffffffffffda RBX: 0000000000404d80 RCX: 00007fcbce75ff19
[  150.721874] RDX: 00007ffd5d16785e RSI: 000000002866cd36 RDI: 0000000000000002
[  150.722827] RBP: 0000000000000006 R08: 000000002866cd36 R09: 00007ffd5d16785e
[  150.723781] R10: 0000000000000004 R11: 0000000000000206 R12: 0000000000404d80
[  150.724650] R13: 00007ffd5d16784d R14: 00007ffd5d167238 R15: 000000000040411f
[  150.725447] Code: 83 c4 08 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 85 ff 74 23 55 48 89 e5 53 48 89 fb <48> 8b 3f e8 06 21 c5 ff 48 8b 7b 08 e8 fd 20 c5 ff 48 89 df e8
[  150.727489] RIP: asymmetric_key_free_kids+0x12/0x30 RSP: ffffc90000353e58
[  150.728117] CR2: 0000000000000001
[  150.728430] ---[ end trace f7f8fe1da2d5ae8d ]---

Fixes: 4d8c0250b8 ("KEYS: Call ->free_preparse() even after ->preparse() returns an error")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
5def69023a KEYS: fix dereferencing NULL payload with nonzero length
commit 5649645d72 upstream.

sys_add_key() and the KEYCTL_UPDATE operation of sys_keyctl() allowed a
NULL payload with nonzero length to be passed to the key type's
->preparse(), ->instantiate(), and/or ->update() methods.  Various key
types including asymmetric, cifs.idmap, cifs.spnego, and pkcs7_test did
not handle this case, allowing an unprivileged user to trivially cause a
NULL pointer dereference (kernel oops) if one of these key types was
present.  Fix it by doing the copy_from_user() when 'plen' is nonzero
rather than when '_payload' is non-NULL, causing the syscall to fail
with EFAULT as expected when an invalid buffer is specified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
e423898fd8 crypto: asymmetric_keys - handle EBUSY due to backlog correctly
commit e68368aed5 upstream.

public_key_verify_signature() was passing the CRYPTO_TFM_REQ_MAY_BACKLOG
flag to akcipher_request_set_callback() but was not handling correctly
the case where a -EBUSY error could be returned from the call to
crypto_akcipher_verify() if backlog was used, possibly casuing
data corruption due to use-after-free of buffers.

Resolve this by handling -EBUSY correctly.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
c10acee89d ARM: dts: keystone-k2l: fix broken Ethernet due to disabled OSR
commit 791229f1d5 upstream.

Ethernet networking on K2L has been broken since v4.11-rc1. This was
caused by commit 32a34441a9 ("ARM: keystone: dts: fix netcp clocks
and add names"). This commit inadvertently moves on-chip static RAM
clock to the end of list of clocks provided for netcp. Since keystone
PM domain support does not have a list of recognized con_ids, only the
first clock in the list comes under runtime PM management. This means
the OSR (On-chip Static RAM) clock remains disabled and that broke
networking on K2L.

The OSR is used by QMSS on K2L as an external linking RAM. However this
is a standalone RAM that can be used for non-QMSS usage (as well as from
DSP side). So add a SRAM device node for the same and add the OSR clock
to the node.

Remove the now redundant OSR clock node from netcp.

To manage all clocks defined for netCP's use by runtime PM needs keystone
generic power domain (genpd) driver support which is under works.
Meanwhile, this patch restores K2L networking and is correct irrespective
of any future genpd work since OSR is an independent module and not part
of NetCP anyway.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
[nsekhar@ti.com: commit message updates, port to latest mainline]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
ff6c1649b4 ptrace: Properly initialize ptracer_cred on fork
commit c70d9d809f upstream.

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Tested-by: Takashi Iwai <tiwai@suse.de>
Fixes: 64b875f7ac ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ralph Sennhauser <ralph.sennhauser@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
1c8faeebd5 serial: core: fix crash in uart_suspend_port
commit 88e2582e90 upstream.

With serdev we might end up with serial ports that have no cdev exported
to userspace, as they are used as the bus interface to other devices. In
that case serial_match_port() won't be able to find a matching tty_dev.

Skip the irq wakeup enabling in that case, as serdev will make sure to
keep the port active, as long as there are devices depending on it.

Fixes: 8ee3fde047 (tty_port: register tty ports with serdev bus)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
70876e7c02 serial: ifx6x60: fix use-after-free on module unload
commit 1e948479b3 upstream.

Make sure to deregister the SPI driver before releasing the tty driver
to avoid use-after-free in the SPI remove callback where the tty
devices are deregistered.

Fixes: 72d4724ea5 ("serial: ifx6x60: Add modem power off function in the platform reboot process")
Cc: Jun Chen <jun.d.chen@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:37 +02:00
7faf3f5f89 serial: exar: Fix stuck MSIs
commit 2c0ac5b48a upstream.

After migrating 8250_exar to MSI in 172c33cb61, we can get stuck
without further interrupts because of the special wake-up event these
chips send. They are only cleared by reading INT0. As we fail to do so
during startup and shutdown, we can leave the interrupt line asserted,
which is fatal with edge-triggered MSIs.

Add the required reading of INT0 to startup and shutdown. Also account
for the fact that a pending wake-up interrupt means we have to return 1
from exar_handle_irq. Drop the unneeded reading of INT1..3 along with
this - those never reset anything.

An alternative approach would have been disabling the wake-up interrupt.
Unfortunately, this feature (REGB[17] = 1) is not available on the
XR17D15X.

Fixes: 172c33cb61 ("serial: exar: Enable MSI support")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
bc0734aac3 ftrace: Fix memory leak in ftrace_graph_release()
commit f9797c2f20 upstream.

ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not.  This results in a memory leak that is easily
captured by kmemleak:

unreferenced object 0xffff880038afe000 (size 8192):
  comm "trace-cmd", pid 238, jiffies 4294916898 (age 9.736s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815f561e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff8113964d>] __kmalloc+0x12d/0x1a0
    [<ffffffff810bf6d1>] alloc_ftrace_hash+0x51/0x80
    [<ffffffff810c0523>] __ftrace_graph_open.isra.39.constprop.46+0xa3/0x100
    [<ffffffff810c05e8>] ftrace_graph_open+0x68/0xa0
    [<ffffffff8114003d>] do_dentry_open.isra.1+0x1bd/0x2d0
    [<ffffffff81140df7>] vfs_open+0x47/0x60
    [<ffffffff81150f95>] path_openat+0x2a5/0x1020
    [<ffffffff81152d6a>] do_filp_open+0x8a/0xf0
    [<ffffffff811411df>] do_sys_open+0x12f/0x200
    [<ffffffff811412ce>] SyS_open+0x1e/0x20
    [<ffffffff815fa6e0>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/20170525152038.7661-1-lhenriques@suse.com

Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
1a223727ae arch/sparc: support NR_CPUS = 4096
[ Upstream commit c79a13734d ]

Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.

Orabug: 25505750

Signed-off-by: Jane Chu <jane.chu@oracle.com>

Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
10e3a2945e sparc64: delete old wrap code
[ Upstream commit 0197e41ce7 ]

The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
b60d9051cc sparc64: new context wrap
[ Upstream commit a0582f26ec ]

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
be5e2c7026 sparc64: add per-cpu mm of secondary contexts
[ Upstream commit 7a5b4bbf49 ]

The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
d5fb553c51 sparc64: redefine first version
[ Upstream commit c4415235b2 ]

CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
557145f44c sparc64: combine activate_mm and switch_mm
[ Upstream commit 14d0334c67 ]

The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:36 +02:00
aa6349e030 sparc64: reset mm cpumask after wrap
[ Upstream commit 5889748573 ]

After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
c8c7bb2f5b sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
[ Upstream commit f322980b74 ]

hugetlb_bad_size needs to be called on invalid values.  Also change the
pr_warn to a pr_err to better align with other platforms.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
a091e625ed sparc: Machine description indices can vary
[ Upstream commit c982aa9c30 ]

VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
376452d48e sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
[ Upstream commit 654f480762 ]

When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used.  As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB.  When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB.  We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
35b284c739 sparc64: Add __multi3 for gcc 7.x and later.
[ Upstream commit 1b4af13ff2 ]

Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
6bf499d388 net: stmmac: fix completely hung TX when using TSO
[ Upstream commit 426849e661 ]

stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.

This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.

When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.

When the DMA hangs, we get a tx timeout, however,
since stmmac does not do a complete reset of the IP
in stmmac_tx_timeout, we end up in a state with
completely hung TX.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
d44184237a net: ethoc: enable NAPI before poll may be scheduled
[ Upstream commit d220b942a4 ]

ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in the ethoc_open, which results in
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in the ethoc_open to fix that.

Fixes: a170285772 ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
9719e31a96 net: bridge: fix a null pointer dereference in br_afspec
[ Upstream commit 1020ce3108 ]

We might call br_afspec() with p == NULL which is a valid use case if
the action is on the bridge device itself, but the bridge tunnel code
dereferences the p pointer without checking, so check if p is null
first.

Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Fixes: efa5356b0d ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:35 +02:00
d673c7641c ravb: Fix use-after-free on ifconfig eth0 down
[ Upstream commit 79514ef670 ]

Commit a47b70ea86 ("ravb: unmap descriptors when freeing rings") has
introduced the issue seen in [1] reproduced on H3ULCB board.

Fix this by relocating the RX skb ringbuffer free operation, so that
swiotlb page unmapping can be done first. Freeing of aligned TX buffers
is not relevant to the issue seen in [1]. Still, reposition TX free
calls as well, to have all kfree() operations performed consistently
_after_ dma_unmap_*()/dma_free_*().

[1] Console screenshot with the problem reproduced:

salvator-x login: root
root@salvator-x:~# ifconfig eth0 up
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \
       attached PHY driver [Micrel KSZ9031 Gigabit PHY]   \
       (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235)
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@salvator-x:~#
root@salvator-x:~# ifconfig eth0 down

==================================================================
BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649

CPU: 0 PID: 1649 Comm: ifconfig Not tainted 4.12.0-rc4-00004-g112eb07287d1 #32
Hardware name: Renesas H3ULCB board based on r8a7795 (DT)
Call trace:
[<ffff20000808f11c>] dump_backtrace+0x0/0x3a4
[<ffff20000808f4d4>] show_stack+0x14/0x1c
[<ffff20000865970c>] dump_stack+0xf8/0x150
[<ffff20000831f8b0>] print_address_description+0x7c/0x330
[<ffff200008320010>] kasan_report+0x2e0/0x2f4
[<ffff20000831eac0>] check_memory_region+0x20/0x14c
[<ffff20000831f054>] memcpy+0x48/0x68
[<ffff20000869ed50>] swiotlb_tbl_unmap_single+0xc4/0x35c
[<ffff20000869fcf4>] unmap_single+0x90/0xa4
[<ffff20000869fd14>] swiotlb_unmap_page+0xc/0x14
[<ffff2000080a2974>] __swiotlb_unmap_page+0xcc/0xe4
[<ffff2000088acdb8>] ravb_ring_free+0x514/0x870
[<ffff2000088b25dc>] ravb_close+0x288/0x36c
[<ffff200008aaf8c4>] __dev_close_many+0x14c/0x174
[<ffff200008aaf9b4>] __dev_close+0xc8/0x144
[<ffff200008ac2100>] __dev_change_flags+0xd8/0x194
[<ffff200008ac221c>] dev_change_flags+0x60/0xb0
[<ffff200008ba2dec>] devinet_ioctl+0x484/0x9d4
[<ffff200008ba7b78>] inet_ioctl+0x190/0x194
[<ffff200008a78c44>] sock_do_ioctl+0x78/0xa8
[<ffff200008a7a128>] sock_ioctl+0x110/0x3c4
[<ffff200008365a70>] vfs_ioctl+0x90/0xa0
[<ffff200008365dbc>] do_vfs_ioctl+0x148/0xc38
[<ffff2000083668f0>] SyS_ioctl+0x44/0x74
[<ffff200008083770>] el0_svc_naked+0x24/0x28

The buggy address belongs to the page:
page:ffff7e001b6213c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffff7e001b6213e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8006d884f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff8006d884f780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff8006d884f800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
root@salvator-x:~#

Fixes: a47b70ea86 ("ravb: unmap descriptors when freeing rings")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
4990610df0 net/ipv6: Fix CALIPSO causing GPF with datagram support
[ Upstream commit e3ebdb20fd ]

When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
IP header may have moved.

Also update the payload length after adding the CALIPSO option.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
76d36dd1b1 net: ping: do not abuse udp_poll()
[ Upstream commit 77d4b1d369 ]

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
223313c9be net: dsa: Fix stale cpu_switch reference after unbind then bind
[ Upstream commit b07ac98946 ]

Commit 9520ed8fb8 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.

The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.

To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().

Fixes: 9520ed8fb8 ("net: dsa: use cpu_switch instead of ds[0]")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
a32f7198b2 ipv6: Fix leak in ipv6_gso_segment().
[ Upstream commit e3e86b5119 ]

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
f7f87871c4 geneve: fix needed_headroom and max_mtu for collect_metadata
[ Upstream commit 9a1c44d989 ]

Since commit 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.

For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
19456d4526 sock: reset sk_err when the error queue is empty
[ Upstream commit 38b257938a ]

Prior to f5f99309fa (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.

Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.

Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.

Fixes: f5f99309fa (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Tested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:34 +02:00
3ab7563eed ip6_tunnel: fix traffic class routing for tunnels
[ Upstream commit 5f733ee68f ]

ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.

Commit 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.

The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.

Fixes: 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
35e08f6e2b vxlan: fix use-after-free on deletion
[ Upstream commit a53cb29b0a ]

Adding a vxlan interface to a socket isn't symmetrical, while adding
is done in vxlan_open() the deletion is done in vxlan_dellink().
This can cause a use-after-free error when we close the vxlan
interface before deleting it.

We add vxlan_vs_del_dev() to match vxlan_vs_add_dev() and call
it from vxlan_stop() to match the call from vxlan_open().

Fixes: 56ef9c909b ("vxlan: Move socket initialization to within rtnl scope")
Acked-by: Jiri Benc <jbenc@redhat.com>
Tested-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
905ae1b1e6 tcp: disallow cwnd undo when switching congestion control
[ Upstream commit 44abafc4cc ]

When the sender switches its congestion control during loss
recovery, if the recovery is spurious then it may incorrectly
revert cwnd and ssthresh to the older values set by a previous
congestion control. Consider a congestion control (like BBR)
that does not use ssthresh and keeps it infinite: the connection
may incorrectly revert cwnd to an infinite value when switching
from BBR to another congestion control.

This patch fixes it by disallowing such cwnd undo operation
upon switching congestion control.  Note that undo_marker
is not reset s.t. the packets that were incorrectly marked
lost would be corrected. We only avoid undoing the cwnd in
tcp_undo_cwnd_reduction().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
fb62fe105b cxgb4: avoid enabling napi twice to the same queue
[ Upstream commit e7519f9926 ]

Take uld mutex to avoid race between cxgb_up() and
cxgb4_register_uld() to enable napi for the same uld
queue.

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
d935063ac4 ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
[ Upstream commit 6e80ac5cc9 ]

xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
ce45e4f995 net: systemport: Fix missing Wake-on-LAN interrupt for SYSTEMPORT Lite
[ Upstream commit d31353cd75 ]

On SYSTEMPORT Lite, since we have the main interrupt source in the first
cell, the second cell is the Wake-on-LAN interrupt, yet the code was not
properly updated to fetch the second cell, and instead looked at the
third and non-existing cell for Wake-on-LAN.

Fixes: 44a4524c54 ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
1f9f0e2822 vxlan: eliminate cached dst leak
[ Upstream commit 35cf284556 ]

After commit 0c1d70af92 ("net: use dst_cache for vxlan device"),
cached dst entries could be leaked when more than one remote was
present for a given vxlan_fdb entry, causing subsequent netns
operations to block indefinitely and "unregister_netdevice: waiting
for lo to become free." messages to appear in the kernel log.

Fix by properly releasing cached dst and freeing resources in this
case.

Fixes: 0c1d70af92 ("net: use dst_cache for vxlan device")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:33 +02:00
8b8fd1831b net: bridge: start hello timer only if device is up
[ Upstream commit aeb073241f ]

When the transition of NO_STP -> KERNEL_STP was fixed by always calling
mod_timer in br_stp_start, it introduced a new regression which causes
the timer to be armed even when the bridge is down, and since we stop
the timers in its ndo_stop() function, they never get disabled if the
device is destroyed before it's upped.

To reproduce:
$ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on;
ip l del br0; done;

CC: Xin Long <lucien.xin@gmail.com>
CC: Ivan Vecera <cera@cera.cz>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 6d18c732b9 ("bridge: start hello_timer when enabling KERNEL_STP in br_stp_start")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:32 +02:00
f5f67e441e bnx2x: Fix Multi-Cos
[ Upstream commit 3968d38917 ]

Apparently multi-cos isn't working for bnx2x quite some time -
driver implements ndo_select_queue() to allow queue-selection
for FCoE, but the regular L2 flow would cause it to modulo the
fallback's result by the number of queues.
The fallback would return a queue matching the needed tc
[via __skb_tx_hash()], but since the modulo is by the number of TSS
queues where number of TCs is not accounted, transmission would always
be done by a queue configured into using TC0.

Fixes: ada7c19e6d ("bnx2x: use XPS if possible for bnx2x_select_queue instead of pure hash")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 15:07:32 +02:00
553c942bef Linux 4.11.4 2017-06-07 12:10:31 +02:00
b5ff97c774 xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
commit d7fd24257a upstream.

There is an off-by-one error in loop termination conditions in
xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:17 +02:00
d514c634a4 xfs: fix unaligned access in xfs_btree_visit_blocks
commit a4d768e702 upstream.

This structure copy was throwing unaligned access warnings on sparc64:

Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]

xfs_btree_copy_ptrs does a memcpy, which avoids it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:17 +02:00
fff5729ce2 xfs: avoid mount-time deadlock in CoW extent recovery
commit 3ecb3ac7b9 upstream.

If a malicious user corrupts the refcount btree to cause a cycle between
different levels of the tree, the next mount attempt will deadlock in
the CoW recovery routine while grabbing buffer locks.  We can use the
ability to re-grab a buffer that was previous locked to a transaction to
avoid deadlocks, so do that here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:16 +02:00
ecb4261526 xfs: xfs_trans_alloc_empty
This is a partial cherry-pick of commit e89c041338
("xfs: implement the GETFSMAP ioctl"), which also adds this helper, and
a great example of why feature patches should be properly split into
their parts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: split from the larger patch for -stable]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-07 12:10:16 +02:00
2e08bd63dc xfs: bad assertion for delalloc an extent that start at i_size
commit 892d2a5f70 upstream.

By run fsstress long enough time enough in RHEL-7, I find an
assertion failure (harder to reproduce on linux-4.11, but problem
is still there):

  XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c

The assertion is in xfs_getbmap() funciton:

  if (map[i].br_startblock == DELAYSTARTBLOCK &&
-->   map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip)))
          ASSERT((iflags & BMV_IF_DELALLOC) != 0);

When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the
startoff is just at EOF. But we only need to make sure delalloc
extents that are within EOF, not include EOF.

Signed-off-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:16 +02:00
2dc6e27120 xfs: BMAPX shouldn't barf on inline-format directories
commit 6eadbf4c8b upstream.

When we're fulfilling a BMAPX request, jump out early if the data fork
is in local format.  This prevents us from hitting a debugging check in
bmapi_read and barfing errors back to userspace.  The on-disk extent
count check later isn't sufficient for IF_DELALLOC mode because da
extents are in memory and not on disk.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:16 +02:00
1ae26380c8 xfs: fix indlen accounting error on partial delalloc conversion
commit 0daaecacb8 upstream.

The delalloc -> real block conversion path uses an incorrect
calculation in the case where the middle part of a delalloc extent
is being converted. This is documented as a rare situation because
XFS generally attempts to maximize contiguity by converting as much
of a delalloc extent as possible.

If this situation does occur, the indlen reservation for the two new
delalloc extents left behind by the conversion of the middle range
is calculated and compared with the original reservation. If more
blocks are required, the delta is allocated from the global block
pool. This delta value can be characterized as the difference
between the new total requirement (temp + temp2) and the currently
available reservation minus those blocks that have already been
allocated (startblockval(PREV.br_startblock) - allocated).

The problem is that the current code does not account for previously
allocated blocks correctly. It subtracts the current allocation
count from the (new - old) delta rather than the old indlen
reservation. This means that more indlen blocks than have been
allocated end up stashed in the remaining extents and free space
accounting is broken as a result.

Fix up the calculation to subtract the allocated block count from
the original extent indlen and thus correctly allocate the
reservation delta based on the difference between the new total
requirement and the unused blocks from the original reservation.
Also remove a bogus assert that contradicts the fact that the new
indlen reservation can be larger than the original indlen
reservation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:16 +02:00
d1a8ae21c3 xfs: fix use-after-free in xfs_finish_page_writeback
commit 161f55efba upstream.

Commit 28b783e47a ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.

This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.

[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975]  dump_stack+0x63/0x84
[ 2747.924673]  kasan_object_err+0x21/0x70
[ 2747.928950]  kasan_report+0x271/0x530
[ 2747.933064]  ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409]  ? end_page_writeback+0xce/0x110
[ 2747.943171]  __asan_report_load8_noabort+0x19/0x20
[ 2747.948545]  xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724]  xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197]  process_one_work+0x5ff/0x1000
[ 2747.962766]  worker_thread+0xe4/0x10e0
[ 2747.966946]  kthread+0x2d3/0x3d0
[ 2747.970546]  ? process_one_work+0x1000/0x1000
[ 2747.975405]  ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457]  ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706]  ? do_page_fault+0x30/0x80
[ 2747.989887]  ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411]  save_stack_trace+0x1b/0x20
[ 2748.010688]  save_stack+0x46/0xd0
[ 2748.014383]  kasan_kmalloc+0xad/0xe0
[ 2748.018370]  kasan_slab_alloc+0x12/0x20
[ 2748.022648]  kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024]  alloc_buffer_head+0x22/0xc0
[ 2748.031399]  alloc_page_buffers+0xd1/0x250
[ 2748.035968]  create_empty_buffers+0x30/0x410
[ 2748.040730]  create_page_buffers+0x120/0x1b0
[ 2748.045493]  __block_write_begin_int+0x17a/0x1800
[ 2748.050740]  iomap_write_begin+0x100/0x2f0
[ 2748.055308]  iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362]  iomap_apply+0x157/0x270
[ 2748.064347]  iomap_zero_range+0x5a/0x80
[ 2748.068624]  iomap_truncate_page+0x6b/0xa0
[ 2748.073227]  xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312]  xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589]  xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838]  vfs_fallocate+0x2cf/0x780
[ 2748.093021]  SyS_fallocate+0x48/0x80
[ 2748.097006]  do_syscall_64+0x18a/0x430
[ 2748.101186]  return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816]  save_stack_trace+0x1b/0x20
[ 2748.115093]  save_stack+0x46/0xd0
[ 2748.118788]  kasan_slab_free+0x73/0xc0
[ 2748.122969]  kmem_cache_free+0x7a/0x200
[ 2748.127247]  free_buffer_head+0x41/0x80
[ 2748.131524]  try_to_free_buffers+0x178/0x250
[ 2748.136316]  xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563]  try_to_release_page+0x100/0x180
[ 2748.146325]  invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087]  xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557]  xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223]  xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462]  vfs_fallocate+0x2cf/0x780
[ 2748.172642]  SyS_fallocate+0x48/0x80
[ 2748.176629]  do_syscall_64+0x18a/0x430
[ 2748.180810]  return_from_SYSCALL_64+0x0/0x6a

Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
365625a670 xfs: reserve enough blocks to handle btree splits when remapping
commit fe0be23e68 upstream.

In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent.  This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.

Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.

With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
67e439ccfe xfs: wait on new inodes during quotaoff dquot release
commit e20c8a517f upstream.

The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
f8c68633bb xfs: update ag iterator to support wait on new inodes
commit ae2c4ac2dd upstream.

The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
56aab3095b xfs: support ability to wait on new inodes
commit 756baca27f upstream.

Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
bf16242614 xfs: fix up quotacheck buffer list error handling
commit 20e8a06378 upstream.

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:15 +02:00
89aab40e12 xfs: prevent multi-fsb dir readahead from reading random blocks
commit cb52ee334a upstream.

Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.

This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.

Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.

Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
91bb4f7da5 xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
commit 023cc840b4 upstream.

Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
23da04dcc3 xfs: fix integer truncation in xfs_bmap_remap_alloc
commit 52813fb13f upstream.

bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
eff615a670 xfs: drop iolock from reclaim context to appease lockdep
commit 3b4683c294 upstream.

Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.

The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
8a1f785887 xfs: actually report xattr extents via iomap
commit 84358536dc upstream.

Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
1c862f5a3e xfs: fix over-copying of getbmap parameters from userspace
commit be6324c00c upstream.

In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
2cd6bd867a xfs: use dedicated log worker wq to avoid deadlock with cil wq
commit 696a562072 upstream.

The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608df ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608df.

Diagnosed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:14 +02:00
a19348b729 xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
commit 8affebe16d upstream.

xfs_find_get_desired_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.

When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size XFS on x86_64 host.

  # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
  	    -c "seek -d 0" /mnt/xfs/testfile
  wrote 1024/1024 bytes at offset 2048
  1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
  Whence  Result
  DATA    EOF

Data at offset 2k was missed, and lseek(2) returned ENXIO.

This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
0364c225a5 xfs: use ->b_state to fix buffer I/O accounting release race
commit 63db7c815b upstream.

We've had user reports of unmount hangs in xfs_wait_buftarg() that
analysis shows is due to btp->bt_io_count == -1. bt_io_count
represents the count of in-flight asynchronous buffers and thus
should always be >= 0. xfs_wait_buftarg() waits for this value to
stabilize to zero in order to ensure that all untracked (with
respect to the lru) buffers have completed I/O processing before
unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed,
the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
(where the buffer lock is no longer held) means that bp->b_flags can
be updated from an unsafe context. While a user-level reproducer is
currently not available, some intrusive hacks to run racing buffer
lookups/ioacct/releases from multiple threads was used to
successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from
xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
this context. It turns out that we already have separate buffer
state bits and associated serialization for dealing with buffer LRU
state in the form of ->b_state and ->b_lock. Therefore, replace the
_XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
accounting wrappers appropriately and make sure they are used with
the correct locking. This ensures that buffer in-flight state can be
modified at buffer release time without racing with modifications
from a buffer lock holder.

Fixes: 9c7504aa72 ("xfs: track and serialize in-flight async buffers against unmount")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
34836549fb xfs: Fix missed holes in SEEK_HOLE implementation
commit 5375023ae1 upstream.

XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
can be seen by the following command:

xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
       -c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
Whence	Result
HOLE	139264

Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
implementation. The bug is in xfs_find_get_desired_pgoff() which does
not properly detect the case when pages are not contiguous.

Fix the problem by properly detecting when found page has larger offset
than expected.

Fixes: d126d43f63
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
4ec50822a5 drm/gma500/psb: Actually use VBT mode when it is found
commit 82bc9a42cf upstream.

With LVDS we were incorrectly picking the pre-programmed mode instead of
the prefered mode provided by VBT. Make sure we pick the VBT mode if
one is provided. It is likely that the mode read-out code is still wrong
but this patch fixes the immediate problem on most machines.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78562
Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170418114332.12183-1-patrik.r.jakobsson@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
74b4db844f slub/memcg: cure the brainless abuse of sysfs attributes
commit 478fe3037b upstream.

memcg_propagate_slab_attrs() abuses the sysfs attribute file functions
to propagate settings from the root kmem_cache to a newly created
kmem_cache.  It does that with:

     attr->show(root, buf);
     attr->store(new, buf, strlen(bug);

Aside of being a lazy and absurd hackery this is broken because it does
not check the return value of the show() function.

Some of the show() functions return 0 w/o touching the buffer.  That
means in such a case the store function is called with the stale content
of the previous show().  That causes nonsense like invoking
kmem_cache_shrink() on a newly created kmem_cache.  In the worst case it
would cause handing in an uninitialized buffer.

This should be rewritten proper by adding a propagate() callback to
those slub_attributes which must be propagated and avoid that insane
conversion to and from ASCII, but that's too large for a hot fix.

Check at least the return value of the show() function, so calling
store() with stale content is prevented.

Steven said:
 "It can cause a deadlock with get_online_cpus() that has been uncovered
  by recent cpu hotplug and lockdep changes that Thomas and Peter have
  been doing.

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(cpu_hotplug.lock);
                                   lock(slab_mutex);
                                   lock(cpu_hotplug.lock);
      lock(slab_mutex);

     *** DEADLOCK ***"

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705201244540.2255@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
6caa2db34d ksm: prevent crash after write_protect_page fails
commit a7306c3436 upstream.

"err" needs to be left set to -EFAULT if split_huge_page succeeds.
Otherwise if "err" gets clobbered with zero and write_protect_page
fails, try_to_merge_one_page() will succeed instead of returning -EFAULT
and then try_to_merge_with_ksm_page() will continue thinking kpage is a
PageKsm when in fact it's still an anonymous page.  Eventually it'll
crash in page_add_anon_rmap.

This has been reproduced on Fedora25 kernel but I can reproduce with
upstream too.

The bug was introduced in commit f765f54059 ("ksm: prepare to new THP
semantics") introduced in v4.5.

    page:fffff67546ce1cc0 count:4 mapcount:2 mapping:ffffa094551e36e1 index:0x7f0f46673
    flags: 0x2ffffc0004007c(referenced|uptodate|dirty|lru|active|swapbacked)
    page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
    page->mem_cgroup:ffffa09674bf0000
    ------------[ cut here ]------------
    kernel BUG at mm/rmap.c:1222!
    CPU: 1 PID: 76 Comm: ksmd Not tainted 4.9.3-200.fc25.x86_64 #1
    RIP: do_page_add_anon_rmap+0x1c4/0x240
    Call Trace:
      page_add_anon_rmap+0x18/0x20
      try_to_merge_with_ksm_page+0x50b/0x780
      ksm_scan_thread+0x1211/0x1410
      ? prepare_to_wait_event+0x100/0x100
      ? try_to_merge_with_ksm_page+0x780/0x780
      kthread+0xd9/0xf0
      ? kthread_park+0x60/0x60
      ret_from_fork+0x25/0x30

Fixes: f765f54059 ("ksm: prepare to new THP semantics")
Link: http://lkml.kernel.org/r/20170513131040.21732-1-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Federico Simoncelli <fsimonce@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:13 +02:00
d6eaf7a4d6 x86/boot: Use CROSS_COMPILE prefix for readelf
commit 3780578761 upstream.

The boot code Makefile contains a straight 'readelf' invocation. This
causes build warnings in cross compile environments, when there is no
unprefixed readelf accessible via $PATH.

Add the missing $(CROSS_COMPILE) prefix.

[ tglx: Rewrote changelog ]

Fixes: 98f7852537 ("x86/boot: Refuse to build with data relocations")
Signed-off-by: Rob Landley <rob@landley.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/r/ced18878-693a-9576-a024-113ef39a22c0@landley.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
f7e82ab3b6 RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
commit 1feb40067c upstream.

The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.

The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly gotten.  The
the peer will send the identical single packet request when its RNR
timer pops.

The fix is to release the held reference prior to the rnr nak exit.
This is the only sequence the requires both rkey validation and the
buffer allocation on the same packet.

Tested-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
68c98967e7 RDMA/srp: Fix NULL deref at srp_destroy_qp()
commit 95c2ef50c7 upstream.

If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.

Fixes: 9294000d6d ("IB/srp: Drain the send queue before destroying a QP")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>--
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
dbab023245 mm: consider memblock reservations for deferred memory initialization sizing
commit 864b9a393d upstream.

We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:

	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
	kthreadd cpuset=/ mems_allowed=7
	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
	Call Trace:
	  dump_stack+0xb0/0xf0 (unreliable)
	  dump_header+0xb0/0x258
	  out_of_memory+0x5f0/0x640
	  __alloc_pages_nodemask+0xa8c/0xc80
	  kmem_getpages+0x84/0x1a0
	  fallback_alloc+0x2a4/0x320
	  kmem_cache_alloc_node+0xc0/0x2e0
	  copy_process.isra.25+0x260/0x1b30
	  _do_fork+0x94/0x470
	  kernel_thread+0x48/0x60
	  kthreadd+0x264/0x330
	  ret_from_kernel_thread+0x5c/0xa4

	Mem-Info:
	active_anon:0 inactive_anon:0 isolated_anon:0
	 active_file:0 inactive_file:0 isolated_file:0
	 unevictable:0 dirty:0 writeback:0 unstable:0
	 slab_reclaimable:5 slab_unreclaimable:73
	 mapped:0 shmem:0 pagetables:0 bounce:0
	 free:0 free_pcp:0 free_cma:0
	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
	lowmem_reserve[]: 0 0 0 0
	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
	0 total pagecache pages
	0 pages in swap cache
	Swap cache stats: add 0, delete 0, find 0/0
	Free swap  = 0kB
	Total swap = 0kB
	819200 pages RAM
	0 pages HighMem/MovableOnly
	817481 pages reserved
	0 pages cma reserved
	0 pages hwpoisoned

the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done.  update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range.  In this particular case we've had

	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)

so the low 2GB is mostly depleted.

Fix this by considering memblock allocations in the initial static
initialization estimation.  Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address.  The cumulative size is than added on top
of the initial estimation.  This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.

Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
1cc8926344 mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified
commit 9a291a7c94 upstream.

KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
special case.  (check_user_page_hwpoison())

When huge pages are involved, this doesn't work so well.
get_user_pages() calls follow_hugetlb_page(), which stops early if it
receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
-EFAULT to the caller.  The step to map this to -EHWPOISON based on the
FOLL_ flags is missing.  The hwpoison special case is skipped, and
-EFAULT is returned to user-space, causing Qemu or kvmtool to exit.

Instead, move this VM_FAULT_ to errno mapping code into a header file
and use it from faultin_page() and follow_hugetlb_page().

With this, KVM works as expected.

This isn't a problem for arm64 today as we haven't enabled
MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
too, so I think this should be a fix.  This doesn't apply earlier than
stable's v4.11.1 due to all sorts of cleanup.

[james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
suggested.
  Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
f814bf4655 mlock: fix mlock count can not decrease in race condition
commit 70feee0e1e upstream.

Kefeng reported that when running the follow test, the mlock count in
meminfo will increase permanently:

 [1] testcase
 linux:~ # cat test_mlockal
 grep Mlocked /proc/meminfo
  for j in `seq 0 10`
  do
 	for i in `seq 4 15`
 	do
 		./p_mlockall >> log &
 	done
 	sleep 0.2
 done
 # wait some time to let mlock counter decrease and 5s may not enough
 sleep 5
 grep Mlocked /proc/meminfo

 linux:~ # cat p_mlockall.c
 #include <sys/mman.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define SPACE_LEN	4096

 int main(int argc, char ** argv)
 {
	 	int ret;
	 	void *adr = malloc(SPACE_LEN);
	 	if (!adr)
	 		return -1;

	 	ret = mlockall(MCL_CURRENT | MCL_FUTURE);
	 	printf("mlcokall ret = %d\n", ret);

	 	ret = munlockall();
	 	printf("munlcokall ret = %d\n", ret);

	 	free(adr);
	 	return 0;
	 }

In __munlock_pagevec() we should decrement NR_MLOCK for each page where
we clear the PageMlocked flag.  Commit 1ebb7cc6a5 ("mm: munlock: batch
NR_MLOCK zone state updates") has introduced a bug where we don't
decrement NR_MLOCK for pages where we clear the flag, but fail to
isolate them from the lru list (e.g.  when the pages are on some other
cpu's percpu pagevec).  Since PageMlocked stays cleared, the NR_MLOCK
accounting gets permanently disrupted by this.

Fix it by counting the number of page whose PageMlock flag is cleared.

Fixes: 1ebb7cc6a5 (" mm: munlock: batch NR_MLOCK zone state updates")
Link: http://lkml.kernel.org/r/1495678405-54569-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joern Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: zhongjiang <zhongjiang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:12 +02:00
a0189db30d mm/migrate: fix refcount handling when !hugepage_migration_supported()
commit 30809f559a upstream.

On failing to migrate a page, soft_offline_huge_page() performs the
necessary update to the hugepage ref-count.

But when !hugepage_migration_supported() , unmap_and_move_hugepage()
also decrements the page ref-count for the hugepage.  The combined
behaviour leaves the ref-count in an inconsistent state.

This leads to soft lockups when running the overcommitted hugepage test
from mce-tests suite.

  Soft offlining pfn 0x83ed600 at process virtual address 0x400000000000
  soft offline: 0x83ed600: migration failed 1, type 1fffc00000008008 (uptodate|head)
  INFO: rcu_preempt detected stalls on CPUs/tasks:
   Tasks blocked on level-0 rcu_node (CPUs 0-7): P2715
    (detected by 7, t=5254 jiffies, g=963, c=962, q=321)
    thugetlb_overco R  running task        0  2715   2685 0x00000008
    Call trace:
      dump_backtrace+0x0/0x268
      show_stack+0x24/0x30
      sched_show_task+0x134/0x180
      rcu_print_detail_task_stall_rnp+0x54/0x7c
      rcu_check_callbacks+0xa74/0xb08
      update_process_times+0x34/0x60
      tick_sched_handle.isra.7+0x38/0x70
      tick_sched_timer+0x4c/0x98
      __hrtimer_run_queues+0xc0/0x300
      hrtimer_interrupt+0xac/0x228
      arch_timer_handler_phys+0x3c/0x50
      handle_percpu_devid_irq+0x8c/0x290
      generic_handle_irq+0x34/0x50
      __handle_domain_irq+0x68/0xc0
      gic_handle_irq+0x5c/0xb0

Address this by changing the putback_active_hugepage() in
soft_offline_huge_page() to putback_movable_pages().

This only triggers on systems that enable memory failure handling
(ARCH_SUPPORTS_MEMORY_FAILURE) but not hugepage migration
(!ARCH_ENABLE_HUGEPAGE_MIGRATION).

I imagine this wasn't triggered as there aren't many systems running
this configuration.

[akpm@linux-foundation.org: remove dead comment, per Naoya]
Link: http://lkml.kernel.org/r/20170525135146.32011-1-punit.agrawal@arm.com
Reported-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
bdaac1fe71 dax: fix race between colliding PMD & PTE entries
commit e2093926a0 upstream.

We currently have two related PMD vs PTE races in the DAX code.  These
can both be easily triggered by having two threads reading and writing
simultaneously to the same private mapping, with the key being that
private mapping reads can be handled with PMDs but private mapping
writes are always handled with PTEs so that we can COW.

Here is the first race:

  CPU 0					CPU 1

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
    handle_pte_fault()
      passes check for pmd_devmap()

					(private mapping read)
					__handle_mm_fault()
					  create_huge_pmd()
					    dax_iomap_pmd_fault() inserts PMD

      dax_iomap_pte_fault() does a PTE fault, but we already have a DAX PMD
      			  installed in our page tables at this spot.

Here's the second race:

  CPU 0					CPU 1

  (private mapping read)
  __handle_mm_fault()
    passes check for pmd_none()
    create_huge_pmd()
      dax_iomap_pmd_fault() inserts PMD

  (private mapping write)
  __handle_mm_fault()
    create_huge_pmd() - FALLBACK
					(private mapping read)
					__handle_mm_fault()
					  passes check for pmd_none()
					  create_huge_pmd()

    handle_pte_fault()
      dax_iomap_pte_fault() inserts PTE
					    dax_iomap_pmd_fault() inserts PMD,
					       but we already have a PTE at
					       this spot.

The core of the issue is that while there is isolation between faults to
the same range in the DAX fault handlers via our DAX entry locking,
there is no isolation between faults in the code in mm/memory.c.  This
means for instance that this code in __handle_mm_fault() can run:

	if (pmd_none(*vmf.pmd) && transparent_hugepage_enabled(vma)) {
		ret = create_huge_pmd(&vmf);

But by the time we actually get to run the fault handler called by
create_huge_pmd(), the PMD is no longer pmd_none() because a racing PTE
fault has installed a normal PMD here as a parent.  This is the cause of
the 2nd race.  The first race is similar - there is the following check
in handle_pte_fault():

	} else {
		/* See comment in pte_alloc_one_map() */
		if (pmd_devmap(*vmf->pmd) || pmd_trans_unstable(vmf->pmd))
			return 0;

So if a pmd_devmap() PMD (a DAX PMD) has been installed at vmf->pmd, we
will bail and retry the fault.  This is correct, but there is nothing
preventing the PMD from being installed after this check but before we
actually get to the DAX PTE fault handlers.

In my testing these races result in the following types of errors:

  BUG: Bad rss-counter state mm:ffff8800a817d280 idx:1 val:1
  BUG: non-zero nr_ptes on freeing mm: 15

Fix this issue by having the DAX fault handlers verify that it is safe
to continue their fault after they have taken an entry lock to block
other racing faults.

[ross.zwisler@linux.intel.com: improve fix for colliding PMD & PTE entries]
  Link: http://lkml.kernel.org/r/20170526195932.32178-1-ross.zwisler@linux.intel.com
Link: http://lkml.kernel.org/r/20170522215749.23516-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Pawel Lebioda <pawel.lebioda@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
b16e6ab5ad mm: avoid spurious 'bad pmd' warning messages
commit d0f0931de9 upstream.

When the pmd_devmap() checks were added by 5c7fb56e5e ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.

Fixes: commit 7267ec008b ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
de12c73fa2 mm/page_alloc.c: make sure OOM victim can try allocations with no watermarks once
commit c288983ddd upstream.

Roman Gushchin has reported that the OOM killer can trivially selects
next OOM victim when a thread doing memory allocation from page fault
path was selected as first OOM victim.

    allocate invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     __alloc_pages_slowpath+0xc84/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 492 (allocate) score 899 or sacrifice child
    Killed process 492 (allocate) total-vm:2052368kB, anon-rss:1894576kB, file-rss:4kB, shmem-rss:0kB
    allocate: page allocation failure: order:0, mode:0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null)
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     __alloc_pages_slowpath+0xd32/0xd40
     __alloc_pages_nodemask+0x245/0x260
     alloc_pages_vma+0xa2/0x270
     __handle_mm_fault+0xca9/0x10c0
     handle_mm_fault+0xf3/0x210
     __do_page_fault+0x240/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    oom_reaper: reaped process 492 (allocate), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    ...
    allocate invoked oom-killer: gfp_mask=0x0(), nodemask=(null),  order=0, oom_score_adj=0
    allocate cpuset=/ mems_allowed=0
    CPU: 1 PID: 492 Comm: allocate Not tainted 4.12.0-rc1-mm1+ #181
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    Call Trace:
     oom_kill_process+0x219/0x3e0
     out_of_memory+0x11d/0x480
     pagefault_out_of_memory+0x68/0x80
     mm_fault_error+0x8f/0x190
     ? handle_mm_fault+0xf3/0x210
     __do_page_fault+0x4b2/0x4e0
     trace_do_page_fault+0x37/0xe0
     do_async_page_fault+0x19/0x70
     async_page_fault+0x28/0x30
    ...
    Out of memory: Kill process 233 (firewalld) score 10 or sacrifice child
    Killed process 233 (firewalld) total-vm:246076kB, anon-rss:20956kB, file-rss:0kB, shmem-rss:0kB

There is a race window that the OOM reaper completes reclaiming the
first victim's memory while nothing but mutex_trylock() prevents the
first victim from calling out_of_memory() from pagefault_out_of_memory()
after memory allocation for page fault path failed due to being selected
as an OOM victim.

This is a side effect of commit 9a67f6488e ("mm: consolidate
GFP_NOFAIL checks in the allocator slowpath") because that commit
silently changed the behavior from

    /* Avoid allocations with no watermarks from looping endlessly */

to

    /*
     * Give up allocations without trying memory reserves if selected
     * as an OOM victim
     */

in __alloc_pages_slowpath() by moving the location to check TIF_MEMDIE
flag.  I have noticed this change but I didn't post a patch because I
thought it is an acceptable change other than noise by warn_alloc()
because !__GFP_NOFAIL allocations are allowed to fail.  But we
overlooked that failing memory allocation from page fault path makes
difference due to the race window explained above.

While it might be possible to add a check to pagefault_out_of_memory()
that prevents the first victim from calling out_of_memory() or remove
out_of_memory() from pagefault_out_of_memory(), changing
pagefault_out_of_memory() does not suppress noise by warn_alloc() when
allocating thread was selected as an OOM victim.  There is little point
with printing similar backtraces and memory information from both
out_of_memory() and warn_alloc().

Instead, if we guarantee that current thread can try allocations with no
watermarks once when current thread looping inside
__alloc_pages_slowpath() was selected as an OOM victim, we can follow "who
can use memory reserves" rules and suppress noise by warn_alloc() and
prevent memory allocations from page fault path from calling
pagefault_out_of_memory().

If we take the comment literally, this patch would do

  -    if (test_thread_flag(TIF_MEMDIE))
  -        goto nopage;
  +    if (alloc_flags == ALLOC_NO_WATERMARKS || (gfp_mask & __GFP_NOMEMALLOC))
  +        goto nopage;

because gfp_pfmemalloc_allowed() returns false if __GFP_NOMEMALLOC is
given.  But if I recall correctly (I couldn't find the message), the
condition is meant to apply to only OOM victims despite the comment.
Therefore, this patch preserves TIF_MEMDIE check.

Fixes: 9a67f6488e ("mm: consolidate GFP_NOFAIL checks in the allocator slowpath")
Link: http://lkml.kernel.org/r/201705192112.IAF69238.OQOHSJLFOFFMtV@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Roman Gushchin <guro@fb.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
af03bb0cab ALSA: usb: Fix a typo in Tascam US-16x08 mixer element
commit 617163fc25 upstream.

A mixer element created in a quirk for Tascam US-16x08 contains a
typo: it should be "EQ MidLow Q" instead of "EQ MidQLow Q".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Fixes: d2bb390a20 ("ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
0be9a9a422 Revert "ALSA: usb-audio: purge needless variable length array"
commit 64188cfbe5 upstream.

This reverts commit 89b593c30e ("ALSA: usb-audio: purge needless
variable length array").  The patch turned out to cause a severe
regression, triggering an Oops at snd_usb_ctl_msg().  It was overseen
that snd_usb_ctl_msg() writes back the response to the given buffer,
while the patch changed it to a read-only const buffer.  (One should
always double-check when an extra pointer cast is present...)

As a simple fix, just revert the affected commit.  It was merely a
cleanup.  Although it brings VLA again, it's clearer as a fix.  We'll
address the VLA later in another patch.

Fixes: 89b593c30e ("ALSA: usb-audio: purge needless variable length array")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195875
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
ffb97b001b ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
commit 1fc2e41f7a upstream.

This model is actually called 92XXM2-8 in Windows driver. But since pin
configs for M22 and M28 are identical, just reuse M22 quirk.

Fixes external microphone (tested) and probably docking station ports
(not tested).

Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:11 +02:00
0c4afdc6d8 ALSA: hda - No loopback on ALC299 codec
commit fa16b69f12 upstream.

ALC299 has no loopback mixer, but the driver still tries to add a beep
control over the mixer NID which leads to the error at accessing it.
This patch fixes it by properly declaring mixer_nid=0 for this codec.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195775
Fixes: 28f1f9b26c ("ALSA: hda/realtek - Add new codec ID ALC299")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
d5bc54d0a3 pcmcia: remove left-over %Z format
commit ff5a20169b upstream.

Commit 5b5e0928f7 ("lib/vsprintf.c: remove %Z support") removed some
usages of format %Z but forgot "%.2Zx".  This makes clang 4.0 reports a
-Wformat-extra-args warning because it does not know about %Z.

Replace %Z with %z.

Link: http://lkml.kernel.org/r/20170520090946.22562-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Harald Welte <laforge@gnumonks.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
2609770993 drm/radeon: Unbreak HPD handling for r600+
commit 3d18e33735 upstream.

We end up reading the interrupt register for HPD5, and then writing it
to HPD6 which on systems without anything using HPD5 results in
permanently disabling hotplug on one of the display outputs after the
first time we acknowledge a hotplug interrupt from the GPU.

This code is really bad. But for now, let's just fix this. I will
hopefully have a large patch series to refactor all of this soon.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lyude <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
6b693bbf9c drm/radeon/ci: disable mclk switching for high refresh rates (v2)
commit 58d7e3e427 upstream.

Even if the vblank period would allow it, it still seems to
be problematic on some cards.

v2: fix logic inversion (Nils)

bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
d6ba1a4407 drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
commit 2275a3a2fe upstream.

Even if the vblank period would allow it, it still seems to
be problematic on some cards.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
662dbfcc66 drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
commit 09be4a5219 upstream.

Check to make sure the vblank period is long enough to support
mclk switching.

v2: drop needless initial assignment (Nils)

bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
a8aa8a0c10 nvme: avoid to use blk_mq_abort_requeue_list()
commit 986f75c876 upstream.

NVMe may add request into requeue list simply and not kick off the
requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
is called in both nvme_kill_queues() and nvme_ns_remove() for
dealing with this issue.

Unfortunately blk_mq_abort_requeue_list() is absolutely a
race maker, for example, one request may be requeued during
the aborting. So this patch just calls blk_mq_kick_requeue_list() in
nvme_kill_queues() to handle this issue like what nvme_start_queues()
does. Now all requests in requeue list when queues are stopped will be
handled by blk_mq_kick_requeue_list() when queues are restarted, either
in nvme_start_queues() or in nvme_kill_queues().

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:10 +02:00
20c03f455c nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
commit 806f026f9b upstream.

Inside nvme_kill_queues(), we have to start hw queues for
draining requests in sw queues, .dispatch list and requeue list,
so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
which only run queues if queues are stopped, but the queues may have
been started already, for example nvme_start_queues() is called in reset work
function.

blk_mq_start_hw_queues() run hw queues in current context, instead
of running asynchronously like before. Given nvme_kill_queues() is
run from either remove context or reset worker context, both are fine
to run hw queue directly. And the mutex of namespaces_mutex isn't a
problem too becasue nvme_start_freeze() runs hw queue in this way
already.

Reported-by: Zhang Yi <yizhan@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
0fe9c55195 nvme-rdma: support devices with queue size < 32
commit 0544f5494a upstream.

In the case of small NVMe-oF queue size (<32) we may enter a deadlock
caused by the fact that the IB completions aren't sent waiting for 32
and the send queue will fill up.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
f88d3d6e4f HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
commit 2ac97f0f66 upstream.

The following Smatch complaint was generated in response to commit
2a6cdbd ("HID: wacom: Introduce new 'touch_input' device"):

    drivers/hid/wacom_wac.c:1586 wacom_tpc_irq()
             error: we previously assumed 'wacom->touch_input' could be null (see line 1577)

The 'touch_input' and 'pen_input' variables point to the 'struct input_dev'
used for relaying touch and pen events to userspace, respectively. If a
device does not have a touch interface or pen interface, the associated
input variable is NULL. The 'wacom_tpc_irq()' function is responsible for
forwarding input reports to a more-specific IRQ handler function. An
unknown report could theoretically be mistaken as e.g. a touch report
on a device which does not have a touch interface. This can be prevented
by only calling the pen/touch functions are called when the pen/touch
pointers are valid.

Fixes: 2a6cdbd ("HID: wacom: Introduce new 'touch_input' device")
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
8d975ebd0a ibmvscsis: Fix the incorrect req_lim_delta
commit 75dbf2d36f upstream.

The current code is not correctly calculating the req_lim_delta.

We want to make sure vscsi->credit is always incremented when
we do not send a response for the scsi op. Thus for the case where
there is a successfully aborted task we need to make sure the
vscsi->credit is incremented.

v2 - Moves the original location of the vscsi->credit increment
to a better spot. Since if we increment credit, the next command
we send back will have increased req_lim_delta. But we probably
shouldn't be doing that until the aborted cmd is actually released.
Otherwise the client will think that it can send a new command, and
we could find ourselves short of command elements. Not likely, but could
happen.

This patch depends on both:
commit 25e7853126 ("ibmvscsis: Do not send aborted task response")
commit 98883f1b54 ("ibmvscsis: Clear left-over abort_cmd pointers")

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
e920be8367 ibmvscsis: Clear left-over abort_cmd pointers
commit 98883f1b54 upstream.

With the addition of ibmvscsis->abort_cmd pointer within
commit 25e7853126 ("ibmvscsis: Do not send aborted task response"),
make sure to explicitly NULL these pointers when clearing
DELAY_SEND flag.

Do this for two cases, when getting the new new ibmvscsis
descriptor in ibmvscsis_get_free_cmd() and before posting
the response completion in ibmvscsis_send_messages().

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
1fb66c6aad scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get()
commit 0648a07c9b upstream.

rdac_failover_get references struct rdac_controller as
ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
inefficient this also introduces a null-pointer dereference as
send_mode_select() sets ctlr->ms_sdev to NULL before calling
rdac_failover_get():

[   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
[   18.436124] BUG: unable to handle kernel NULL pointer dereference at 0000000000000790
[   18.436129] IP: send_mode_select+0xca/0x560
[   18.436129] PGD 0
[   18.436130] P4D 0
[   18.436130]
[   18.436132] Oops: 0000 [#1] SMP
[   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core dm_mirror dm_region_hash dm_log dm_mod
[   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 4.12.0-rc1.1.el7.test.x86_64 #1
[   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, BIOS -[L8E164AUS-1.07]- 05/25/2011
[   18.436145] Workqueue: kmpath_rdacd send_mode_select
[   18.436146] task: ffff880225116a40 task.stack: ffffc90002bd8000
[   18.436148] RIP: 0010:send_mode_select+0xca/0x560
[   18.436148] RSP: 0018:ffffc90002bdbda8 EFLAGS: 00010246
[   18.436149] RAX: 0000000000000000 RBX: ffffc90002bdbe08 RCX: ffff88017ef04a80
[   18.436150] RDX: ffffc90002bdbe08 RSI: ffff88017ef04a80 RDI: ffff8802248e4388
[   18.436151] RBP: ffffc90002bdbe48 R08: 0000000000000000 R09: ffffffff81c104c0
[   18.436151] R10: 00000000000001ff R11: 000000000000035a R12: ffffc90002bdbdd8
[   18.436152] R13: ffff8802248e4390 R14: ffff880225152800 R15: ffff8802248e4400
[   18.436153] FS:  0000000000000000(0000) GS:ffff880227d00000(0000) knlGS:0000000000000000
[   18.436154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.436154] CR2: 0000000000000790 CR3: 000000042535b000 CR4: 00000000000006e0
[   18.436155] Call Trace:
[   18.436159]  ? rdac_activate+0x14e/0x150
[   18.436161]  ? refcount_dec_and_test+0x11/0x20
[   18.436162]  ? kobject_put+0x1c/0x50
[   18.436165]  ? scsi_dh_activate+0x6f/0xd0
[   18.436168]  process_one_work+0x149/0x360
[   18.436170]  worker_thread+0x4d/0x3c0
[   18.436172]  kthread+0x109/0x140
[   18.436173]  ? rescuer_thread+0x380/0x380
[   18.436174]  ? kthread_park+0x60/0x60
[   18.436176]  ret_from_fork+0x2c/0x40
[   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
[   18.436192] RIP: send_mode_select+0xca/0x560 RSP: ffffc90002bdbda8
[   18.436192] CR2: 0000000000000790
[   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
[   18.436199] Kernel panic - not syncing: Fatal exception
[   18.436222] Kernel Offset: disabled
[-- MARK -- Thu May 18 11:45:00 2017]

Fixes: 3278255741 scsi_dh_rdac: switch to scsi_execute_req_flags()
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
14ba78937e iscsi-target: Fix initial login PDU asynchronous socket close OOPs
commit 25cdda95fd upstream.

This patch fixes a OOPs originally introduced by:

   commit bb048357da
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:09 +02:00
c732f30887 iscsi-target: Always wait for kthread_should_stop() before kthread exit
commit 5e0cf5e6c4 upstream.

There are three timing problems in the kthread usages of iscsi_target_mod:

 - np_thread of struct iscsi_np
 - rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

 send_sig(SIGINT, conn->tx_thread, 1);
 kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
 early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:08 +02:00
a168ac5b24 scsi: zero per-cmd private driver data for each MQ I/O
commit 1bad6c4a57 upstream.

In lower layer driver's (LLD) scsi_host_template, the driver may
optionally ask SCSI to allocate its private driver memory for each
command, by specifying cmd_size. This memory is allocated at the end of
scsi_cmnd by SCSI.  Later when SCSI queues a command, the LLD can use
scsi_cmd_priv to get to its private data.

Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In
this case, the LLD may get to stale or uninitialized data in its private
driver memory. This may result in unexpected driver and hardware
behavior.

Fix this problem by also zeroing the private driver memory before
passing them to LLD.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:08 +02:00
21f8aa4cfc mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
commit f5f968f237 upstream.

The stingray SDHCI hardware supports ACMD12 and automatically
issues after multi block transfer completed.

If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
on multi block read command with below error message:

Got data interrupt 0x00000002 even though no data
operation was in progress.

This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
ACM12 support in SDHCI hardware and suppress spurious interrupt.

Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: b580c52d58 ("mmc: sdhci-iproc: add IPROC SDHCI driver")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:08 +02:00
4c5681afdf Revert "ACPI / button: Change default behavior to lid_init_state=open"
commit 878d8db039 upstream.

Revert commit 77e9a4aa9d (ACPI / button: Change default behavior to
lid_init_state=open) which changed the kernel's behavior on laptops
that boot with closed lids and expect the lid switch state to be
reported accurately by the kernel.

If you boot or resume your laptop with the lid closed on a docking
station while using an external monitor connected to it, both internal
and external displays will light on, while only the external should.

There is a design choice in gdm to only provide the greeter on the
internal display when lit on, so users only see a gray area on the
external monitor. Also, the cursor will not show up as it's by
default on the internal display too.

To "fix" that, users have to open the laptop once and close it once
again to sync the state of the switch with the hardware state.

Even if the "method" operation mode implementation can be buggy on
some platforms, the "open" choice is worse.  It breaks docking
stations basically and there is no way to have a user-space hwdb to
fix that.

On the contrary, it's rather easy in user-space to have a hwdb
with the problematic platforms. Then,  libinput (1.7.0+) can fix
the state of the lid switch for us: you need to set the udev
property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.

When libinput detects internal keyboard events, it will overwrite the
state of the switch to open, making it reliable again.  Given that
logind only checks the lid switch value after a timeout, we can
assume the user will use the internal keyboard before this timeout
expires.

For example, such a hwdb entry is:

libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
 LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open

Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:08 +02:00
5d0e4205ea ACPICA: Tables: Fix regression introduced by a too early mechanism enabling
commit 2ea65321b8 upstream.

In the Linux kernel, acpi_get_table() "clones" haven't been fully
balanced by acpi_put_table() invocations.  In upstream ACPICA, due to
the design change, there are also unbalanced acpi_get_table_by_index()
invocations requiring special care.

acpi_get_table() reference counting mismatches may occor due to that
and printing error messages related to them is not useful at this
point.  The strict balanced validation count check should only be
enabled after confirming that all invocations are safe and aligned
with their designed purposes.

Thus this patch removes the error value returned by acpi_tb_get_table()
in that case along with the accompanying error message to fix the
issue.

Fixes: 174cc7187e (ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel)
Reported-by: Anush Seetharaman <anush.seetharaman@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:08 +02:00
34211cbf94 ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service
commit 0de0e198bc upstream.

Reading an ACPI table through the /sys/firmware/acpi/tables interface
more than 65,536 times leads to the following log message:

 ACPI Error: Table ffff88033595eaa8, Validation count is zero after increment
  (20170119/tbutils-423)

...and the table being unavailable until the next reboot. Add the
missing acpi_put_table() so the table ->validation_count is decremented
after each read.

Reported-by: Anush Seetharaman <anush.seetharaman@intel.com>
Fixes: 174cc7187e "ACPICA: Tables: Back port acpi_get_table_with_size() ..."
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:07 +02:00
93da4e6c45 acpi, nfit: Fix the memory error check in nfit_handle_mce()
commit fc08a4703a upstream.

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Use the new mce_is_memory_error() helper to detect the error
properly.

Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:07 +02:00
9183980a9e x86/MCE: Export memory_error()
commit 2d1f406139 upstream.

Export the function which checks whether an MCE is a memory error to
other users so that we can reuse the logic. Drop the boot_cpu_data use,
while at it, as mce.cpuvendor already has the CPU vendor in there.

Integrate a piece from a patch from Vishal Verma
<vishal.l.verma@intel.com> to export it for modules (nfit).

The main reason we're exporting it is that the nfit handler
nfit_handle_mce() needs to detect a memory error properly before doing
its recovery actions.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:07 +02:00
8f8dca3c86 Revert "ACPI / button: Remove lid_init_state=method mode"
commit f369fdf4f6 upstream.

This reverts commit ecb10b694b.

The only expected ACPI control method lid device's usage model is

 1. Listen to the lid notification,
 2. Evaluate _LID after being notified by BIOS,
 3. Suspend the system (if users configure to do so) after seeing "close".

It's not ensured that BIOS will notify OS after boot/resume, and
it's not ensured that BIOS will always generate "open" event upon
opening the lid.

But there are 2 wrong usage models:

 1. When the lid device is responsible for suspend/resume the system,
    userspace requires to see "open" event to be paired with "close" after
    the system is resumed, or it will suspend the system again.

 2. When an external monitor connects to the laptop attached docks,
    userspace requires to see "close" event after the system is resumed so
    that it can determine whether the internal display should remain dark
    and the external display should be lit on.

After we made default kernel behavior to be suitable for usage model 1,
users of usage model 2 start to report regressions for such behavior
change.

Reversion of button.lid_init_state=method doesn't actually reverts to old
default behavior as doing so can enter a regression loop, but facilitates
users to work the reported regressions around with
button.lid_init_state=method.

Fixes: ecb10b694b (ACPI / button: Remove lid_init_state=method mode)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1430259
Tested-by: Steffen Weber <steffen.weber@gmail.com>
Tested-by: Julian Wiedmann <julian.wiedmann@jwi.name>
Reported-by: Joachim Frieben <jfrieben@hotmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:07 +02:00
f5eef8d245 crypto: skcipher - Add missing API setkey checks
commit 9933e113c2 upstream.

The API setkey checks for key sizes and alignment went AWOL during the
skcipher conversion.  This patch restores them.

Fixes: 4e6c3df4d7 ("crypto: skcipher - Add low-level skcipher...")
Reported-by: Baozeng <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:07 +02:00
2da7518890 i2c: i2c-tiny-usb: fix buffer not being DMA capable
commit 5165da5923 upstream.

Since v4.9 i2c-tiny-usb generates the below call trace
and longer works, since it can't communicate with the
USB device. The reason is, that since v4.9 the USB
stack checks, that the buffer it should transfer is DMA
capable. This was a requirement since v2.2 days, but it
usually worked nevertheless.

[   17.504959] ------------[ cut here ]------------
[   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.506545] transfer buffer not dma capable
[   17.507022] Modules linked in:
[   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
[   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   17.509039] Call Trace:
[   17.509320]  ? dump_stack+0x5c/0x78
[   17.509714]  ? __warn+0xbe/0xe0
[   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
[   17.510532]  ? nommu_map_sg+0xb0/0xb0
[   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
[   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
[   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
[   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
[   17.513125]  ? usb_start_wait_urb+0x65/0x160
[   17.513604]  ? usb_control_msg+0xdc/0x130
[   17.514061]  ? usb_xfer+0xa4/0x2a0
[   17.514445]  ? __i2c_transfer+0x108/0x3c0
[   17.514899]  ? i2c_transfer+0x57/0xb0
[   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
[   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
[   17.516408]  ? i2c_smbus_xfer+0x125/0x330
[   17.516876]  ? i2c_smbus_xfer+0x125/0x330
[   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
[   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
[   17.518248]  ? do_vfs_ioctl+0x9f/0x600
[   17.518671]  ? vfs_write+0x144/0x190
[   17.519078]  ? SyS_ioctl+0x74/0x80
[   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
[   17.519959] ---[ end trace d047c04982f5ac50 ]---

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Till Harbaum <till@harbaum.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
e4bab31cf9 drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
commit 4c4fc90964 upstream.

Commit fa01e2ca9f ("serial: 8250: Integrate Fintek into 8250_base")
modified the probing logic for PNP0501 devices, to remove a collision
between the generic 16550A driver and the Fintek driver, which reused
the same ACPI _HID.

The Fintek device probe is now incorporated into the common 8250 probe
path, and gets called for all discovered 16550A compatible devices,
including ones that are MMIO mapped rather than IO mapped. However,
the Fintek driver assumes the port base is a I/O address, and proceeds
to probe some arbitrary offsets above it.

This is generally a wrong thing to do, but on ARM systems (having no
native port I/O), this may result in faulting accesses of completely
unrelated MMIO regions in the PCI I/O space. Given that this is at
serial probe time, this results in hard to diagnose crashes at boot.

So let's restrict the Fintek probe to devices that we know are using
port I/O in the first place.

Fixes: fa01e2ca9f ("serial: 8250: Integrate Fintek into 8250_base")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ricardo Ribalda <ricardo.ribalda@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
84ac7693f4 serdev: fix tty-port client deregistration
commit aee5da7838 upstream.

The port client data must be set when registering the serdev controller
or client deregistration will fail (and the serdev devices are left
registered and allocated) if the port was never opened in between.

Make sure to clear the port client data on any probe errors to avoid a
use-after-free when the client is later deregistered unconditionally
(e.g. in a tty-port deregistration helper).

Also move port client operation initialisation to registration. Note
that the client ops must be restored on failed probe.

Fixes: bed35c6dfa ("serdev: add a tty port controller driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
427fa8e391 Revert "tty_port: register tty ports with serdev bus"
commit d3ba126a22 upstream.

This reverts commit 8ee3fde047.

The new serdev bus hooked into the tty layer in
tty_port_register_device() by registering a serdev controller instead of
a tty device whenever a serdev client is present, and by deregistering
the controller in the tty-port destructor. This is broken in several
ways:

Firstly, it leads to a NULL-pointer dereference whenever a tty driver
later deregisters its devices as no corresponding character device will
exist.

Secondly, far from every tty driver uses tty-port refcounting (e.g.
serial core) so the serdev devices might never be deregistered or
deallocated.

Thirdly, deregistering at tty-port destruction is too late as the
underlying device and structures may be long gone by then. A port is not
released before an open tty device is closed, something which a
registered serdev client can prevent from ever happening. A driver
callback while the device is gone typically also leads to crashes.

Many tty drivers even keep their ports around until the driver is
unloaded (e.g. serial core), something which even if a late callback
never happens, leads to leaks if a device is unbound from its driver and
is later rebound.

The right solution here is to add a new tty_port_unregister_device()
helper and to never call tty_device_unregister() whenever the port has
been claimed by serdev, but since this requires modifying just about
every tty driver (and multiple subsystems) it will need to be done
incrementally.

Reverting the offending patch is the first step in fixing the broken
lifetime assumptions. A follow-up patch will add a new pair of
tty-device registration helpers, which a vetted tty driver can use to
support serdev (initially serial core). When every tty driver uses the
serdev helpers (at least for deregistration), we can add serdev
registration to tty_port_register_device() again.

Note that this also fixes another issue with serdev, which currently
allocates and registers a serdev controller for every tty device
registered using tty_port_device_register() only to immediately
deregister and deallocate it when the corresponding OF node or serdev
child node is missing. This should be addressed before enabling serdev
for hot-pluggable buses.

Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
baa4d4112e powerpc/spufs: Fix hash faults for kernel regions
commit d75e4919cc upstream.

Commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.

However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.

This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.

Fixes: ac29c64089 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
919c7173e0 powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
commit d957fb4d17 upstream.

Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)

This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.

We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.

Fixes: 17a3dd2f5f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:06 +02:00
72351ac5cd fs/ufs: Set UFS default maximum bytes per file
commit 239e250e4a upstream.

This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
f351b1226d sparc/ftrace: Fix ftrace graph time measurement
[ Upstream commit 48078d2dac ]

The ftrace function_graph time measurements of a given function is not
accurate according to those recorded by ftrace using the function
filters.  This change pulls the x86_64 fix from 'commit 722b3c7469
("ftrace/graph: Trace function entry before updating index")' into the
sparc specific prepare_ftrace_return which stops ftrace from
counting interrupted tasks in the time measurement.

Example measurements for select_task_rq_fair running "hackbench 100
process 1000":

              |  tracing/trace_stat/function0  |  function_graph
 Before patch |  2.802 us                      |  4.255 us
 After patch  |  2.749 us                      |  3.094 us

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
76037bf9e1 sparc: Fix -Wstringop-overflow warning
[ Upstream commit deba804c90 ]

Greetings,

GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
in calls to string handling functions [1][2]. Due to the way
``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
causes a warning to trigger at compile time in the function mem_init(),
which is subsequently converted to an error. The ensuing patch fixes
this issue and aligns the declaration of empty_zero_page to that of
other architectures. Thank you.

Cheers,
Orlando.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
[2] https://gcc.gnu.org/gcc-7/changes.html

Signed-off-by: Orlando Arias <oarias@knights.ucf.edu>

--------------------------------------------------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
e346489fac sparc64: Fix mapping of 64k pages with MAP_FIXED
[ Upstream commit b6c41cb050 ]

An incorrect huge page alignment check caused
mmap failure for 64K pages when MAP_FIXED is used
with address not aligned to HPAGE_SIZE.

Orabug: 25885991

Fixes: dcd1912d21 ("sparc64: Add 64K page size support")
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
21dccb0f7c bpf: adjust verifier heuristics
[ Upstream commit 3c2ce60bdd ]

Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.

This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.

This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
87cebd0f19 bpf: fix wrong exposure of map_flags into fdinfo for lpm
[ Upstream commit a316338cb7 ]

trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
d6d2860eee bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
[ Upstream commit 41703a7310 ]

The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.

Fixes: 36bbef52c7 ("bpf: direct packet write and access for helpers for clsact progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:05 +02:00
3b69d6516e ipv4: add reference counting to metrics
[ Upstream commit 3fb07daff8 ]

Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe8 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
d3edf403e2 ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
[ Upstream commit 0e9a709560 ]

This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Commit 90427ef5d2 ("ipv6: fix flow labels when the traffic class
 is non-0") caused the regression by masking out the flowlabel
 which exposed the incorrect handling of the DSCP portion of the
 flowlabel in ip6_tunnel and ip6_gre.

Fixes: 90427ef5d2 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
90e7c3322d sctp: fix ICMP processing if skb is non-linear
[ Upstream commit 804ec7ebe8 ]

sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
0236d8c44e tcp: avoid fastopen API to be used on AF_UNSPEC
[ Upstream commit ba615f6752 ]

Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.

One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A:                            Thread B:
------------------------------------------------------------------------
sendto()
 - tcp_sendmsg()
     - sk_stream_memory_free() = 0
         - goto wait_for_sndbuf
	     - sk_stream_wait_memory()
	        - sk_wait_event() // sleep
          |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
	  |                           - tcp_sendmsg()
	  |                              - tcp_sendmsg_fastopen()
	  |                                 - __inet_stream_connect()
	  |                                    - tcp_disconnect() //because of AF_UNSPEC
	  |                                       - tcp_transmit_skb()// send RST
	  |                                    - return 0; // no reconnect!
	  |                           - sk_stream_wait_connect()
	  |                                 - sock_error()
	  |                                    - xchg(&sk->sk_err, 0)
	  |                                    - return -ECONNRESET
	- ... // wake up, see sk->sk_err == 0
    - skb_entail() on TCP_CLOSE socket

If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.

When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.

Fixes: cf60af03ca ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
1642394fff geneve: fix fill_info when using collect_metadata
[ Upstream commit 11387fe4a9 ]

Since 9b4437a5b8 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.

Fix by checking for the presence of the actual sockets.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
4dbbbaad64 virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
[ Upstream commit 2836b4f224 ]

Since virtio does not provide it's own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:04 +02:00
acc866e9b5 be2net: Fix offload features for Q-in-Q packets
[ Upstream commit cc6e9de62a ]

At least some of the be2net cards do not seem to be capabled
of performing checksum offload computions on Q-in-Q packets.
In these case, the recevied checksum on the remote is invalid
and TCP syn packets are dropped.

This patch adds a call to check disbled acceleration features
on Q-in-Q tagged traffic.

CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:03 +02:00
423c1b4324 vlan: Fix tcp checksum offloads in Q-in-Q vlans
[ Upstream commit 35d2f80b07 ]

It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was execerbated by the
series
    commit afb0bc972b ("Merge branch 'stacked_vlan_tso'")
that that enabled accleleration features on stacked vlans.

However, event without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_interesect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't gurantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:03 +02:00
f1cd4c6331 net: phy: marvell: Limit errata to 88m1101
[ Upstream commit f289978835 ]

The 88m1101 has an errata when configuring autoneg. However, it was
being applied to many other Marvell PHYs as well. Limit its scope to
just the 88m1101.

Fixes: 76884679c6 ("phylib: Add support for Marvell 88e1111S and 88e1145")
Reported-by: Daniel Walker <danielwa@cisco.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Harini Katakam <harinik@xilinx.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:03 +02:00
bea278025c net/mlx5: Avoid using pending command interface slots
[ Upstream commit 73dd3a4839 ]

Currently when firmware command gets stuck or it takes long time to
complete, the driver command will get timeout and the command slot is
freed and can be used for new commands, and if the firmware receive new
command on the old busy slot its behavior is unexpected and this could
be harmful.
To fix this when the driver command gets timeout we return failure,
but we don't free the command slot and we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy we will stop processing new firmware
commands.

Fixes: 9cba4ebcf3 ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:03 +02:00
1cdcbe7c70 bonding: fix accounting of active ports in 3ad
[ Upstream commit 751da2a69b ]

As of 7bb11dc9f5 and 0622cab034, bond slaves in a 3ad bond are not
removed from the aggregator when they are down, and the active slave count
is NOT equal to number of ports in the aggregator, but rather the number
of ports in the aggregator that are still enabled. The sysfs spew for
bonding_show_ad_num_ports() has a comment that says "Show number of active
802.3ad ports.", but it's currently showing total number of ports, both
active and inactive. Remedy it by using the same logic introduced in
0622cab034 in __bond_3ad_get_active_agg_info(), so sysfs, procfs and
netlink all report the number of active ports. Note that this means that
IFLA_BOND_AD_INFO_NUM_PORTS really means NUM_ACTIVE_PORTS instead of
NUM_PORTS, and thus perhaps should be renamed for clarity.

Lightly tested on a dual i40e lacp bond, simulating link downs with an ip
link set dev <slave2> down, was able to produce the state where I could
see both in the same aggregator, but a number of ports count of 1.

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2 <---
Slave Interface: ens10
MII Status: up <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

MII Status: up
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1 <---
Slave Interface: ens10
MII Status: down <---
Aggregator ID: 1
Slave Interface: ens11
MII Status: up
Aggregator ID: 1

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:03 +02:00
827624c3d1 ipv6: fix out of bound writes in __ip6_append_data()
[ Upstream commit 232cd35d08 ]

Andrey Konovalov and idaifish@gmail.com reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()

Andrey program lead to following state :

copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200

The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info

Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.

Once again, many thanks to Andrey and syzkaller team.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: <idaifish@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
99c971a56d bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
[ Upstream commit 6d18c732b9 ]

Since commit 76b91c32dd ("bridge: stp: when using userspace stp stop
kernel hello and hold timers"), bridge would not start hello_timer if
stp_enabled is not KERNEL_STP when br_dev_open.

The problem is even if users set stp_enabled with KERNEL_STP later,
the timer will still not be started. It causes that KERNEL_STP can
not really work. Users have to re-ifup the bridge to avoid this.

This patch is to fix it by starting br->hello_timer when enabling
KERNEL_STP in br_stp_start.

As an improvement, it's also to start hello_timer again only when
br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
no reason to start the timer again when it's NO_STP.

Fixes: 76b91c32dd ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
Reported-by: Haidong Li <haili@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
bf97c6bf24 qmi_wwan: add another Lenovo EM74xx device ID
[ Upstream commit 486181bcb3 ]

In their infinite wisdom, and never ending quest for end user frustration,
Lenovo has decided to use a new USB device ID for the wwan modules in
their 2017 laptops.  The actual hardware is still the Sierra Wireless
EM7455 or EM7430, depending on region.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
7b570175b3 bridge: netlink: check vlan_default_pvid range
[ Upstream commit a285860211 ]

Currently it is allowed to set the default pvid of a bridge to a value
above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
returns -EINVAL in case the pvid is out of bounds.

Reproduce by calling:

[root@test ~]# ip l a type bridge
[root@test ~]# ip l a type dummy
[root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
[root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
[root@test ~]# ip l s dummy0 master bridge0
[root@test ~]# bridge vlan
port	vlan ids
bridge0	 9999 PVID Egress Untagged

dummy0	 9999 PVID Egress Untagged

Fixes: 0f963b7592 ("bridge: netlink: add support for default_pvid")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Tobias Jungel <tobias.jungel@bisdn.de>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
4b3a6fa35e ipv6: Check ip6_find_1stfragopt() return value properly.
[ Upstream commit 7dd7eb9513 ]

Do not use unsigned variables to see if it returns a negative
error or not.

Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
9909e4e4ff ipv6: Prevent overrun when parsing v6 header options
[ Upstream commit 2423496af3 ]

The KASAN warning repoted below was discovered with a syzkaller
program.  The reproducer is basically:
  int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
  send(s, &one_byte_of_data, 1, MSG_MORE);
  send(s, &more_than_mtu_bytes_data, 2000, 0);

The socket() call sets the nexthdr field of the v6 header to
NEXTHDR_HOP, the first send call primes the payload with a non zero
byte of data, and the second send call triggers the fragmentation path.

The fragmentation code tries to parse the header options in order
to figure out where to insert the fragment option.  Since nexthdr points
to an invalid option, the calculation of the size of the network header
can made to be much larger than the linear section of the skb and data
is read outside of it.

This fix makes ip6_find_1stfrag return an error if it detects
running out-of-bounds.

[   42.361487] ==================================================================
[   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
[   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
[   42.366469]
[   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
[   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[   42.368824] Call Trace:
[   42.369183]  dump_stack+0xb3/0x10b
[   42.369664]  print_address_description+0x73/0x290
[   42.370325]  kasan_report+0x252/0x370
[   42.370839]  ? ip6_fragment+0x11c8/0x3730
[   42.371396]  check_memory_region+0x13c/0x1a0
[   42.371978]  memcpy+0x23/0x50
[   42.372395]  ip6_fragment+0x11c8/0x3730
[   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
[   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
[   42.374263]  ? ip6_forward+0x2e30/0x2e30
[   42.374803]  ip6_finish_output+0x584/0x990
[   42.375350]  ip6_output+0x1b7/0x690
[   42.375836]  ? ip6_finish_output+0x990/0x990
[   42.376411]  ? ip6_fragment+0x3730/0x3730
[   42.376968]  ip6_local_out+0x95/0x160
[   42.377471]  ip6_send_skb+0xa1/0x330
[   42.377969]  ip6_push_pending_frames+0xb3/0xe0
[   42.378589]  rawv6_sendmsg+0x2051/0x2db0
[   42.379129]  ? rawv6_bind+0x8b0/0x8b0
[   42.379633]  ? _copy_from_user+0x84/0xe0
[   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
[   42.380878]  ? ___sys_sendmsg+0x162/0x930
[   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
[   42.382074]  ? sock_has_perm+0x1f6/0x290
[   42.382614]  ? ___sys_sendmsg+0x167/0x930
[   42.383173]  ? lock_downgrade+0x660/0x660
[   42.383727]  inet_sendmsg+0x123/0x500
[   42.384226]  ? inet_sendmsg+0x123/0x500
[   42.384748]  ? inet_recvmsg+0x540/0x540
[   42.385263]  sock_sendmsg+0xca/0x110
[   42.385758]  SYSC_sendto+0x217/0x380
[   42.386249]  ? SYSC_connect+0x310/0x310
[   42.386783]  ? __might_fault+0x110/0x1d0
[   42.387324]  ? lock_downgrade+0x660/0x660
[   42.387880]  ? __fget_light+0xa1/0x1f0
[   42.388403]  ? __fdget+0x18/0x20
[   42.388851]  ? sock_common_setsockopt+0x95/0xd0
[   42.389472]  ? SyS_setsockopt+0x17f/0x260
[   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
[   42.390650]  SyS_sendto+0x40/0x50
[   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.391731] RIP: 0033:0x7fbbb711e383
[   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
[   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
[   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
[   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
[   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
[   42.397257]
[   42.397411] Allocated by task 3789:
[   42.397702]  save_stack_trace+0x16/0x20
[   42.398005]  save_stack+0x46/0xd0
[   42.398267]  kasan_kmalloc+0xad/0xe0
[   42.398548]  kasan_slab_alloc+0x12/0x20
[   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
[   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
[   42.399654]  __alloc_skb+0xf8/0x580
[   42.400003]  sock_wmalloc+0xab/0xf0
[   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
[   42.400813]  ip6_append_data+0x1a8/0x2f0
[   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
[   42.401505]  inet_sendmsg+0x123/0x500
[   42.401860]  sock_sendmsg+0xca/0x110
[   42.402209]  ___sys_sendmsg+0x7cb/0x930
[   42.402582]  __sys_sendmsg+0xd9/0x190
[   42.402941]  SyS_sendmsg+0x2d/0x50
[   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.403718]
[   42.403871] Freed by task 1794:
[   42.404146]  save_stack_trace+0x16/0x20
[   42.404515]  save_stack+0x46/0xd0
[   42.404827]  kasan_slab_free+0x72/0xc0
[   42.405167]  kfree+0xe8/0x2b0
[   42.405462]  skb_free_head+0x74/0xb0
[   42.405806]  skb_release_data+0x30e/0x3a0
[   42.406198]  skb_release_all+0x4a/0x60
[   42.406563]  consume_skb+0x113/0x2e0
[   42.406910]  skb_free_datagram+0x1a/0xe0
[   42.407288]  netlink_recvmsg+0x60d/0xe40
[   42.407667]  sock_recvmsg+0xd7/0x110
[   42.408022]  ___sys_recvmsg+0x25c/0x580
[   42.408395]  __sys_recvmsg+0xd6/0x190
[   42.408753]  SyS_recvmsg+0x2d/0x50
[   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   42.409513]
[   42.409665] The buggy address belongs to the object at ffff88000969e780
[   42.409665]  which belongs to the cache kmalloc-512 of size 512
[   42.410846] The buggy address is located 24 bytes inside of
[   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
[   42.411941] The buggy address belongs to the page:
[   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
[   42.413298] flags: 0x100000000008100(slab|head)
[   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
[   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
[   42.415074] page dumped because: kasan: bad access detected
[   42.415604]
[   42.415757] Memory state around the buggy address:
[   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   42.418273]                    ^
[   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   42.419882] ==================================================================

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
df6342be40 net: Improve handling of failures on link and route dumps
[ Upstream commit f6c5775ff0 ]

In general, rtnetlink dumps do not anticipate failure to dump a single
object (e.g., link or route) on a single pass. As both route and link
objects have grown via more attributes, that is no longer a given.

netlink dumps can handle a failure if the dump function returns an
error; specifically, netlink_dump adds the return code to the response
if it is <= 0 so userspace is notified of the failure. The missing
piece is the rtnetlink dump functions returning the error.

Fix route and link dump functions to return the errors if no object is
added to an skb (detected by skb->len != 0). IPv6 route dumps
(rt6_dump_route) already return the error; this patch updates IPv4 and
link dumps. Other dump functions may need to be ajusted as well.

Reported-by: Jan Moskyto Matejka <mq@ucw.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:02 +02:00
a19f55f9bb net/smc: Add warning about remote memory exposure
[ Upstream commit 19a0f7e37c ]

The driver explicitly bypasses APIs to register all memory once a
connection is made, and thus allows remote access to memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
516b3ed999 smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY
[ Upstream commit 263eec9b2a ]

Currently, SMC enables remote access to physical memory when a user
has successfully configured and established an SMC-connection until ten
minutes after the last SMC connection is closed. Because this is considered
a security risk, drivers are supposed to use IB_PD_UNSAFE_GLOBAL_RKEY in
such a case.

This patch changes the current SMC code to use IB_PD_UNSAFE_GLOBAL_RKEY.
This improves user awareness, but does not remove the security risk itself.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
d6cb41cf30 tcp: eliminate negative reordering in tcp_clean_rtx_queue
[ Upstream commit bafbb9c732 ]

tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.

Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.

Fixes: c7caf8d3ed ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <risaacs@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
cc1d2a620d net/mlx5e: Fix ethtool pause support and advertise reporting
[ Upstream commit e3c1950371 ]

Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
c8a52aa79a net/mlx5e: Use the correct pause values for ethtool advertising
[ Upstream commit b383b544f2 ]

Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
03c10a87f1 net/packet: fix missing net_device reference release
[ Upstream commit d19b183cdc ]

When using a TX ring buffer, if an error occurs processing a control
message (e.g. invalid message), the net_device reference is not
released.

Fixes c14ac9451c ("sock: enable timestamping using control messages")
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:01 +02:00
703a208274 sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
[ Upstream commit fdcee2cbb8 ]

SCTP needs fixes similar to 83eaddab43 ("ipv6/dccp: do not inherit
ipv6_mc_list from parent"), otherwise bad things can happen.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:00 +02:00
dc9bf5513e sctp: fix src address selection if using secondary addresses for ipv6
[ Upstream commit dbc2b5e9a0 ]

Commit 0ca50d12fe ("sctp: fix src address selection if using secondary
addresses") has fixed a src address selection issue when using secondary
addresses for ipv4.

Now sctp ipv6 also has the similar issue. When using a secondary address,
sctp_v6_get_dst tries to choose the saddr which has the most same bits
with the daddr by sctp_v6_addr_match_len. It may make some cases not work
as expected.

hostA:
  [1] fd21:356b:459a:cf10::11 (eth1)
  [2] fd21:356b:459a:cf20::11 (eth2)

hostB:
  [a] fd21:356b:459a:cf30::2  (eth1)
  [b] fd21:356b:459a:cf40::2  (eth2)

route from hostA to hostB:
  fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500

The expected path should be:
  fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
But addr[2] matches addr[a] more bits than addr[1] does, according to
sctp_v6_addr_match_len. It causes the path to be:
  fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2

This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
if the saddr is in a dev instead.

Note that for backwards compatibility, it will still do the addr_match_len
check here when no optimal is found.

Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:00 +02:00
d9cb26d6a3 tipc: make macro tipc_wait_for_cond() smp safe
[ Upstream commit 844cf763fb ]

The macro tipc_wait_for_cond() is embedding the macro sk_wait_event()
to fulfil its task. The latter, in turn, is evaluating the stated
condition outside the socket lock context. This is problematic if
the condition is accessing non-trivial data structures which may be
altered by incoming interrupts, as is the case with the cong_links()
linked list, used by socket to keep track of the current set of
congested links. We sometimes see crashes when this list is accessed
by a condition function at the same time as a SOCK_WAKEUP interrupt
is removing an element from the list.

We fix this by expanding selected parts of sk_wait_event() into the
outer macro, while ensuring that all evaluations of a given condition
are performed under socket lock protection.

Fixes: commit 365ad353c2 ("tipc: reduce risk of user starvation during link congestion")
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:00 +02:00
c91cd74b32 tcp: avoid fragmenting peculiar skbs in SACK
[ Upstream commit b451e5d24b ]

This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.

The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.

Fixes: adb92db857 ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:00 +02:00
20d699e0ca net: fix compile error in skb_orphan_partial()
[ Upstream commit 9142e9007f ]

If CONFIG_INET is not set, net/core/sock.c can not compile :

net/core/sock.c: In function ‘skb_orphan_partial’:
net/core/sock.c:1810:2: error: implicit declaration of function
‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration]
  if (skb_is_tcp_pure_ack(skb))
  ^

Fix this by always including <net/tcp.h>

Fixes: f6ba8d33cf ("netem: fix skb_orphan_partial()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:10:00 +02:00
e13cb6c25b netem: fix skb_orphan_partial()
[ Upstream commit f6ba8d33cf ]

I should have known that lowering skb->truesize was dangerous :/

In case packets are not leaving the host via a standard Ethernet device,
but looped back to local sockets, bad things can happen, as reported
by Michael Madsen ( https://bugzilla.kernel.org/show_bug.cgi?id=195713 )

So instead of tweaking skb->truesize, lets change skb->destructor
and keep a reference on the owner socket via its sk_refcnt.

Fixes: f2f872f927 ("netem: Introduce skb_orphan_partial() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Michael Madsen <mkm@nabto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
3bfb04d102 bpf, arm64: fix faulty emission of map access in tail calls
[ Upstream commit d8b54110ee ]

Shubham was recently asking on netdev why in arm64 JIT we don't multiply
the index for accessing the tail call map by 8. That led me into testing
out arm64 JIT wrt tail calls and it turned out I got a NULL pointer
dereference on the tail call.

The buggy access is at:

  prog = array->ptrs[index];
  if (prog == NULL)
      goto out;

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  f86a682a  ldr x10, [x1,x10]
  00000068:  f862694b  ldr x11, [x10,x2]
  0000006c:  b40000ab  cbz x11, 0x00000080
  [...]

The code triggering the crash is f862694b. x1 at the time contains the
address of the bpf array, x10 offsetof(struct bpf_array, ptrs). Meaning,
above we load the pointer to the program at map slot 0 into x10. x10
can then be NULL if the slot is not occupied, which we later on try to
access with a user given offset in x2 that is the map index.

Fix this by emitting the following instead:

  [...]
  00000060:  d2800e0a  mov x10, #0x70 // #112
  00000064:  8b0a002a  add x10, x1, x10
  00000068:  d37df04b  lsl x11, x2, #3
  0000006c:  f86b694b  ldr x11, [x10,x11]
  00000070:  b40000ab  cbz x11, 0x00000084
  [...]

This basically adds the offset to ptrs to the base address of the bpf
array we got and we later on access the map with an index * 8 offset
relative to that. The tail call map itself is basically one large area
with meta data at the head followed by the array of prog pointers.
This makes tail calls working again, tested on Cavium ThunderX ARMv8.

Fixes: ddb55992b0 ("arm64: bpf: implement bpf_tail_call() helper")
Reported-by: Shubham Bansal <illusionist.neo@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
617461cac6 s390/qeth: add missing hash table initializations
[ Upstream commit ebccc7397e ]

commit 5f78e29cee ("qeth: optimize IP handling in rx_mode callback")
added new hash tables, but missed to initialize them.

Fixes: 5f78e29cee ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
b8a9a79c89 s390/qeth: avoid null pointer dereference on OSN
[ Upstream commit 25e2c341e7 ]

Access card->dev only after checking whether's its valid.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
9b1042aa59 s390/qeth: unbreak OSM and OSN support
[ Upstream commit 2d2ebb3ed0 ]

commit b4d72c08b3 ("qeth: bridgeport support - basic control")
broke the support for OSM and OSN devices as follows:

As OSM and OSN are L2 only, qeth_core_probe_device() does an early
setup by loading the l2 discipline and calling qeth_l2_probe_device().
In this context, adding the l2-specific bridgeport sysfs attributes
via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
since the basic sysfs infrastructure for the device hasn't been
established yet.

Note that OSN actually has its own unique sysfs attributes
(qeth_osn_devtype), so the additional attributes shouldn't be created
at all.
For OSM, add a new qeth_l2_devtype that contains all the common
and l2-specific sysfs attributes.
When qeth_core_probe_device() does early setup for OSM or OSN, assign
the corresponding devtype so that the ccwgroup probe code creates the
full set of sysfs attributes.
This allows us to skip qeth_l2_create_device_attributes() in case
of an early setup.

Any device that can't do early setup will initially have only the
generic sysfs attributes, and when it's probed later
qeth_l2_probe_device() adds the l2-specific attributes.

If an early-setup device is removed (by calling ccwgroup_ungroup()),
device_unregister() will - using the devtype - delete the
l2-specific attributes before qeth_l2_remove_device() is called.
So make sure to not remove them twice.

What complicates the issue is that qeth_l2_probe_device() and
qeth_l2_remove_device() is also called on a device when its
layer2 attribute changes (ie. its layer mode is switched).
For early-setup devices this wouldn't work properly - we wouldn't
remove the l2-specific attributes when switching to L3.
But switching the layer mode doesn't actually make any sense;
we already decided that the device can only operate in L2!
So just refuse to switch the layer mode on such devices. Note that
OSN doesn't have a layer2 attribute, so we only need to special-case
OSM.

Based on an initial patch by Ursula Braun.

Fixes: b4d72c08b3 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
41b81ff056 s390/qeth: handle sysfs error during initialization
[ Upstream commit 9111e7880c ]

When setting up the device from within the layer discipline's
probe routine, creating the layer-specific sysfs attributes can fail.
Report this error back to the caller, and handle it by
releasing the layer discipline.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
[jwi: updated commit msg, moved an OSN change to a subsequent patch]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:59 +02:00
8e929937f8 ipv6/dccp: do not inherit ipv6_mc_list from parent
[ Upstream commit 83eaddab43 ]

Like commit 657831ffc3 ("dccp/tcp: do not inherit mc_list from parent")
we should clear ipv6_mc_list etc. for IPv6 sockets too.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:58 +02:00
03a275f5aa driver: vrf: Fix one possible use-after-free issue
[ Upstream commit 1a4a5bf52a ]

The current codes only deal with the case that the skb is dropped, it
may meet one use-after-free issue when NF_HOOK returns 0 that means
the skb is stolen by one netfilter rule or hook.

When one netfilter rule or hook stoles the skb and return NF_STOLEN,
it means the skb is taken by the rule, and other modules should not
touch this skb ever. Maybe the skb is queued or freed directly by the
rule.

Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
and check the return value of nf_hook. Only when its value equals 1, it
means the skb could go ahead. Or reset the skb as NULL.

BTW, because vrf_rcv_finish is empty function, so needn't invoke it
even though nf_hook returns 1. But we need to modify vrf_rcv_finish
to deal with the NF_STOLEN case.

There are two cases when skb is stolen.
1. The skb is stolen and freed directly.
   There is nothing we need to do, and vrf_rcv_finish isn't invoked.
2. The skb is queued and reinjected again.
   The vrf_rcv_finish would be invoked as okfn, so need to free the
   skb in it.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:58 +02:00
db8ebc6da8 dccp/tcp: do not inherit mc_list from parent
[ Upstream commit 657831ffc3 ]

syzkaller found a way to trigger double frees from ip_mc_drop_socket()

It turns out that leave a copy of parent mc_list at accept() time,
which is very bad.

Very similar to commit 8b485ce698 ("tcp: do not inherit
fastopen_req from parent")

Initial report from Pray3r, completed by Andrey one.
Thanks a lot to them !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-07 12:09:58 +02:00
b43ae25b70 Linux 4.11.3 2017-05-25 15:46:45 +02:00
8d6d97abb8 IB/hfi1: Protect the global dev_cntr_names and port_cntr_names
commit 62eed66e98 upstream.

Protect the global dev_cntr_names and port_cntr_names with the global
mutex as they are allocated and freed in a function called per device.
Otherwise there is a danger of double free and memory leaks.

Fixes: Commit b7481944b0 ("IB/hfi1: Show statistics counters under IB stats interface")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
b770585918 drm/i915/gvt: Disable access to stolen memory as a guest
commit 04a68a35ce upstream.

Explicitly disable stolen memory when running as a guest in a virtual
machine, since the memory is not mediated between clients and reserved
entirely for the host. The actual size should be reported as zero, but
like every other quirk we want to tell the user what is happening.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99028
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161109103905.17860-1-chris@chris-wilson.co.uk
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
846d6e12d0 drivers: char: mem: Check for address space wraparound with mmap()
commit b299cde245 upstream.

/dev/mem currently allows mmap() mappings that wrap around the end of
the physical address space, which should probably be illegal. It
circumvents the existing STRICT_DEVMEM permission check because the loop
immediately terminates (as the start address is already higher than the
end address). On the x86_64 architecture it will then cause a panic
(from the BUG(start >= end) in arch/x86/mm/pat.c:reserve_memtype()).

This patch adds an explicit check to make sure offset + size will not
wrap around in the physical address type.

Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
326842c019 nfsd: Fix up the "supattr_exclcreat" attributes
commit b26b78cb72 upstream.

If an NFSv4 client asks us for the supattr_exclcreat, then we must
not return attributes that are unsupported by this minor version.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 75976de655 ("NFSD: Return word2 bitmask if setting security..,")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
9a4723626e nfsd: encoders mustn't use unitialized values in error cases
commit f961e3f2ac upstream.

In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.

This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.

GETDEVICEINFO has the same problem.

Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
06cc61e8f9 nfsd: fix undefined behavior in nfsd4_layout_verify
commit b550a32e60 upstream.

  UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34
  shift exponent 128 is too large for 32-bit type 'int'

Depending on compiler+architecture, this may cause the check for
layout_type to succeed for overly large values (which seems to be the
case with amd64). The large value will be later used in de-referencing
nfsd4_layout_ops for function pointers.

Reported-by: Jani Tuovila <tuovila@synopsys.com>
Signed-off-by: Ari Kauppi <ari@synopsys.com>
[colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32]
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
af069f63c5 NFSv4: Fix an rcu lock leak
commit 2e84611b3f upstream.

The intention in the original patch was to release the lock when
we put the inode, however something got screwed up.

Reported-by: Jason Yan <yanaijie@huawei.com>
Fixes: 7b410d9ce4 ("pNFS: Delay getting the layout header in..")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:30 +02:00
89207ffd22 pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connect
commit 260f32adb8 upstream.

The check in nfs4_ff_layout_prepare_ds() seems to be missing.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: a33e4b036d ("pNFS: return status from nfs4_pnfs_ds_connect")
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
bc38735f21 NFS: Use GFP_NOIO for two allocations in writeback
commit ae97aa524e upstream.

Prevent a deadlock that can occur if we wait on allocations
that try to write back our pages.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 00bfa30abe ("NFS: Create a common pgio_alloc and pgio_release...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
2aa6b5f2ac NFS: Fix use after free in write error path
commit 1f84ccdf37 upstream.

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Fixes: 0bcbf039f6 ("nfs: handle request add failure properly")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
e7ec98b137 NFSv4: Fix a hang in OPEN related to server reboot
commit 56e0d71ef1 upstream.

If the server fails to return the attributes as part of an OPEN
reply, and then reboots, we can end up hanging. The reason is that
the client attempts to send a GETATTR in order to pick up the
missing OPEN call, but fails to release the slot first, causing
reboot recovery to deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 2e80dbe7ac ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
acef1b8125 drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2
commit e345da82bd upstream.

The builtin eDP panel in the HP zBook 17 G2 supports 10 bpc,
as advertised by the Laptops product specs and verified via
injecting a fixed edid + photometer measurements, but edid
reports unknown depth, so drivers fall back to 6 bpc.

Add a quirk to get the full 10 bpc.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1492787108-23959-1-git-send-email-mario.kleiner.de@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
9c92bffbd8 mtd: nand: add ooblayout for old hamming layout
commit 6a623e0769 upstream.

The old 1-bit hamming layout requires ECC data to be placed at a
fixed offset, and not necessarily at the end of the OOB area.
Add this old layout back in order to fix legacy setups.

Fixes: 41b207a70d ("mtd: nand: implement the default mtd_ooblayout_ops")
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
2d3e850373 mtd: nand: omap2: Fix partition creation via cmdline mtdparts
commit 2d283ede59 upstream.

commit c9711ec525 ("mtd: nand: omap: Clean up device tree support")
caused the parent device name to be changed from "omap2-nand.0"
to "<base address>.nand"  (e.g. 30000000.nand on omap3 platforms).
This caused mtd->name to be changed as well. This breaks partition
creation via mtdparts passed by u-boot as it uses "omap2-nand.0"
for the mtd-id.

Fix this by explicitly setting the mtd->name to "omap2-nand.<CS number>"
if it isn't already set by nand_set_flash_node(). CS number is the
NAND controller instance ID.

Fixes: c9711ec525 ("mtd: nand: omap: Clean up device tree support")
Reported-by: Leto Enrico <enrico.leto@siemens.com>
Reported-by: Adam Ford <aford173@gmail.com>
Suggested-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Tested-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
750d21357e mtd: nand: orion: fix clk handling
commit 675b11d94c upstream.

The clk handling in orion_nand.c had two problems:

- In the probe function, clk_put() was called for an enabled clock,
  which violates the API (see documentation for clk_put() in
  include/linux/clk.h)

- In the error path of the probe function, clk_put() could be called
  twice for the same clock.

In order to clean this up, use the managed function devm_clk_get() and
store the pointer to the clk in the driver data.

Fixes: baffab28b1 ('ARM: Orion: fix driver probe error handling with respect to clk')
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
6003ac5842 PCI: Freeze PME scan before suspending devices
commit ea00353f36 upstream.

Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests.  Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):

  It occurs when the PME scan runs, once per second.  During PME scan, the
  PCI host bridge (rcar-pci) registers are accessed while its module clock
  has already been disabled, leading to the crash.

One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:

  # echo 0 > /sys/module/printk/parameters/console_suspend
  # echo s2idle > /sys/power/mem_sleep
  # echo mem > /sys/power/state

Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test.  It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs.  Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:

  # echo 0 > /sys/module/printk/parameters/console_suspend
  # echo platform > /sys/power/pm_test    # Or "processors"
  # echo mem > /sys/power/state

(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)

Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible.  To that end, queue the task on a
workqueue that gets frozen before devices suspend.

Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen.  If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.

Stacktrace for posterity:

  PM: Syncing filesystems ... [   38.566237] done.
  PM: Preparing system for sleep (mem)
  Freezing user space processes ... [   38.579813] (elapsed 0.001 seconds) done.
  Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
  PM: Suspending system (mem)
  PM: suspend of devices complete after 152.456 msecs
  PM: late suspend of devices complete after 2.809 msecs
  PM: noirq suspend of devices complete after 29.863 msecs
  suspend debug: Waiting for 5 second(s).
  Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
  pgd = c0003000
  [00000000] *pgd=80000040004003, *pmd=00000000
  Internal error: : 1211 [#1] SMP ARM
  Modules linked in:
  CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
  4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383
  Hardware name: Generic R8A7791 (Flattened Device Tree)
  Workqueue: events pci_pme_list_scan
  task: eb56e140 task.stack: eb58e000
  PC is at pci_generic_config_read+0x64/0x6c
  LR is at rcar_pci_cfg_base+0x64/0x84
  pc : [<c041d7b4>]    lr : [<c04309a0>]    psr: 600d0093
  sp : eb58fe98  ip : c041d750  fp : 00000008
  r10: c0e2283c  r9 : 00000000  r8 : 600d0013
  r7 : 00000008  r6 : eb58fed6  r5 : 00000002  r4 : eb58feb4
  r3 : 00000000  r2 : 00000044  r1 : 00000008  r0 : 00000000
  Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5387d  Table: 6a9f6c80  DAC: 55555555
  Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
  Stack: (0xeb58fe98 to 0xeb590000)
  fe80:                                                       00000002 00000044
  fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
  fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
  fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
  ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
  ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
  ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
  ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
  ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
  ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
  ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
  [<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>]
  (pci_bus_read_config_word+0x58/0x80)
  [<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>]
  (pci_check_pme_status+0x34/0x78)
  [<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54)
  [<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4)
  [<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>]
  (process_one_work+0x1bc/0x308)
  [<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0)
  [<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc)
  [<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
  Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
  ---[ end trace 667d43ba3aa9e589 ]---

Fixes: df17e62e5b ("PCI: Add support for polling PME state on suspended legacy PCI devices")
Reported-and-tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:29 +02:00
f21143a53f PCI: Only allow WC mmap on prefetchable resources
commit cef4d02305 upstream.

The /proc/bus/pci mmap interface allows the user to specify whether they
want WC or not.  Don't let them do so on non-prefetchable BARs.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
bee38f0f37 PCI: Fix another sanity check bug in /proc/pci mmap
commit 17caf56731 upstream.

Don't match MMIO maps with I/O BARs and vice versa.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
b87e133e3d PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
commit 6bccc7f426 upstream.

In the PCI_MMAP_PROCFS case when the address being passed by the user is a
'user visible' resource address based on the bus window, and not the actual
contents of the resource, that's what we need to be checking it against.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
3537380a05 PCI: hv: Specify CPU_AFFINITY_ALL for MSI affinity when >= 32 CPUs
commit 433fcf6b7b upstream.

When we have 32 or more CPUs in the affinity mask, we should use a special
constant to specify that to the host. Fix this issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
9dc0babcd2 PCI: hv: Allocate interrupt descriptors with GFP_ATOMIC
commit 59c58ceeea upstream.

The memory allocation here needs to be non-blocking.  Fix the issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
569ed828de PCI/ACPI: Add ThunderX pass2.x 2nd node MCFG quirk
commit cd18374048 upstream.

Currently SoCs pass2.x do not emulate EA headers for ACPI boot method at
all.  However, for pass2.x some devices (like EDAC) advertise incorrect
base addresses in their BARs which results in driver probe failure during
resource request.  Since all problematic blocks are on 2nd NUMA node under
domain 10 add necessary quirk entry to obtain BAR addresses correction
using EA header emulation.

Fixes: 44f22bd91e ("PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controller")
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
b4447dbb5c PCI/ACPI: Tidy up MCFG quirk whitespace
commit ced414a14f upstream.

With no blank lines, it's not obvious where the macro definitions end and
the uses begin.  Add some blank lines and reorder the ThunderX definitions.
No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
d76f918859 thermal: mt8173: minor mtk_thermal.c cleanups
commit 05d7839aa2 upstream.

If thermal bank with 4 sensors, thermal driver should read TEMP_MSR3.

However, currently thermal driver would not read TEMP_MSR3 since mt8173
thermal driver only use 3 sensors on each thermal bank at the same time,
so this patch would not effect temperature.
Only if mt mt8173 thermal driver use 4 sensors on any thermal bank, would
read third sensor two times, and lose fourth sensor of vale.

Fixes: b7cf005373 ("thermal: Add Mediatek thermal driver for mt2701.")
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Dawei Chien <dawei.chien@mediatek.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
63ab09e39d tracing/kprobes: Enforce kprobes teardown after testing
commit 30e7d894c1 upstream.

Enabling the tracer selftest triggers occasionally the warning in
text_poke(), which warns when the to be modified page is not marked
reserved.

The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.

Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.

Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6274de498 ("kprobes: Support delayed unoptimizing")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:28 +02:00
18b4b0ab65 firmware: ti_sci: fix strncat length check
commit 76cefef8e8 upstream.

gcc-7 notices that the length we pass to strncat is wrong:

drivers/firmware/ti_sci.c: In function 'ti_sci_probe':
drivers/firmware/ti_sci.c:204:32: error: specified bound 50 equals the size of the destination [-Werror=stringop-overflow=]

Instead of the total length, we must pass the length of the
remaining space here.

Fixes: aa276781a6 ("firmware: Add basic support for TI System Control Interface (TI-SCI) protocol")
Acked-by: Nishanth Menon <nm@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
e57f049b09 um: Fix to call read_initrd after init_bootmem
commit 5b4236e17c upstream.

Since read_initrd() invokes alloc_bootmem() for allocating
memory to load initrd image, it must be called after init_bootmem.

This makes read_initrd() called directly from setup_arch()
after init_bootmem() and mem_total_pages().

Fixes: b63236972e ("um: Setup physical memory in setup_arch()")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
ece6ff2261 drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()
commit a00ebd1cf1 upstream.

When killing kref_sub(), the unconditional additional kref_get()
was not properly paired with the necessary kref_put(), causing
a leak of struct drbd_requests (~ 224 Bytes) per submitted bio,
and breaking DRBD in general, as the destructor of those "drbd_requests"
does more than just the mempoll_free().

Fixes: bdfafc4ffd ("locking/atomic, kref: Kill kref_sub()")
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
7811108277 osf_wait4(): fix infoleak
commit a8c39544a6 upstream.

failing sys_wait4() won't fill struct rusage...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
90c676ffab kvm: arm/arm64: Force reading uncached stage2 PGD
commit 2952a6070e upstream.

Make sure we don't use a cached value of the KVM stage2 PGD while
resetting the PGD.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
59e1faddb3 kvm: arm/arm64: Fix use after free of stage2 page table
commit 0c428a6a92 upstream.

We yield the kvm->mmu_lock occassionaly while performing an operation
(e.g, unmap or permission changes) on a large area of stage2 mappings.
However this could possibly cause another thread to clear and free up
the stage2 page tables while we were waiting for regaining the lock and
thus the original thread could end up in accessing memory that was
freed. This patch fixes the problem by making sure that the stage2
pagetable is still valid after we regain the lock. The fact that
mmu_notifer->release() could be called twice (via __mmu_notifier_release
and mmu_notifier_unregsister) enhances the possibility of hitting
this race where there are two threads trying to unmap the entire guest
shadow pages.

While at it, cleanup the redudant checks around cond_resched_lock in
stage2_wp_range(), as cond_resched_lock already does the same checks.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: andreyknvl@google.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
3bbe9f26a0 kvm: arm/arm64: Fix race in resetting stage2 PGD
commit 6c0d706b56 upstream.

In kvm_free_stage2_pgd() we check the stage2 PGD before holding
the lock and proceed to take the lock if it is valid. And we unmap
the page tables, followed by releasing the lock. We reset the PGD
only after dropping this lock, which could cause a race condition
where another thread waiting on or even holding the lock, could
potentially see that the PGD is still valid and proceed to perform
a stage2 operation and later encounter a NULL PGD.

[223090.242280] Unable to handle kernel NULL pointer dereference at
virtual address 00000040
[223090.262330] PC is at unmap_stage2_range+0x8c/0x428
[223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
[223090.262531] Call trace:
[223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
[223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
[223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
[223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
[223090.262543] [<ffff0000080a2478>]
kvm_mmu_notifier_invalidate_page+0x50/0x8c
[223090.262547] [<ffff0000082274f8>]
__mmu_notifier_invalidate_page+0x5c/0x84
[223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
[223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
[223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
[223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
[223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
[223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
[223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
[223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
[223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
[223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
[223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
[223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
[223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
[223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
[223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
[223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
[223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
[223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
[223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
[223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
[223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
[223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
[223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c

This patch moves the stage2 PGD manipulation under the lock.

Reported-by: Alexander Graf <agraf@suse.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
a24a206192 MIPS: Loongson-3: Select MIPS_L1_CACHE_SHIFT_6
commit 17c99d9421 upstream.

Some newer Loongson-3 have 64 bytes cache lines, so select
MIPS_L1_CACHE_SHIFT_6.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J . Hill <Steven.Hill@caviumnetworks.com>
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15755/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
7ce12d8b97 nvme: unmap CMB and remove sysfs file in reset path
commit f63572dff1 upstream.

CMB doesn't get unmapped until removal while getting remapped on every
reset. Add the unmapping and sysfs file removal to the reset path in
nvme_pci_disable to match the mapping path in nvme_pci_enable.

Fixes: 202021c1a ("nvme : Add sysfs entry for NVMe CMBs when appropriate")

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-By: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:27 +02:00
ef1f1648fe genirq: Fix chained interrupt data ordering
commit 2c4569ca26 upstream.

irq_set_chained_handler_and_data() sets up the chained interrupt and then
stores the handler data.

That's racy against an immediate interrupt which gets handled before the
store of the handler data happened. The handler will dereference a NULL
pointer and crash.

Cure it by storing handler data before installing the chained handler.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
3a97bedaa3 uwb: fix device quirk on big-endian hosts
commit 41318a2b82 upstream.

Add missing endianness conversion when using the USB device-descriptor
idProduct field to apply a hardware quirk.

Fixes: 1ba47da527 ("uwb: add the i1480 DFU driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
4c44438b56 stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
commit 5ea30e4e58 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
88f7b253a9 metag/uaccess: Check access_ok in strncpy_from_user
commit 3a158a62da upstream.

The metag implementation of strncpy_from_user() doesn't validate the src
pointer, which could allow reading of arbitrary kernel memory. Add a
short access_ok() check to prevent that.

Its still possible for it to read across the user/kernel boundary, but
it will invariably reach a NUL character after only 9 bytes, leaking
only a static kernel address being loaded into D0Re0 at the beginning of
__start, which is acceptable for the immediate fix.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
160f7df55b metag/uaccess: Fix access_ok()
commit 8a8b56638b upstream.

The __user_bad() macro used by access_ok() has a few corner cases
noticed by Al Viro where it doesn't behave correctly:

 - The kernel range check has off by 1 errors which permit access to the
   first and last byte of the kernel mapped range.

 - The kernel range check ends at LINCORE_BASE rather than
   META_MEMORY_LIMIT, which is ineffective when the kernel is in global
   space (an extremely uncommon configuration).

There are a couple of other shortcomings here too:

 - Access to the whole of the other address space is permitted (i.e. the
   global half of the address space when the kernel is in local space).
   This isn't ideal as it could theoretically still contain privileged
   mappings set up by the bootloader.

 - The size argument is unused, permitting user copies which start on
   valid pages at the end of the user address range and cross the
   boundary into the kernel address space (e.g. addr = 0x3ffffff0, size
   > 0x10).

It isn't very convenient to add size checks when disallowing certain
regions, and it seems far safer to be sure and explicit about what
userland is able to access, so invert the logic to allow certain regions
instead, and fix the off by 1 errors and missing size checks. This also
allows the get_fs() == KERNEL_DS check to be more easily optimised into
the user address range case.

We now have 3 such allowed regions:

 - The user address range (incorporating the get_fs() == KERNEL_DS
   check).

 - NULL (some kernel code expects this to work, and we'll always catch
   the fault anyway).

 - The core code memory region.

Fixes: 373cd784d0 ("metag: Memory handling")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
4b70137931 cpuidle: check dev before usage in cpuidle_use_deepest_state()
commit 41dc750ea6 upstream.

In case of there is no cpuidle devices registered, dev will be null, and
panic will be triggered like below;
In this patch, add checking of dev before usage, like that done in
cpuidle_idle_call.

Panic without fix:
[  184.961328] BUG: unable to handle kernel NULL pointer dereference at
  (null)
[  184.961328] IP: cpuidle_use_deepest_state+0x30/0x60
...
[  184.961328]  play_idle+0x8d/0x210
[  184.961328]  ? __schedule+0x359/0x8e0
[  184.961328]  ? _raw_spin_unlock_irqrestore+0x28/0x50
[  184.961328]  ? kthread_queue_delayed_work+0x41/0x80
[  184.961328]  clamp_idle_injection_func+0x64/0x1e0

Fixes: bb8313b603 (cpuidle: Allow enforcing deepest idle state selection)
Signed-off-by: Li, Fei <fei.li@intel.com>
Tested-by: Shi, Feng <fengx.shi@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
4fd7fd47a0 iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings
commit f73a7eee90 upstream.

Ever since commit 091d42e43d ("iommu/vt-d: Copy translation tables from
old kernel") the kdump kernel copies the IOMMU context tables from the
previous kernel. Each device mappings will be destroyed once the driver
for the respective device takes over.

This unfortunately breaks the workflow of mapping and unmapping a new
context to the IOMMU. The mapping function assumes that either:

1) Unmapping did the proper IOMMU flushing and it only ever flush if the
   IOMMU unit supports caching invalid entries.
2) The system just booted and the initialization code took care of
   flushing all IOMMU caches.

This assumption is not true for the kdump kernel since the context
tables have been copied from the previous kernel and translations could
have been cached ever since. So make sure to flush the IOTLB as well
when we destroy these old copied mappings.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Fixes: 091d42e43d ("iommu/vt-d: Copy translation tables from old kernel")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
1cfc2a16c5 staging: rtl8192e: GetTs Fix invalid TID 7 warning.
commit 95d93e271d upstream.

TID 7 is a valid value for QoS IEEE 802.11e.

The switch statement that follows states 7 is valid.

Remove function IsACValid and use the default case to filter
invalid TIDs.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
7369a3b5ec staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD.
commit 90be652c9f upstream.

EPROM_CMD is 2 byte aligned on PCI map so calling with rtl92e_readl
will return invalid data so use rtl92e_readw.

The device is unable to select the right eeprom type.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
3e57290a19 staging: rtl8192e: fix 2 byte alignment of register BSSIDR.
commit 867510bde1 upstream.

BSSIDR has two byte alignment on PCI ioremap correct the write
by swapping to 16 bits first.

This fixes a problem that the device associates fail because
the filter is not set correctly.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:26 +02:00
d92507d9e5 staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
commit baabd567f8 upstream.

The driver attempts to alter memory that is mapped to PCI device.

This is because tx_fwinfo_8190pci points to skb->data

Move the pci_map_single to when completed buffer is ready to be mapped with
psdec is empty to drop on mapping error.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
e478ae4e63 staging: vc04_services: Fix bulk cache maintenance
commit ff92b9e3c9 upstream.

vchiq_arm supports transfers less than one page and at arbitrary
alignment, using the dma-mapping API to perform its cache maintenance
(even though the VPU drives the DMA hardware). Read (DMA_FROM_DEVICE)
operations use cache invalidation for speed, falling back to
clean+invalidate on partial cache lines, with writes (DMA_TO_DEVICE)
using flushes.

If a read transfer has ends which aren't page-aligned, performing cache
maintenance as if they were whole pages can lead to memory corruption
since the partial cache lines at the ends (and any cache lines before or
after the transfer area) will be invalidated. This bug was masked until
the disabling of the cache flush in flush_dcache_page().

Honouring the requested transfer start- and end-points prevents the
corruption.

Fixes: cf9caf1929 ("staging: vc04_services: Replace dmac_map_area with dmac_map_sg")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
e3b8dd546d arm64: documentation: document tagged pointer stack constraints
commit f0e421b1bf upstream.

Some kernel features don't currently work if a task puts a non-zero
address tag in its stack pointer, frame pointer, or frame record entries
(FP, LR).

For example, with a tagged stack pointer, the kernel can't deliver
signals to the process, and the task is killed instead. As another
example, with a tagged frame pointer or frame records, perf fails to
generate call graphs or resolve symbols.

For now, just document these limitations, instead of finding and fixing
everything that doesn't work, as it's not known if anyone needs to use
tags in these places anyway.

In addition, as requested by Dave Martin, generalize the limitations
into a general kernel address tag policy, and refactor
tagged-pointers.txt to include it.

Fixes: d50240a5f6 ("arm64: mm: permit use of tagged pointers at EL0")
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
45b1f35406 arm64: entry: improve data abort handling of tagged pointers
commit 276e93279a upstream.

When handling a data abort from EL0, we currently zero the top byte of
the faulting address, as we assume the address is a TTBR0 address, which
may contain a non-zero address tag. However, the address may be a TTBR1
address, in which case we should not zero the top byte. This patch fixes
that. The effect is that the full TTBR1 address is passed to the task's
signal handler (or printed out in the kernel log).

When handling a data abort from EL1, we leave the faulting address
intact, as we assume it's either a TTBR1 address or a TTBR0 address with
tag 0x00. This is true as far as I'm aware, we don't seem to access a
tagged TTBR0 address anywhere in the kernel. Regardless, it's easy to
forget about address tags, and code added in the future may not always
remember to remove tags from addresses before accessing them. So add tag
handling to the EL1 data abort handler as well. This also makes it
consistent with the EL0 data abort handler.

Fixes: d50240a5f6 ("arm64: mm: permit use of tagged pointers at EL0")
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
684f9dee46 arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
commit 7dcd9dd8ce upstream.

When we take a watchpoint exception, the address that triggered the
watchpoint is found in FAR_EL1. We compare it to the address of each
configured watchpoint to see which one was hit.

The configured watchpoint addresses are untagged, while the address in
FAR_EL1 will have an address tag if the data access was done using a
tagged address. The tag needs to be removed to compare the address to
the watchpoints.

Currently we don't remove it, and as a result can report the wrong
watchpoint as being hit (specifically, always either the highest TTBR0
watchpoint or lowest TTBR1 watchpoint). This patch removes the tag.

Fixes: d50240a5f6 ("arm64: mm: permit use of tagged pointers at EL0")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
5de6ca05b6 arm64: traps: fix userspace cache maintenance emulation on a tagged pointer
commit 81cddd65b5 upstream.

When we emulate userspace cache maintenance in the kernel, we can
currently send the task a SIGSEGV even though the maintenance was done
on a valid address. This happens if the address has a non-zero address
tag, and happens to not be mapped in.

When we get the address from a user register, we don't currently remove
the address tag before performing cache maintenance on it. If the
maintenance faults, we end up in either __do_page_fault, where find_vma
can't find the VMA if the address has a tag, or in do_translation_fault,
where the tagged address will appear to be above TASK_SIZE. In both
cases, the address is not mapped in, and the task is sent a SIGSEGV.

This patch removes the tag from the address before using it. With this
patch, the fault is handled correctly, the address gets mapped in, and
the cache maintenance succeeds.

As a second bug, if cache maintenance (correctly) fails on an invalid
tagged address, the address gets passed into arm64_notify_segfault,
where find_vma fails to find the VMA due to the tag, and the wrong
si_code may be sent as part of the siginfo_t of the segfault. With this
patch, the correct si_code is sent.

Fixes: 7dd01aef05 ("arm64: trap userspace "dc cvau" cache operation on errata-affected core")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
87e2b8c096 arm64: uaccess: ensure extension of access_ok() addr
commit a06040d7a7 upstream.

Our access_ok() simply hands its arguments over to __range_ok(), which
implicitly assummes that the addr parameter is 64 bits wide. This isn't
necessarily true for compat code, which might pass down a 32-bit address
parameter.

In these cases, we don't have a guarantee that the address has been zero
extended to 64 bits, and the upper bits of the register may contain
unknown values, potentially resulting in a suprious failure.

Avoid this by explicitly casting the addr parameter to an unsigned long
(as is done on other architectures), ensuring that the parameter is
widened appropriately.

Fixes: 0aea86a217 ("arm64: User access library functions")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
1f13dc4c22 arm64: armv8_deprecated: ensure extension of addr
commit 55de49f9aa upstream.

Our compat swp emulation holds the compat user address in an unsigned
int, which it passes to __user_swpX_asm(). When a 32-bit value is passed
in a register, the upper 32 bits of the register are unknown, and we
must extend the value to 64 bits before we can use it as a base address.

This patch casts the address to unsigned long to ensure it has been
suitably extended, avoiding the potential issue, and silencing a related
warning from clang.

Fixes: bd35a4adc4 ("arm64: Port SWP/SWPB emulation support from arm")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
2339f63ef9 arm64: ensure extension of smp_store_release value
commit 994870bead upstream.

When an inline assembly operand's type is narrower than the register it
is allocated to, the least significant bits of the register (up to the
operand type's width) are valid, and any other bits are permitted to
contain any arbitrary value. This aligns with the AAPCS64 parameter
passing rules.

Our __smp_store_release() implementation does not account for this, and
implicitly assumes that operands have been zero-extended to the width of
the type being stored to. Thus, we may store unknown values to memory
when the value type is narrower than the pointer type (e.g. when storing
a char to a long).

This patch fixes the issue by casting the value operand to the same
width as the pointer operand in all cases, which ensures that the value
is zero-extended as we expect. We use the same union trickery as
__smp_load_acquire and {READ,WRITE}_ONCE() to avoid GCC complaining that
pointers are potentially cast to narrower width integers in unreachable
paths.

A whitespace issue at the top of __smp_store_release() is also
corrected.

No changes are necessary for __smp_load_acquire(). Load instructions
implicitly clear any upper bits of the register, and the compiler will
only consider the least significant bits of the register as valid
regardless.

Fixes: 47933ad41a ("arch: Introduce smp_load_acquire(), smp_store_release()")
Fixes: 878a84d5a8 ("arm64: add missing data types in smp_load_acquire/smp_store_release")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:25 +02:00
b55563999f arm64: xchg: hazard against entire exchange variable
commit fee960bed5 upstream.

The inline assembly in __XCHG_CASE() uses a +Q constraint to hazard
against other accesses to the memory location being exchanged. However,
the pointer passed to the constraint is a u8 pointer, and thus the
hazard only applies to the first byte of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, as demonstrated with the following test case:

union u {
	unsigned long l;
	unsigned int i[2];
};

unsigned long update_char_hazard(union u *u)
{
	unsigned int a, b;

	a = u->i[1];
	asm ("str %1, %0" : "+Q" (*(char *)&u->l) : "r" (0UL));
	b = u->i[1];

	return a ^ b;
}

unsigned long update_long_hazard(union u *u)
{
	unsigned int a, b;

	a = u->i[1];
	asm ("str %1, %0" : "+Q" (*(long *)&u->l) : "r" (0UL));
	b = u->i[1];

	return a ^ b;
}

The linaro 15.08 GCC 5.1.1 toolchain compiles the above as follows when
using -O2 or above:

0000000000000000 <update_char_hazard>:
   0:	d2800001 	mov	x1, #0x0                   	// #0
   4:	f9000001 	str	x1, [x0]
   8:	d2800000 	mov	x0, #0x0                   	// #0
   c:	d65f03c0 	ret

0000000000000010 <update_long_hazard>:
  10:	b9400401 	ldr	w1, [x0,#4]
  14:	d2800002 	mov	x2, #0x0                   	// #0
  18:	f9000002 	str	x2, [x0]
  1c:	b9400400 	ldr	w0, [x0,#4]
  20:	4a000020 	eor	w0, w1, w0
  24:	d65f03c0 	ret

This patch fixes the issue by passing an unsigned long pointer into the
+Q constraint, as we do for our cmpxchg code. This may hazard against
more than is necessary, but this is better than missing a necessary
hazard.

Fixes: 305d454aaa ("arm64: atomics: implement native {relaxed, acquire, release} atomics")
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
ee6dfbff78 arm64: dts: hi6220: Reset the mmc hosts
commit 0fbdf9953b upstream.

The MMC hosts could be left in an unconsistent or uninitialized state from
the firmware. Instead of assuming, the firmware did the right things, let's
reset the host controllers.

This change fixes a bug when the mmc2/sdio is initialized leading to a hung
task:

[  242.704294] INFO: task kworker/7:1:675 blocked for more than 120 seconds.
[  242.711129]       Not tainted 4.9.0-rc8-00017-gcf0251f #3
[  242.716571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.724435] kworker/7:1     D    0   675      2 0x00000000
[  242.729973] Workqueue: events_freezable mmc_rescan
[  242.734796] Call trace:
[  242.737269] [<ffff00000808611c>] __switch_to+0xa8/0xb4
[  242.742437] [<ffff000008d07c04>] __schedule+0x1c0/0x67c
[  242.747689] [<ffff000008d08254>] schedule+0x40/0xa0
[  242.752594] [<ffff000008d0b284>] schedule_timeout+0x1c4/0x35c
[  242.758366] [<ffff000008d08e38>] wait_for_common+0xd0/0x15c
[  242.763964] [<ffff000008d09008>] wait_for_completion+0x28/0x34
[  242.769825] [<ffff000008a1a9f4>] mmc_wait_for_req_done+0x40/0x124
[  242.775949] [<ffff000008a1ab98>] mmc_wait_for_req+0xc0/0xf8
[  242.781549] [<ffff000008a1ac3c>] mmc_wait_for_cmd+0x6c/0x84
[  242.787149] [<ffff000008a26610>] mmc_io_rw_direct_host+0x9c/0x114
[  242.793270] [<ffff000008a26aa0>] sdio_reset+0x34/0x7c
[  242.798347] [<ffff000008a1d46c>] mmc_rescan+0x2fc/0x360

[ ... ]

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
1116274777 ARM: dts: imx6sx-sdb: Remove OPP override
commit d8581c7c8b upstream.

The board file for imx6sx-sdb overrides cpufreq operating points to use
higher voltages. This is done because the board has a shared rail for
VDD_ARM_IN and VDD_SOC_IN and when using LDO bypass the shared voltage
needs to be a value suitable for both ARM and SOC.

This only applies to LDO bypass mode, a feature not present in upstream.
When LDOs are enabled the effect is to use higher voltages than necessary
for no good reason.

Setting these higher voltages can make some boards fail to boot with ugly
semi-random crashes reminiscent of memory corruption. These failures only
happen on board rev. C, rev. B is reported to still work.

Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Fixes: 54183bd7f7 ("ARM: imx6sx-sdb: add revb board and make it default")
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
8539c6b56a ARM: dts: at91: sama5d3_xplained: not all ADC channels are available
commit d3df1ec063 upstream.

Remove ADC channels that are not available by default on the sama5d3_xplained
board (resistor not populated) in order to not create confusion.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
87cb676e83 ARM: dts: at91: sama5d3_xplained: fix ADC vref
commit 9cdd31e591 upstream.

The voltage reference for the ADC is not 3V but 3.3V since it is connected to
VDDANA.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
6b9e12331a ARM: 8670/1: V7M: Do not corrupt vector table around v7m_invalidate_l1 call
commit 6d80594936 upstream.

We save/restore registers around v7m_invalidate_l1 to address pointed
by r12, which is vector table, so the first eight entries are
overwritten with a garbage. We already have stack setup at that stage,
so use it to save/restore register.

Fixes: 6a8146f420 ("ARM: 8609/1: V7M: Add support for the Cortex-M7 processor")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
3e9570206d ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap
commit b089c31c51 upstream.

To cope with the variety in ARM architectures and configurations, the
pagetable attributes for kernel memory are generated at runtime to match
the system the kernel finds itself on. This calculated value is stored
in pgprot_kernel.

However, when early fixmap support was added for ARM (commit
a5f4c561b3) the attributes used for mappings were hard coded because
pgprot_kernel is not set up early enough. Unfortunately, when fixmap is
used after early boot this means the memory being mapped can have
different attributes to existing mappings, potentially leading to
unpredictable behaviour. A specific problem also exists due to the hard
coded values not include the 'shareable' attribute which means on
systems where this matters (e.g. those with multiple CPU clusters) the
cache contents for a memory location can become inconsistent between
CPUs.

To resolve these issues we change fixmap to use the same memory
attributes (from pgprot_kernel) that the rest of the kernel uses. To
enable this we need to refactor the initialisation code so
build_mem_type_table() is called early enough. Note, that relies on early
param parsing for memory type overrides passed via the kernel command
line, so we need to make sure this call is still after
parse_early_params().

[ardb: keep early_fixmap_init() before param parsing, for earlycon]

Fixes: a5f4c561b3 ("ARM: 8415/1: early fixmap support for earlycon")
Tested-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
7e2aa72f11 ARM: 8662/1: module: split core and init PLT sections
commit b7ede5a1f5 upstream.

Since commit 35fa91eed8 ("ARM: kernel: merge core and init PLTs"),
the ARM module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.

However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This puts the PLT entries out of reach of the relocated branch
instructions, defeating the whole purpose of PLTs.

So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section.

Fixes: 35fa91eed8 ("ARM: kernel: merge core and init PLTs")
Reported-by: Angus Clark <angus@angusclark.org>
Tested-by: Angus Clark <angus@angusclark.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
ae9bd04a13 KVM: arm: plug potential guest hardware debug leakage
commit 661e6b02b5 upstream.

Hardware debugging in guests is not intercepted currently, it means
that a malicious guest can bring down the entire machine by writing
to the debug registers.

This patch enable trapping of all debug registers, preventing the
guests to access the debug registers. This includes access to the
debug mode(DBGDSCR) in the guest world all the time which could
otherwise mess with the host state. Reads return 0 and writes are
ignored (RAZ_WI).

The result is the guest cannot detect any working hardware based debug
support. As debug exceptions are still routed to the guest normal
debug using software based breakpoints still works.

To support debugging using hardware registers we need to implement a
debug register aware world switch as well as special trapping for
registers that may affect the host state.

Signed-off-by: Zhichao Huang <zhichao.huang@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:24 +02:00
c870815609 KVM: arm/arm64: vgic-v3: Do not use Active+Pending state for a HW interrupt
commit 3d6e77ad14 upstream.

When an interrupt is injected with the HW bit set (indicating that
deactivation should be propagated to the physical distributor),
special care must be taken so that we never mark the corresponding
LR with the Active+Pending state (as the pending state is kept in
the physycal distributor).

Fixes: 59529f69f5 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
181339b540 KVM: arm/arm64: vgic-v2: Do not use Active+Pending state for a HW interrupt
commit ddf42d068f upstream.

When an interrupt is injected with the HW bit set (indicating that
deactivation should be propagated to the physical distributor),
special care must be taken so that we never mark the corresponding
LR with the Active+Pending state (as the pending state is kept in
the physycal distributor).

Fixes: 140b086dd1 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
50f665c39d arm: KVM: Do not use stack-protector to compile HYP code
commit 501ad27c67 upstream.

We like living dangerously. Nothing explicitely forbids stack-protector
to be used in the HYP code, while distributions routinely compile their
kernel with it. We're just lucky that no code actually triggers the
instrumentation.

Let's not try our luck for much longer, and disable stack-protector
for code living at HYP.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
c735d4e3c2 arm64: KVM: Do not use stack-protector to compile EL2 code
commit cde13b5dad upstream.

We like living dangerously. Nothing explicitely forbids stack-protector
to be used in the EL2 code, while distributions routinely compile their
kernel with it. We're just lucky that no code actually triggers the
instrumentation.

Let's not try our luck for much longer, and disable stack-protector
for code living at EL2.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
75295195f7 powerpc/tm: Fix FP and VMX register corruption
commit f48e91e87e upstream.

In commit dc3106690b ("powerpc: tm: Always use fp_state and vr_state
to store live registers"), a section of code was removed that copied
the current state to checkpointed state. That code should not have been
removed.

When an FP (Floating Point) unavailable is taken inside a transaction,
we need to abort the transaction. This is because at the time of the
tbegin, the FP state is bogus so the state stored in the checkpointed
registers is incorrect. To fix this, we treclaim (to get the
checkpointed GPRs) and then copy the thread_struct FP live state into
the checkpointed state. We then trecheckpoint so that the FP state is
correctly restored into the CPU.

The copying of the FP registers from live to checkpointed is what was
missing.

This simplifies the logic slightly from the original patch.
tm_reclaim_thread() will now always write the checkpointed FP
state. Either the checkpointed FP state will be written as part of
the actual treclaim (in tm.S), or it'll be a copy of the live
state. Which one we use is based on MSR[FP] from userspace.

Similarly for VMX.

Fixes: dc3106690b ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: cyrilbur@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
f9fde78726 powerpc/mm: Fix crash in page table dump with huge pages
commit bfb9956ab4 upstream.

The page table dump code doesn't know about huge pages, so currently
it crashes (or walks random memory, usually leading to a crash), if it
finds a huge page. On Book3S we only see huge pages in the Linux page
tables when we're using the P9 Radix MMU.

Teaching the code to properly handle huge pages is a bit more involved,
so for now just prevent the crash.

Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
1d19e3f139 powerpc/64e: Fix hang when debugging programs with relocated kernel
commit fd615f69a1 upstream.

Debug interrupts can be taken during interrupt entry, since interrupt
entry does not automatically turn them off.  The kernel will check
whether the faulting instruction is between [interrupt_base_book3e,
__end_interrupts], and if so clear MSR[DE] and return.

However, when the kernel is built with CONFIG_RELOCATABLE, it can't use
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) and
LOAD_REG_IMMEDIATE(r15,__end_interrupts), as they ignore relocation.
Thus, if the kernel is actually running at a different address than it
was built at, the address comparison will fail, and the exception entry
code will hang at kernel_dbg_exc.

r2(toc) is also not usable here, as r2 still holds data from the
interrupted context, so LOAD_REG_ADDR() doesn't work either.  So we use
the *name@got* to get the EV of two labels directly.

Test programs test.c shows as follows:
int main(int argc, char *argv[])
{
	if (access("/proc/sys/kernel/perf_event_paranoid", F_OK) == -1)
		printf("Kernel doesn't have perf_event support\n");
}

Steps to reproduce the bug, for example:
 1) ./gdb ./test
 2) (gdb) b access
 3) (gdb) r
 4) (gdb) s

Signed-off-by: Liu Hailong <liu.hailong6@zte.com.cn>
Signed-off-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Reviewed-by: Liu Song <liu.song11@zte.com.cn>
Reviewed-by: Huang Jian <huang.jian@zte.com.cn>
[scottwood: cleaned up commit message, and specified bad behavior
 as a hang rather than an oops to correspond to mainline kernel behavior]
Fixes: 1cb6e06492 ("powerpc/book3e: support CONFIG_RELOCATABLE")
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
31f96d860e powerpc/powernv: Fix TCE kill on NVLink2
commit 6b3d12a948 upstream.

Commit 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on
NVLink2") forced all TCE kills to go via the OPAL call for
NVLink2. However the PHB3 implementation of TCE kill was still being
called directly from some functions which in some circumstances caused
a machine check.

This patch adds an equivalent IODA2 version of the function which uses
the correct invalidation method depending on PHB model and changes all
external callers to use it instead.

Fixes: 616badd2fb ("powerpc/powernv: Use OPAL call for TCE kill on NVLink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
8781143e13 powerpc/iommu: Do not call PageTransHuge() on tail pages
commit e889e96e98 upstream.

The CMA pages migration code does not support compound pages at
the moment so it performs few tests before proceeding to actual page
migration.

One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
it is designed to be called on head pages only. Since we also test for
PageCompound(), and it contains PageTail() and PageHead(), we can
simplify the check by leaving just PageCompound() and therefore avoid
possible VM_BUG_ON_PAGE.

Fixes: 2e5bbb5461 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:23 +02:00
eb0807b288 powerpc/sysfs: Fix reference leak of cpu device_nodes present at boot
commit e76ca27790 upstream.

For CPUs present at boot each logical CPU acquires a reference to the
associated device node of the core. This happens in register_cpu() which
is called by topology_init(). The result of this is that we end up with
a reference held by each thread of the core. However, these references
are never freed if the CPU core is DLPAR removed.

This patch fixes the reference leaks by acquiring and releasing the references
in the CPU hotplug callbacks un/register_cpu_online(). With this patch symmetric
reference counting is observed with both CPUs present at boot, and those DLPAR
added after boot.

Fixes: f86e4718f2 ("driver/core: cpu: initialize of_node in cpu's device struture")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
d731227327 powerpc/pseries: Fix of_node_put() underflow during DLPAR remove
commit 68baf692c4 upstream.

Historically struct device_node references were tracked using a kref embedded as
a struct field. Commit 75b57ecf9d ("of: Make device nodes kobjects so they
show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
the device tree could by more simply exposed to userspace using sysfs.

Commit 0829f6d1f6 ("of: device_node kobject lifecycle fixes") (Mar 2014)
followed up these changes to better control the kobject lifecycle and in
particular the referecne counting via of_node_get(), of_node_put(), and
of_node_init().

A result of this second commit was that it introduced an of_node_put() call when
a dynamic node is detached, in of_node_remove(), that removes the initial kobj
reference created by of_node_init().

Traditionally as the original dynamic device node user the pseries code had
assumed responsibilty for releasing this final reference in its platform
specific DLPAR detach code.

This patch fixes a refcount underflow introduced by commit 0829f6d1f6, and
recently exposed by the upstreaming of the recount API.

Messages like the following are no longer seen in the kernel log with this
patch following DLPAR remove operations of cpus and pci devices.

  rpadlpar_io: slot PHB 72 removed
  refcount_t: underflow; use-after-free.
  ------------[ cut here ]------------
  WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110

Fixes: 0829f6d1f6 ("of: device_node kobject lifecycle fixes")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
[mpe: Make change log commit references more verbose]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
11e382b3b8 powerpc/book3s/mce: Move add_taint() later in virtual mode
commit d93b0ac01a upstream.

machine_check_early() gets called in real mode. The very first time when
add_taint() is called, it prints a warning which ends up calling opal
call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
very first machine check while we are in opal we are doomed. OPAL_CALL
overwrites the PACASAVEDMSR in r13 and in this case when we are done with
MCE handling the original opal call will use this new MSR on it's way
back to opal_return. This usually leads to unexpected behaviour or the
kernel to panic. Instead move the add_taint() call later in the virtual
mode where it is safe to call.

This is broken with current FW level. We got lucky so far for not getting
very first MCE hit while in OPAL. But easily reproducible on Mambo.

Fixes: 27ea2c420c ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
e548d552c0 powerpc/eeh: Avoid use after free in eeh_handle_special_event()
commit daeba2956f upstream.

eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE.  This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() found that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use after free in eeh_handle_special_event() as it attempts to
clear the "recovering" state on the PE after eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70d ("powerpc/eeh: EEH core to handle special event")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
63f44c113b powerpc/mm: Ensure IRQs are off in switch_mm()
commit 9765ad134a upstream.

powerpc expects IRQs to already be (soft) disabled when switch_mm() is
called, as made clear in the commit message of 9c1e105238 ("powerpc: Allow
perf_counters to access user memory at interrupt time").

Aside from any race conditions that might exist between switch_mm() and an IRQ,
there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
followed at some point by an IRQ enable then interrupts will remain disabled
until we return to userspace.

It is true that when switch_mm() is called from the scheduler IRQs are off, but
not when it's called by use_mm(). Looking closer we see that last year in commit
f98db6013c ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
this was made more explicit by the addition of switch_mm_irqs_off() which is now
called by the scheduler, vs switch_mm() which is used by use_mm().

Arguably it is a bug in use_mm() to call switch_mm() in a different context than
it expects, but fixing that will take time.

This was discovered recently when vhost started throwing warnings such as:

  BUG: sleeping function called from invalid context at kernel/mutex.c:578
  in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
  no locks held by vhost-10760/10768.
  irq event stamp: 10
  hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
  hardirqs last disabled at (10): switch_slb+0x2e4/0x490
  softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
  softirqs last disabled at (0):  (null)
  Call Trace:
    show_stack+0x88/0x390 (unreliable)
    dump_stack+0x30/0x44
    __might_sleep+0x1c4/0x2d0
    mutex_lock_nested+0x74/0x5c0
    cgroup_attach_task_all+0x5c/0x180
    vhost_attach_cgroups_work+0x58/0x80 [vhost]
    vhost_worker+0x24c/0x3d0 [vhost]
    kthread+0xec/0x100
    ret_from_kernel_thread+0x5c/0xd4

Prior to commit 04b96e5528 ("vhost: lockless enqueuing") (Aug 2016) the
vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
which had the effect of reenabling IRQs. Since that commit removed the locking
in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
off causing the warnings.

This patch addresses the problem by making the powerpc code mirror the x86 code,
ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
defining switch_mm_irqs_off().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[mpe: Flesh out/rewrite change log, add stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
5617775ed1 cx231xx-cards: fix NULL-deref at probe
commit 0cd273bb5e upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.

Fixes: e0d3bafd02 ("V4L/DVB (10954): Add cx231xx USB driver")

Cc: Sri Deevi <Srinivasa.Deevi@conexant.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
6f623a3490 cx231xx-audio: fix NULL-deref at probe
commit 65f921647f upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.

Fixes: e0d3bafd02 ("V4L/DVB (10954): Add cx231xx USB driver")

Cc: Sri Deevi <Srinivasa.Deevi@conexant.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
8bb9fa63d0 cx231xx-audio: fix init error path
commit fff1abc4d5 upstream.

Make sure to release the snd_card also on a late allocation error.

Fixes: e0d3bafd02 ("V4L/DVB (10954): Add cx231xx USB driver")

Cc: Sri Deevi <Srinivasa.Deevi@conexant.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
a28e1f13a6 dw2102: limit messages to buffer size
commit 950e252cb4 upstream.

Otherwise the i2c transfer functions can read or write beyond the end of
stack or heap buffers.

Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
f953d6b03a digitv: limit messages to buffer size
commit 821117dc21 upstream.

Return an error rather than memcpy()ing beyond the end of the buffer.
Internal callers use appropriate sizes, but digitv_i2c_xfer may not.

Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:22 +02:00
9e074c588d dvb-frontends/cxd2841er: define symbol_rate_min/max in T/C fe-ops
commit 158f0328af upstream.

Fixes "w_scan -f c" complaining with

  This dvb driver is *buggy*: the symbol rate limits are undefined - please
  report to linuxtv.org)

Signed-off-by: Daniel Scheller <d.scheller@gmx.net>
Acked-by: Abylay Ospan <aospan@netup.ru>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
ac6dedddba zr364xx: enforce minimum size when reading header
commit ee0fe833d9 upstream.

This code copies actual_length-128 bytes from the header, which will
underflow if the received buffer is too small.

Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
20c98053a2 dib0700: fix NULL-deref at probe
commit d5823511c0 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.

Fixes: c4018fa2e4 ("[media] dib0700: fix RC support on Hauppauge
Nova-TD")

Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
6af3917b8d s5p-mfc: Fix unbalanced call to clock management
commit a5cb00eb42 upstream.

Clock should be turned off after calling s5p_mfc_init_hw() from the
watchdog worker, like it is already done in the s5p_mfc_open() which also
calls this function.

Fixes: af93574678 ("[media] MFC: Add MFC 5.1 V4L2 driver")

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
4f220307a8 gspca: konica: add missing endpoint sanity check
commit aa58fedb8c upstream.

Make sure to check the number of endpoints to avoid accessing memory
beyond the endpoint array should a device lack the expected endpoints.

Note that, as far as I can tell, the gspca framework has already made
sure there is at least one endpoint in the current alternate setting so
there should be no risk for a NULL-pointer dereference here.

Fixes: b517af7228 ("V4L/DVB: gspca_konica: New gspca subdriver for
konica chipset using cams")

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hansverk@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
d926ceaa72 s5p-mfc: Fix race between interrupt routine and device functions
commit 0c32b8ec02 upstream.

Interrupt routine must wake process waiting for given interrupt AFTER
updating driver's internal structures and contexts. Doing it in-between
is a serious bug. This patch moves all calls to the wake() function to
the end of the interrupt processing block to avoid potential and real
races, especially on multi-core platforms. This also fixes following issue
reported from clock core (clocks were disabled in interrupt after being
unprepared from the other place in the driver, the stack trace however
points to the different place than s5p_mfc driver because of the race):

WARNING: CPU: 1 PID: 18 at drivers/clk/clk.c:544 clk_core_unprepare+0xc8/0x108
Modules linked in:
CPU: 1 PID: 18 Comm: kworker/1:0 Not tainted 4.10.0-next-20170223-00070-g04e18bc99ab9-dirty #2154
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: pm pm_runtime_work
[<c010d8b0>] (unwind_backtrace) from [<c010a534>] (show_stack+0x10/0x14)
[<c010a534>] (show_stack) from [<c033292c>] (dump_stack+0x74/0x94)
[<c033292c>] (dump_stack) from [<c011cef4>] (__warn+0xd4/0x100)
[<c011cef4>] (__warn) from [<c011cf40>] (warn_slowpath_null+0x20/0x28)
[<c011cf40>] (warn_slowpath_null) from [<c0387a84>] (clk_core_unprepare+0xc8/0x108)
[<c0387a84>] (clk_core_unprepare) from [<c0389d84>] (clk_unprepare+0x24/0x2c)
[<c0389d84>] (clk_unprepare) from [<c03d4660>] (exynos_sysmmu_suspend+0x48/0x60)
[<c03d4660>] (exynos_sysmmu_suspend) from [<c042b9b0>] (pm_generic_runtime_suspend+0x2c/0x38)
[<c042b9b0>] (pm_generic_runtime_suspend) from [<c0437580>] (genpd_runtime_suspend+0x94/0x220)
[<c0437580>] (genpd_runtime_suspend) from [<c042e240>] (__rpm_callback+0x134/0x208)
[<c042e240>] (__rpm_callback) from [<c042e334>] (rpm_callback+0x20/0x80)
[<c042e334>] (rpm_callback) from [<c042d3b8>] (rpm_suspend+0xdc/0x458)
[<c042d3b8>] (rpm_suspend) from [<c042ea24>] (pm_runtime_work+0x80/0x90)
[<c042ea24>] (pm_runtime_work) from [<c01322c4>] (process_one_work+0x120/0x318)
[<c01322c4>] (process_one_work) from [<c0132520>] (worker_thread+0x2c/0x4ac)
[<c0132520>] (worker_thread) from [<c0137ab0>] (kthread+0xfc/0x134)
[<c0137ab0>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
---[ end trace 1ead49a7bb83f0d8 ]---

Fixes: af93574678 ("[media] MFC: Add MFC 5.1 V4L2 driver")

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
776a591f75 cec: Fix runtime BUG when (CONFIG_RC_CORE && !CEC_CAP_RC)
commit 43c0c03961 upstream.

Currently when the RC Core is enabled (reachable) core code located
in cec_register_adapter() attempts to populate the RC structure with
a pointer to the 'parent' passed in by the caller.

Unfortunately if the caller did not specify RC capability when calling
cec_allocate_adapter(), then there will be no RC structure to populate.

This causes a "NULL pointer dereference" error.

Fixes: f51e80804f ("[media] cec: pass parent device in register(), not allocate()")

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
0d1015c0e9 iio: hid-sensor: Store restore poll and hysteresis on S3
commit 5d9854eaea upstream.

This change undo the change done by 'commit 3bec247474
("iio: hid-sensor-trigger: Change get poll value function order to avoid
sensor properties losing after resume from S3")' as this breaks some
USB/i2c sensor hubs.

Instead of relying on HW for restoring poll and hysteresis, driver stores
and restores on resume (S3). In this way user space modified settings are
not lost for any kind of sensor hub behavior.

In this change, whenever user space modifies sampling frequency or
hysteresis driver will get the feature value from the hub and store in the
per device hid_sensor_common data structure. On resume callback from S3,
system will set the feature to sensor hub, if user space ever modified the
feature value.

Fixes: 3bec247474 ("iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3")
Reported-by: Ritesh Raj Sarraf <rrs@researchut.com>
Tested-by: Ritesh Raj Sarraf <rrs@researchut.com>
Tested-by: Song, Hongyan <hongyan.song@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:21 +02:00
03652e4884 iio: proximity: as3935: fix as3935_write
commit 84ca8e364a upstream.

AS3935_WRITE_DATA macro bit is incorrect and the actual write
sequence is two leading zeros.

Cc: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
b13b3f3985 ipx: call ipxitf_put() in ioctl error path
commit ee0d8d8482 upstream.

We should call ipxitf_put() if the copy_to_user() fails.

Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
1e2a08069d USB: hub: fix non-SS hub-descriptor handling
commit bec444cd1c upstream.

Add missing sanity check on the non-SuperSpeed hub-descriptor length in
order to avoid parsing and leaking two bytes of uninitialised slab data
through sysfs removable-attributes (or a compound-device debug
statement).

Note that we only make sure that the DeviceRemovable field is always
present (and specifically ignore the unused PortPwrCtrlMask field) in
order to continue support any hubs with non-compliant descriptors. As a
further safeguard, the descriptor buffer is also cleared.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
345477ed11 USB: hub: fix SS hub-descriptor handling
commit 2c25a2c818 upstream.

A SuperSpeed hub descriptor does not have any variable-length fields so
bail out when reading a short descriptor.

This avoids parsing and leaking two bytes of uninitialised slab data
through sysfs removable-attributes.

Fixes: dbe79bbe9d ("USB 3.0 Hub Changes")
Cc: John Youn <John.Youn@synopsys.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
3a82455292 USB: serial: io_ti: fix div-by-zero in set_termios
commit 6aeb75e6ad upstream.

Fix a division-by-zero in set_termios when debugging is enabled and a
high-enough speed has been requested so that the divisor value becomes
zero.

Instead of just fixing the offending debug statement, cap the baud rate
at the base as a zero divisor value also appears to crash the firmware.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
e4d3ae36fc USB: serial: mct_u232: fix big-endian baud-rate handling
commit 26cede3436 upstream.

Drop erroneous cpu_to_le32 when setting the baud rate, something which
corrupted the divisor on big-endian hosts.

Found using sparse:

	warning: incorrect type in argument 1 (different base types)
	    expected unsigned int [unsigned] [usertype] val
	    got restricted __le32 [usertype] <noident>

Fixes: af2ac1a091 ("USB: serial mct_usb232: move DMA buffers to heap")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-By: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
07cf1f9d01 USB: serial: qcserial: add more Lenovo EM74xx device IDs
commit 8d7a10dd32 upstream.

In their infinite wisdom, and never ending quest for end user frustration,
Lenovo has decided to use new USB device IDs for the wwan modules in
their 2017 laptops.  The actual hardware is still the Sierra Wireless
EM7455 or EM7430, depending on region.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
d986525b7d usb: serial: option: add Telit ME910 support
commit 40dd46048c upstream.

This patch adds support for Telit ME910 PID 0x1100.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
b58d7087ce USB: iowarrior: fix info ioctl on big-endian hosts
commit dd5ca753fa upstream.

Drop erroneous le16_to_cpu when returning the USB device speed which is
already in host byte order.

Found using sparse:

	warning: cast to restricted __le16

Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:20 +02:00
d69737cb64 usb: musb: Fix trying to suspend while active for OTG configurations
commit 3c50ffef25 upstream.

Commit d8e5f0eca1 ("usb: musb: Fix hardirq-safe hardirq-unsafe
lock order error") caused a regression where musb keeps trying to
enable host mode with no cable connected. This seems to be caused
by the fact that now phy is enabled earlier, and we are wrongly
trying to force USB host mode on an OTG port. The errors we are
getting are "trying to suspend as a_idle while active".

For ports configured as OTG, we should not need to do anything
to try to force USB host mode on it's OTG port. Trying to force host
mode in this case just seems to completely confuse the musb state
machine.

Let's fix the issue by making musb_host_setup() attempt to force the
mode only if port_mode is configured for host mode.

Fixes: d8e5f0eca1 ("usb: musb: Fix hardirq-safe hardirq-unsafe lock order error")
Cc: Johan Hovold <johan@kernel.org>
Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reported-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
925f7f9244 usb: musb: tusb6010_omap: Do not reset the other direction's packet size
commit 6df2b42f7c upstream.

We have one register for each EP to set the maximum packet size for both
TX and RX.
If for example an RX programming would happen before the previous TX
transfer finishes we would reset the TX packet side.

To fix this issue, only modify the TX or RX part of the register.

Fixes: 550a7375fe ("USB: Add MUSB and TUSB support")
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
33c19ef5e2 usb: dwc3: gadget: Prevent losing events in event cache
commit d325a1de49 upstream.

The dwc3 driver can overwite its previous events if its top-half IRQ
handler (TH) gets invoked again before processing the events in the
cache. We see this as a hang in the file transfer and the host will
attempt to reset the device. TH gets the event count and deasserts the
interrupt line by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. If
there's a new event coming between reading the event count and interrupt
deassertion, dwc3 will lose previous pending events. More generally, we
will see 0 event count, which should not affect anything.

This shouldn't be possible in the current dwc3 implementation. However,
through testing and reading the PCIe trace, the TH occasionally still
gets invoked one more time after HW interrupt deassertion. (With PCIe
legacy interrupts, TH is called repeatedly as long as the interrupt line
is asserted). We suspect that there is a small detection delay in the
SW.

To avoid this issue, Check DWC3_EVENT_PENDING flag to determine if the
events are processed in the bottom-half IRQ handler. If not, return
IRQ_HANDLED and don't process new event.

Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
79312c5d61 dvb-usb-dibusb-mc-common: Add MODULE_LICENSE
commit bf05b65a9f upstream.

dvb-usb-dibusb-mc-common is licensed under GPLv2, and if we don't say
so then it won't even load since it needs a GPL-only symbol.

Fixes: e91455a149 ("[media] dvb-usb: split out common parts of dibusb")

Reported-by: Dominique Dumont <dod@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
8d730619ae ttusb2: limit messages to buffer size
commit a12b8ab8c5 upstream.

Otherwise ttusb2_i2c_xfer can read or write beyond the end of static and
heap buffers.

Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
f2b30b2f19 mceusb: fix NULL-deref at probe
commit 03eb2a557e upstream.

Make sure to check for the required out endpoint to avoid dereferencing
a NULL-pointer in mce_request_packet should a malicious device lack such
an endpoint. Note that this path is hit during probe.

Fixes: 66e89522af ("V4L/DVB: IR: add mceusb IR receiver driver")

Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
03a7e78a25 usbvision: fix NULL-deref at probe
commit eacb975b48 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.

Fixes: 2a9f8b5d25 ("V4L/DVB (5206): Usbvision: set alternate interface
modification")

Cc: Thierry MERLE <thierry.merle@free.fr>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
3aade67b76 net: irda: irda-usb: fix firmware name on big-endian hosts
commit 75cf067953 upstream.

Add missing endianness conversion when using the USB device-descriptor
bcdDevice field to construct a firmware file name.

Fixes: 8ef80aef11 ("[IRDA]: irda-usb.c: STIR421x cleanups")
Cc: Nick Fedchik <nfedchik@atlantic-link.com.ua>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
0cc9aa4c4b usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
commit 7480d912d5 upstream.

According to xHCI ch4.20 Scratchpad Buffers, the Scratchpad
Buffer needs to be zeroed.

	...
	The following operations take place to allocate
       	Scratchpad Buffers to the xHC:
	...
		b. Software clears the Scratchpad Buffer to '0'

Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:19 +02:00
966e88d540 xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
commit a0c16630d3 upstream.

Intel Denverton microserver is Atom based and need the PME and CAS quirks
as well.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
5077513b4e USB: xhci: fix lock-inversion problem
commit 63aea0dbab upstream.

With threaded interrupts, bottom-half handlers are called with
interrupts enabled.  Therefore they can't safely use spin_lock(); they
have to use spin_lock_irqsave().  Lockdep warns about a violation
occurring in xhci_irq():

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.11.0-rc8-dbg+ #1 Not tainted
---------------------------------------------------------
swapper/7/0 just changed the state of lock:
 (&(&ehci->lock)->rlock){-.-...}, at: [<ffffffffa0130a69>]
ehci_hrtimer_func+0x29/0xc0 [ehci_hcd]
but this lock took another, HARDIRQ-unsafe lock in the past:
 (hcd_urb_list_lock){+.....}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hcd_urb_list_lock);
                               local_irq_disable();
                               lock(&(&ehci->lock)->rlock);
                               lock(hcd_urb_list_lock);
  <Interrupt>
    lock(&(&ehci->lock)->rlock);
 *** DEADLOCK ***

no locks held by swapper/7/0.
the shortest dependencies between 2nd lock and 1st lock:
 -> (hcd_urb_list_lock){+.....} ops: 252 {
    HARDIRQ-ON-W at:
                      __lock_acquire+0x602/0x1280
                      lock_acquire+0xd5/0x1c0
                      _raw_spin_lock+0x2f/0x40
                      usb_hcd_unlink_urb_from_ep+0x1b/0x60 [usbcore]
                      xhci_giveback_urb_in_irq.isra.45+0x70/0x1b0 [xhci_hcd]
                      finish_td.constprop.60+0x1d8/0x2e0 [xhci_hcd]
                      xhci_irq+0xdd6/0x1fa0 [xhci_hcd]
                      usb_hcd_irq+0x26/0x40 [usbcore]
                      irq_forced_thread_fn+0x2f/0x70
                      irq_thread+0x149/0x1d0
                      kthread+0x113/0x150
                      ret_from_fork+0x2e/0x40

This patch fixes the problem.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
354c85b7ad usb: host: xhci-plat: propagate return value of platform_get_irq()
commit 4b148d5144 upstream.

platform_get_irq() returns an error code, but the xhci-plat driver
ignores it and always returns -ENODEV. This is not correct, and
prevents -EPROBE_DEFER from being propagated properly.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
c3dc5f5ae5 xhci: remove GFP_DMA flag from allocation
commit 5db851cf20 upstream.

There is no reason to restrict allocations to the first 16MB ISA DMA
addresses.

It is causing problems in a virtualization setup with enabled IOMMU
(x86_64). The result is that USB is not working in the VM.

Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
d0c8639797 xhci: Fix command ring stop regression in 4.11
commit 604d02a2a6 upstream.

In 4.11 TRB completion codes were renamed to match spec.

Completion codes for command ring stopped and endpoint stopped
were mixed, leading to failures while handling a stopped command ring.

Use the correct completion code for command ring stopped events.

Fixes: 0b7c105a04 ("usb: host: xhci: rename completion codes to match spec")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
1b8de0a2bb EDAC, amd64: Fix reporting of Chip Select sizes on Fam17h
commit eb77e6b80f upstream.

The wrong index into the csbases/csmasks arrays was being passed to
the function to compute the chip select sizes, which resulted in the
wrong size being computed. Address that so that the correct values are
computed and printed.

Also, redo how we calculate the number of pages in a CS row.

Reported-by: Benjamin Bennett <benbennett@gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com
[ Remove unneeded integer math comment, minor cleanups. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
0a4115da61 dax: fix data corruption when fault races with write
commit 13e451fdc1 upstream.

Currently DAX read fault can race with write(2) in the following way:

CPU1 - write(2)			CPU2 - read fault
				dax_iomap_pte_fault()
				  ->iomap_begin() - sees hole
dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate
				  grab_mapping_entry()
				  - we add zero page in the radix tree
				    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
3ef206af54 libnvdimm: fix clear length of nvdimm_forget_poison()
commit 8d13c02906 upstream.

ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
of error actually cleared, which may be smaller than its requested
'len'.

Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
'clear_err.cleared' when this value is valid.

Fixes: e046114af5 ("libnvdimm: clear the internal poison_list when clearing badblocks")
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
7868d9ff2e Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
commit deccf497d8 upstream.

stat/lstat/fstatat need to pass AT_NO_AUTOMOUNT to vfs_statx() as the
pre-statx code didn't set LOOKUP_AUTOMOUNT, even though fstatat() accepted
the AT_NO_AUTOMOUNT flag.

Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Reported-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:18 +02:00
d08975541f USB: chaoskey: fix Alea quirk on big-endian hosts
commit 63afd5cc78 upstream.

Add missing endianness conversion when applying the Alea timeout quirk.

Found using sparse:

	warning: restricted __le16 degrades to integer

Fixes: e4a886e811 ("hwrng: chaoskey - Fix URB warning due to timeout on Alea")
Cc: Bob Ham <bob.ham@collabora.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
7ed35d6ca7 USB: serial: ftdi_sio: add Olimex ARM-USB-TINY(H) PIDs
commit 5f63424ab7 upstream.

This patch adds support for recognition of ARM-USB-TINY(H) devices which
are almost identical to ARM-USB-OCD(H) but lacking separate barrel jack
and serial console.

By suggestion from Johan Hovold it is possible to replace
ftdi_jtag_quirk with a bit more generic construction. Since all
Olimex-ARM debuggers has exactly two ports, we could safely always use
only second port within the debugger family.

Signed-off-by: Andrey Korolyov <andrey@xdel.ru>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
14bd189bef USB: serial: ftdi_sio: fix setting latency for unprivileged users
commit bb246681b3 upstream.

Commit 557aaa7ffa ("ft232: support the ASYNC_LOW_LATENCY
flag") enables unprivileged users to set the FTDI latency timer,
but there was a logic flaw that skipped sending the corresponding
USB control message to the device.

Specifically, the device latency timer would not be updated until next
open, something which was later also inadvertently broken by commit
c19db4c9e4 ("USB: ftdi_sio: set device latency timeout at port
probe").

A recent commit c6dce26266 ("USB: serial: ftdi_sio: fix extreme
low-latency setting") disabled the low-latency mode by default so we now
need this fix to allow unprivileged users to again enable it.

Signed-off-by: Anthony Mallet <anthony.mallet@laas.fr>
[johan: amend commit message]
Fixes: 557aaa7ffa ("ft232: support the ASYNC_LOW_LATENCY flag")
Fixes: c19db4c9e4 ("USB: ftdi_sio: set device latency timeout at port probe").
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
15e4ed2a46 pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()
commit 3fd3722621 upstream.

Imagine we have a pid namespace and a task from its parent's pid_ns,
which made setns() to the pid namespace. The task is doing fork(),
while the pid namespace's child reaper is dying. We have the race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&tasklist_lock)
  write_lock_irq(&tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So, just created task p won't receive SIGKILL signal,
and the pid namespace will be in contradictory state.
Only manual kill will help there, but does the userspace
care about this? I suppose, the most users just inject
a task into a pid namespace and wait a SIGCHLD from it.

The patch fixes the problem. It simply checks for
(pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, and can't skip
PIDNS_HASH_ADDING as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in ->nr_hashed."

If allocation is disabled, we just return -ENOMEM
like it's made for such cases in alloc_pid().

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account the problem with double irq enabling
found by Eric W. Biederman.

Fixes: c876ad7682 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
CC: Michal Hocko <mhocko@suse.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>
CC: Andrei Vagin <avagin@openvz.org>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Serge Hallyn <serge@hallyn.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
9fbdfb7af4 pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes
commit b9a985db98 upstream.

The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes triggering the hung task timeout, and increasing
the system average.  This is undesirable.  Sleep with a task state of
TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
undesirable side effects.

Apparently under heavy load this has been allowing Chrome to trigger
the hung time task timeout error and cause ChromeOS to reboot.

Reported-by: Vovo Yang <vovoy@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 6347e90091 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
4cbd5018c4 IB/hfi1: Fix a subcontext memory leak
commit 224d71f910 upstream.

The only context that frees user_exp_rcv data structures is the last
context closed (from a sub-context set).  This leaks the allocations
from the other sub-contexts.  Separate the common frees from the
specific frees and call them at the appropriate time.

Using KEDR to check for memory leaks we get:

Before test:

[leak_check] Possible leaks: 25

After test:

[leak_check] Possible leaks: 31  (6 leaked data structures)

After patch applied (before and after test have the same value)

[leak_check] Possible leaks: 25

Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
d971ab21c9 IB/hfi1: Return an error on memory allocation failure
commit 94679061dc upstream.

If the eager buffer allocation fails, it is necessary to return
an error code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
c07927502b iio: stm32 trigger: fix sampling_frequency read
commit 77a9febfd8 upstream.

When prescaler (PSC) is 0, it means div factor is 1: counter clock
frequency is equal to input clk / (PSC + 1).
When reload value is 8 for example, counter counts 9 cycles, from 0 to 8.
This is handled in frequency write routine, by writing respectively:
- prescaler - 1 to PSC
- reload value - 1 to ARR
This fix does the opposite when reading the frequency from PSC and ARR:
- prescaler is PSC + 1
- reload value is ARR + 1

Thus, PSC may be 0, depending on requested sampling frequency (div 1).
In this case, reading freq wrongly reports 0, instead of computing and
reporting correct value.
Remove test on !psc and !arr.

Small test on stm32f4 (example on tim1_trgo), before this fix:
$ cd /sys/bus/iio/devices/triggerX
$ echo 10000 > sampling_frequency
$ cat sampling_frequency
0

After this fix:
$ echo 10000 > sampling_frequency
$ cat sampling_frequency
10000

Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
705fa4038f IIO: bmp280-core.c: fix error in humidity calculation
commit ed3730c435 upstream.

While calculating the compensation of the humidity there are negative values
interpreted as unsigned because of unsigned variables used.  These values as
well as the constants need to be casted to signed as indicated by the
documentation of the sensor.

Signed-off-by: Andreas Klinger <ak@it-klinger.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Matt Ranostay <matt.ranostay@konsulko.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:17 +02:00
327f5da04e iio: dac: ad7303: fix channel description
commit ce420fd425 upstream.

realbits, storagebits and shift should be numbers, not ASCII characters.

Signed-off-by: Pavel Roskin <plroskin@gmail.com>
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
6567967aad scsi: lpfc: Fix panic on BFS configuration
commit 4492b739c9 upstream.

To select the appropriate shost template, the driver is issuing a
mailbox command to retrieve the wwn. Turns out the sending of the
command precedes the reset of the function.  On SLI-4 adapters, this is
inconsequential as the mailbox command location is specified by dma via
the BMBX register. However, on SLI-3 adapters, the location of the
mailbox command submission area changes. When the function is first
powered on or reset, the cmd is submitted via PCI bar memory. Later the
driver changes the function config to use host memory and DMA. The
request to start a mailbox command is the same, a simple doorbell write,
regardless of submission area.  So.. if there has not been a boot driver
run against the adapter, the mailbox command works as defaults are
ok. But, if the boot driver has configured the card and, and if no
platform pci function/slot reset occurs as the os starts, the mailbox
command will fail. The SLI-3 device will use the stale boot driver dma
location. This can cause PCI eeh errors.

Fix is to reset the sli-3 function before sending the mailbox command,
thus synchronizing the function/driver on mailbox location.

Note: The fix uses routines that are typically invoked later in the call
flow to reset the sli-3 device. The issue in using those routines is
that the normal (non-fix) flow does additional initialization, namely
the allocation of the pport structure. So, rather than significantly
reworking the initialization flow so that the pport is alloc'd first,
pointer checks are added to work around it. Checks are limited to the
routines invoked by a sli-3 adapter (s3 routines) as this fix/early call
is only invoked on a sli3 adapter. Nothing changes post the
fix. Subsequent initialization, and another adapter reset, still occur -
both on sli-3 and sli-4 adapters.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Fixes: 96418b5e2c ("scsi: lpfc: Fix eh_deadline setting for sli3 adapters.")
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
5d6d5f07f8 ibmvscsis: Do not send aborted task response
commit 25e7853126 upstream.

The driver is sending a response to the actual scsi op that was
aborted by an abort task TM, while LIO is sending a response to
the abort task TM.

ibmvscsis_tgt does not send the response to the client until
release_cmd time. The reason for this was because if we did it
at queue_status time, then the client would be free to reuse the
tag for that command, but we're still using the tag until the
command is released at release_cmd time, so we chose to delay
sending the response until then. That then caused this issue, because
release_cmd is always called, even if queue_status is not.

SCSI spec says that the initiator that sends the abort task
TM NEVER gets a response to the aborted op and with the current
code it will send a response. Thus this fix will remove that response
if the CMD_T_ABORTED && !CMD_T_TAS.

Another case with a small timing window is the case where if LIO sends a
TMR_DOES_NOT_EXIST, and the release_cmd callback is called for the TMR Abort
cmd before the release_cmd for the (attemped) aborted cmd, then we need to
ensure that we send the response for the (attempted) abort cmd to the client
before we send the response for the TMR Abort cmd.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
c3f0eff47d of: fdt: add missing allocation-failure check
commit 49e67dd176 upstream.

The memory allocator passed to __unflatten_device_tree() (e.g. a wrapped
kzalloc) can fail so add the missing sanity check to avoid dereferencing
a NULL pointer.

Fixes: fe14042358 ("of/flattree: Refactor unflatten_device_tree and add fdt_unflatten_tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
5bc2cfba89 of: fix "/cpus" reference leak in of_numa_parse_cpu_nodes()
commit b8475cbee5 upstream.

The call to of_find_node_by_path("/cpus") returns the cpus device_node
with its reference count incremented. There is no matching of_node_put()
call in of_numa_parse_cpu_nodes() which results in a leaked reference
to the "/cpus" node.

This patch adds an of_node_put() to release the reference.

fixes: 298535c00a ("of, numa: Add NUMA of binding implementation.")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
854d0231ef of: fix sparse warning in of_pci_range_parser_one
commit eb31003657 upstream.

sparse gives the following warning for 'pci_space':

../drivers/of/address.c:266:26: warning: incorrect type in assignment (different base types)
../drivers/of/address.c:266:26:    expected unsigned int [unsigned] [usertype] pci_space
../drivers/of/address.c:266:26:    got restricted __be32 const [usertype] <noident>

It appears that pci_space is only ever accessed on powerpc, so the endian
swap is often not needed.

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
d5331e619a proc: Fix unbalanced hard link numbers
commit d66bb1607e upstream.

proc_create_mount_point() forgot to increase the parent's nlink, and
it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
less than expected.

Fixes: eb6d38d542 ("proc: Allow creating permanently empty directories...")
Reported-by: Tristan Ye <tristan.ye@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
525d5ef5ae cxl: Route eeh events to all drivers in cxl_pci_error_detected()
commit 4f58f0bf15 upstream.

Fix a boundary condition where in some cases an eeh event that results
in card reset isn't passed on to a driver attached to the virtual PCI
device associated with a slice. This will happen in case when a slice
attached device driver returns a value other than
PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
would result in an early return from cxl_pci_error_detected() and
other drivers attached to other AFUs on the card wont be notified.

The patch fixes this by making sure that all slice attached
device-drivers are notified and the return values from
error_detected() callback are aggregated in a scheme where request for
'disconnect' trumps all and 'none' trumps 'need_reset'.

Fixes: 9e8df8a219 ("cxl: EEH support")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
4a0e8757ad cxl: Force context lock during EEH flow
commit ea9a26d117 upstream.

During an eeh event when the cxl card is fenced and card sysfs attr
perst_reloads_same_image is set following warning message is seen in the
kernel logs:

  Adapter context unlocked with 0 active contexts
  ------------[ cut here ]------------
  WARNING: CPU: 12 PID: 627 at
  ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]

Even though this warning is harmless, it clutters the kernel log
during an eeh event. This warning is triggered as the EEH callback
cxl_pci_error_detected doesn't obtain a context-lock before forcibly
detaching all active context and when context-lock is released during
call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
cxl_adapter_context_unlock is triggered.

To fix this warning, we acquire the adapter context-lock via
cxl_adapter_context_lock() in the eeh callback
cxl_pci_error_detected() once all the virtual AFU PHBs are notified
and their contexts detached. The context-lock is released in
cxl_pci_slot_reset() after the adapter is successfully reconfigured
and before the we call the slot_reset callback on slice attached
device-drivers.

Fixes: 70b565bbdb ("cxl: Prevent adapter reset if an active context exists")
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Tested-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:16 +02:00
6c58e144d5 ohci-pci: add qemu quirk
commit 21a60f6e65 upstream.

On a loaded virtualization host (dozen guests booting at the same time)
it may happen that the ohci controller emulation doesn't manage to do
timely frame processing, with the result that the io watchdog fires and
considers the controller being dead, even though it's only the emulation
being unusual slow due to the load peak.

So, add a quirk for qemu and don't use the watchdog in case we figure we
are running on emulated ohci.  The virtual ohci controller masquerades
as apple ohci controller, but we can identify it by subsystem id.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
a83f3099d0 cdc-acm: fix possible invalid access when processing notification
commit 1bb9914e17 upstream.

Notifications may only be 8 bytes long. Accessing the 9th and
10th byte of unimplemented/unknown notifications may be insecure.
Also check the length of known notifications before accessing anything
behind the 8th byte.

Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
9541621a12 gpio: omap: return error if requested debounce time is not possible
commit 8397744393 upstream.

omap_gpio_debounce() does not validate that the requested debounce
is within a range it can handle. Instead it lets the register value
wrap silently, and always returns success.

This can lead to all sorts of unexpected behavior, such as gpio_keys
asking for a too-long debounce, but getting a very short debounce in
practice.

Fix this by returning -EINVAL if the requested value does not fit into
the register field. If there is no debounce clock available at all,
return -ENOTSUPP.

Fixes: e85ec6c304 ("gpio: omap: fix omap2_set_gpio_debounce")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Acked-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
33f379db47 drm/nouveau/tmr: handle races with hw when updating the next alarm time
commit 1b0f84380b upstream.

If the time to the next alarm is short enough, we could race with HW and
end up with an ~4 second delay until it triggers.

Fix this by checking again after we update HW.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
53b8e32038 drm/nouveau/tmr: avoid processing completed alarms when adding a new one
commit 330bdf62fe upstream.

The idea here was to avoid having to "manually" program the HW if there's
a new earliest alarm.  This was lazy and bad, as it leads to loads of fun
races between inter-related callers (ie. therm).

Turns out, it's not so difficult after all.  Go figure ;)

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
952030b868 drm/nouveau/tmr: fix corruption of the pending list when rescheduling an alarm
commit 9fc64667ee upstream.

At least therm/fantog "attempts" to work around this issue, which could
lead to corruption of the pending alarm list.

Fix it properly by not updating the timestamp without the lock held, or
trying to add an already pending alarm to the pending alarm list....

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
4c168033dd drm/nouveau/tmr: ack interrupt before processing alarms
commit 3733bd8b40 upstream.

Fixes a race where we can miss an alarm that triggers while we're already
processing previous alarms.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
a5e41c9e5a drm/nouveau/kms/nv50: skip core channel cursor update on position-only changes
commit e6db95799b upstream.

The DRM core used to only call prepare_fb/cleanup_fb() when a plane's
framebuffer changed, which achieved the desired effect.

It's apparently now up to the driver to decide on its own.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
4aecf3a73b drm/nouveau/kms/nv50: fix source-rect-only plane updates
commit 36601c2b36 upstream.

This "optimisation" (which was originally meant to skip updating cursor
settings in the core channel on position-only updates) turned out to be
pointless in the final design of the code before it was merged.

Remove it completely, as it breaks other cases.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:15 +02:00
2138faf017 drm/nouveau/therm: remove ineffective workarounds for alarm bugs
commit e4311ee51d upstream.

These were ineffective due to touching the list without the alarm lock,
but should no longer be required.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
21d3947bf7 drm/amdgpu: Add missing lb_vblank_lead_lines setup to DCE-6 path.
commit effaf848b9 upstream.

This apparently got lost when implementing the new DCE-6 support
and would cause failures in pageflip scheduling and timestamping.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
3f0db81c7d drm/amdgpu: Avoid overflows/divide-by-zero in latency_watermark calculations.
commit e190ed1ea7 upstream.

At dot clocks > approx. 250 Mhz, some of these calcs will overflow and
cause miscalculation of latency watermarks, and for some overflows also
divide-by-zero driver crash ("divide error: 0000 [#1] PREEMPT SMP" in
"dce_v10_0_latency_watermark+0x12d/0x190").

This zero-divide happened, e.g., on AMD Tonga Pro under DCE-10,
on a Displayport panel when trying to set a video mode of 2560x1440
at 165 Hz vrefresh with a dot clock of 635.540 Mhz.

Refine calculations to avoid the overflows.

Tested for DCE-10 with R9 380 Tonga + ASUS ROG PG279 panel.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
de2fa45aa9 drm/amdgpu: Make display watermark calculations more accurate
commit d63c277dc6 upstream.

Avoid big roundoff errors in scanline/hactive durations for
high pixel clocks, especially for >= 500 Mhz, and thereby
program more accurate display fifo watermarks.

Implemented here for DCE 6,8,10,11.
Successfully tested on DCE 10 with AMD R9 380 Tonga.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
fab4402325 ath9k_htc: fix NULL-deref at probe
commit ebeb36670e upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.

Fixes: 36bcce4306 ("ath9k_htc: Handle storage devices")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
43068b4585 ath9k_htc: Add support of AirTies 1eda:2315 AR9271 device
commit 16ff1fb0e3 upstream.

T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=01 Dev#=  7 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=ff Prot=ff MxPS=64 #Cfgs=  1
P:  Vendor=1eda ProdID=2315 Rev=01.08
S:  Manufacturer=ATHEROS
S:  Product=USB2.0 WLAN
S:  SerialNumber=12345
C:  #Ifs= 1 Cfg#= 1 Atr=80 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 6 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
82e31de018 s390/cputime: fix incorrect system time
commit 07a63cbe8b upstream.

git commit c5328901aa "[S390] entry[64].S improvements" removed
the update of the exit_timer lowcore field from the critical section
cleanup of the .Lsysc_restore/.Lsysc_done and .Lio_restore/.Lio_done
blocks. If the PSW is updated by the critical section cleanup to point to
user space again, the interrupt entry code will do a vtime calculation
after the cleanup completed with an exit_timer value which has *not* been
updated. Due to this incorrect system time deltas are calculated.

If an interrupt occured with an old PSW between .Lsysc_restore/.Lsysc_done
or .Lio_restore/.Lio_done update __LC_EXIT_TIMER with the system entry
time of the interrupt.

Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
5d1adbaf36 s390/kdump: Add final note
commit dcc00b79fc upstream.

Since linux v3.14 with commit 38dfac843c ("vmcore: prevent PT_NOTE
p_memsz overflow during header update") on s390 we get the following
message in the kdump kernel:

  Warning: Exceeded p_memsz, dropping PT_NOTE entry n_namesz=0x6b6b6b6b,
  n_descsz=0x6b6b6b6b

The reason for this is that we don't create a final zero note in
the ELF header which the proc/vmcore code uses to find out the end
of the notes section (see also kernel/kexec_core.c:final_note()).

It still worked on s390 by chance because we (most of the time?) have the
byte pattern 0x6b6b6b6b after the notes section which also makes the notes
parsing code stop in update_note_header_size_elf64() because 0x6b6b6b6b is
interpreded as note size:

  if ((real_sz + sz) > max_sz) {
          pr_warn("Warning: Exceeded p_memsz, dropping P ...);
          break;
  }

So fix this and add the missing final note to the ELF header.
We don't have to adjust the memory size for ELF header ("alloc_size")
because the new ELF note still fits into the 0x1000 base memory.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
d2cc3a8cb9 regulator: tps65023: Fix inverted core enable logic.
commit c90722b54a upstream.

Commit 43530b69d7 ("regulator: Use
regmap_read/write(), regmap_update_bits functions directly") intended
to replace working inline helper functions with standard regmap
calls.  However, it also inverted the set/clear logic of the "CORE ADJ
Allowed" bit.  That patch was clearly never tested, since without that
bit cleared, the core VDCDC1 voltage output does not react to I2C
configuration changes.

This patch fixes the issue by clearing the bit as in the original,
correct implementation.  Note for stable back porting that, due to
subsequent driver churn, this patch will not apply on every kernel
version.

Fixes: 43530b69d7 ("regulator: Use regmap_read/write(), regmap_update_bits functions directly")
Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:14 +02:00
fde4f79074 regulator: rk808: Fix RK818 LDO2
commit 75f8811539 upstream.

Set the correct voltage select register for LDO2.

Signed-off-by: Wadim Egorov <w.egorov@phytec.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
efb209c87a x86: fix 32-bit case of __get_user_asm_u64()
commit 33c9e97290 upstream.

The code to fetch a 64-bit value from user space was entirely buggered,
and has been since the code was merged in early 2016 in commit
b2f680380d ("x86/mm/32: Add support for 64-bit __get_user() on 32-bit
kernels").

Happily the buggered routine is almost certainly entirely unused, since
the normal way to access user space memory is just with the non-inlined
"get_user()", and the inlined version didn't even historically exist.

The normal "get_user()" case is handled by external hand-written asm in
arch/x86/lib/getuser.S that doesn't have either of these issues.

There were two independent bugs in __get_user_asm_u64():

 - it still did the STAC/CLAC user space access marking, even though
   that is now done by the wrapper macros, see commit 11f1a4b975
   ("x86: reorganize SMAP handling in user space accesses").

   This didn't result in a semantic error, it just means that the
   inlined optimized version was hugely less efficient than the
   allegedly slower standard version, since the CLAC/STAC overhead is
   quite high on modern Intel CPU's.

 - the double register %eax/%edx was marked as an output, but the %eax
   part of it was touched early in the asm, and could thus clobber other
   inputs to the asm that gcc didn't expect it to touch.

   In particular, that meant that the generated code could look like
   this:

        mov    (%eax),%eax
        mov    0x4(%eax),%edx

   where the load of %edx obviously was _supposed_ to be from the 32-bit
   word that followed the source of %eax, but because %eax was
   overwritten by the first instruction, the source of %edx was
   basically random garbage.

The fixes are trivial: remove the extraneous STAC/CLAC entries, and mark
the 64-bit output as early-clobber to let gcc know that no inputs should
alias with the output register.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
5bbaf88d79 KVM: X86: Fix read out-of-bounds vulnerability in kvm pio emulation
commit cbfc6c9184 upstream.

Huawei folks reported a read out-of-bounds vulnerability in kvm pio emulation.

- "inb" instruction to access PIT Mod/Command register (ioport 0x43, write only,
  a read should be ignored) in guest can get a random number.
- "rep insb" instruction to access PIT register port 0x43 can control memcpy()
  in emulator_pio_in_emulated() to copy max 0x400 bytes but only read 1 bytes,
  which will disclose the unimportant kernel memory in host but no crash.

The similar test program below can reproduce the read out-of-bounds vulnerability:

void hexdump(void *mem, unsigned int len)
{
        unsigned int i, j;

        for(i = 0; i < len + ((len % HEXDUMP_COLS) ? (HEXDUMP_COLS - len % HEXDUMP_COLS) : 0); i++)
        {
                /* print offset */
                if(i % HEXDUMP_COLS == 0)
                {
                        printf("0x%06x: ", i);
                }

                /* print hex data */
                if(i < len)
                {
                        printf("%02x ", 0xFF & ((char*)mem)[i]);
                }
                else /* end of block, just aligning for ASCII dump */
                {
                        printf("   ");
                }

                /* print ASCII dump */
                if(i % HEXDUMP_COLS == (HEXDUMP_COLS - 1))
                {
                        for(j = i - (HEXDUMP_COLS - 1); j <= i; j++)
                        {
                                if(j >= len) /* end of block, not really printing */
                                {
                                        putchar(' ');
                                }
                                else if(isprint(((char*)mem)[j])) /* printable char */
                                {
                                        putchar(0xFF & ((char*)mem)[j]);
                                }
                                else /* other char */
                                {
                                        putchar('.');
                                }
                        }
                        putchar('\n');
                }
        }
}

int main(void)
{
	int i;
	if (iopl(3))
	{
		err(1, "set iopl unsuccessfully\n");
		return -1;
	}
	static char buf[0x40];

	/* test ioport 0x40,0x41,0x42,0x43,0x44,0x45 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x40, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x41, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x42, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x43, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x44, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("mov $0x45, %rdx;");
	asm volatile ("in %dx,%al;");
	asm volatile ("stosb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");

	/* ins port 0x40 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x20, %rcx;");
	asm volatile ("mov $0x40, %rdx;");
	asm volatile ("rep insb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");

	/* ins port 0x43 */

	memset(buf, 0xab, sizeof(buf));

	asm volatile("push %rdi;");
	asm volatile("mov %0, %%rdi;"::"q"(buf));

	asm volatile ("mov $0x20, %rcx;");
	asm volatile ("mov $0x43, %rdx;");
	asm volatile ("rep insb;");

	asm volatile ("pop %rdi;");
	hexdump(buf, 0x40);

	printf("\n");
	return 0;
}

The vcpu->arch.pio_data buffer is used by both in/out instrutions emulation
w/o clear after using which results in some random datas are left over in
the buffer. Guest reads port 0x43 will be ignored since it is write only,
however, the function kernel_pio() can't distigush this ignore from successfully
reads data from device's ioport. There is no new data fill the buffer from
port 0x43, however, emulator_pio_in_emulated() will copy the stale data in
the buffer to the guest unconditionally. This patch fixes it by clearing the
buffer before in instruction emulation to avoid to grant guest the stale data
in the buffer.

In addition, string I/O is not supported for in kernel device. So there is no
iteration to read ioport %RCX times for string I/O. The function kernel_pio()
just reads one round, and then copy the io size * %RCX to the guest unconditionally,
actually it copies the one round ioport data w/ other random datas which are left
over in the vcpu->arch.pio_data buffer to the guest. This patch fixes it by
introducing the string I/O support for in kernel device in order to grant the right
ioport datas to the guest.

Before the patch:

0x000000: fe 38 93 93 ff ff ab ab .8......
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: f6 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 4d 51 30 30 ....MQ00
0x000018: 30 30 20 33 20 20 20 20 00 3
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

After the patch:

0x000000: 1e 02 f8 00 ff ff ab ab ........
0x000008: ab ab ab ab ab ab ab ab ........
0x000010: ab ab ab ab ab ab ab ab ........
0x000018: ab ab ab ab ab ab ab ab ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: d2 e2 d2 df d2 db d2 d7 ........
0x000008: d2 d3 d2 cf d2 cb d2 c7 ........
0x000010: d2 c4 d2 c0 d2 bc d2 b8 ........
0x000018: d2 b4 d2 b0 d2 ac d2 a8 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

0x000000: 00 00 00 00 00 00 00 00 ........
0x000008: 00 00 00 00 00 00 00 00 ........
0x000010: 00 00 00 00 00 00 00 00 ........
0x000018: 00 00 00 00 00 00 00 00 ........
0x000020: ab ab ab ab ab ab ab ab ........
0x000028: ab ab ab ab ab ab ab ab ........
0x000030: ab ab ab ab ab ab ab ab ........
0x000038: ab ab ab ab ab ab ab ab ........

Reported-by: Moguofang <moguofang@huawei.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Moguofang <moguofang@huawei.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
b67dcf8cf8 KVM: x86: Fix potential preemption when get the current kvmclock timestamp
commit e2c2206a18 upstream.

 BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/2809
 caller is __this_cpu_preempt_check+0x13/0x20
 CPU: 2 PID: 2809 Comm: qemu-system-x86 Not tainted 4.11.0+ #13
 Call Trace:
  dump_stack+0x99/0xce
  check_preemption_disabled+0xf5/0x100
  __this_cpu_preempt_check+0x13/0x20
  get_kvmclock_ns+0x6f/0x110 [kvm]
  get_time_ref_counter+0x5d/0x80 [kvm]
  kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
  ? kvm_hv_process_stimers+0x2a1/0x8a0 [kvm]
  ? kvm_arch_vcpu_ioctl_run+0xac9/0x1ce0 [kvm]
  kvm_arch_vcpu_ioctl_run+0x5bf/0x1ce0 [kvm]
  kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
  ? __fget+0xf3/0x210
  do_vfs_ioctl+0xa4/0x700
  ? __fget+0x114/0x210
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc2
 RIP: 0033:0x7f9d164ed357
  ? __this_cpu_preempt_check+0x13/0x20

This can be reproduced by run kvm-unit-tests/hyperv_stimer.flat w/
CONFIG_PREEMPT and CONFIG_DEBUG_PREEMPT enabled.

Safe access to per-CPU data requires a couple of constraints, though: the
thread working with the data cannot be preempted and it cannot be migrated
while it manipulates per-CPU variables. If the thread is preempted, the
thread that replaces it could try to work with the same variables; migration
to another CPU could also cause confusion. However there is no preemption
disable when reads host per-CPU tsc rate to calculate the current kvmclock
timestamp.

This patch fixes it by utilizing get_cpu/put_cpu pair to guarantee both
__this_cpu_read() and rdtsc() are not preempted.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
8223e8f994 KVM: x86: Fix load damaged SSEx MXCSR register
commit a575813bfe upstream.

Reported by syzkaller:

   BUG: unable to handle kernel paging request at ffffffffc07f6a2e
   IP: report_bug+0x94/0x120
   PGD 348e12067
   P4D 348e12067
   PUD 348e14067
   PMD 3cbd84067
   PTE 80000003f7e87161

   Oops: 0003 [#1] SMP
   CPU: 2 PID: 7091 Comm: kvm_load_guest_ Tainted: G           OE   4.11.0+ #8
   task: ffff92fdfb525400 task.stack: ffffbda6c3d04000
   RIP: 0010:report_bug+0x94/0x120
   RSP: 0018:ffffbda6c3d07b20 EFLAGS: 00010202
    do_trap+0x156/0x170
    do_error_trap+0xa3/0x170
    ? kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? mark_held_locks+0x79/0xa0
    ? retint_kernel+0x10/0x10
    ? trace_hardirqs_off_thunk+0x1a/0x1c
    do_invalid_op+0x20/0x30
    invalid_op+0x1e/0x30
   RIP: 0010:kvm_load_guest_fpu.part.175+0x12a/0x170 [kvm]
    ? kvm_load_guest_fpu.part.175+0x1c/0x170 [kvm]
    kvm_arch_vcpu_ioctl_run+0xed6/0x1b70 [kvm]
    kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? kvm_vcpu_ioctl+0x384/0x780 [kvm]
    ? sched_clock+0x13/0x20
    ? __do_page_fault+0x2a0/0x550
    do_vfs_ioctl+0xa4/0x700
    ? up_read+0x1f/0x40
    ? __do_page_fault+0x2a0/0x550
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x23/0xc2

SDM mentioned that "The MXCSR has several reserved bits, and attempting to write
a 1 to any of these bits will cause a general-protection exception(#GP) to be
generated". The syzkaller forks' testcase overrides xsave area w/ random values
and steps on the reserved bits of MXCSR register. The damaged MXCSR register
values of guest will be restored to SSEx MXCSR register before vmentry. This
patch fixes it by catching userspace override MXCSR register reserved bits w/
random values and bails out immediately.

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
efa8cd1e2f ima: accept previously set IMA_NEW_FILE
commit 1ac202e978 upstream.

Modifying the attributes of a file makes ima_inode_post_setattr reset
the IMA cache flags. So if the file, which has just been created,
is opened a second time before the first file descriptor is closed,
verification fails since the security.ima xattr has not been written
yet. We therefore have to look at the IMA_NEW_FILE even if the file
already existed.

With this patch there should no longer be an error when cat tries to
open testfile:

$ rm -f testfile
$ ( echo test >&3 ; touch testfile ; cat testfile ) 3>testfile

A file being new is no reason to accept that it is missing a digital
signature demanded by the policy.

Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
d5811285b5 mwifiex: pcie: fix cmd_buf use-after-free in remove/reset
commit 3c8cb9ad03 upstream.

Command buffers (skb's) are allocated by the main driver, and freed upon
the last use. That last use is often in mwifiex_free_cmd_buffer(). In
the meantime, if the command buffer gets used by the PCI driver, we map
it as DMA-able, and store the mapping information in the 'cb' memory.

However, if a command was in-flight when resetting the device (and
therefore was still mapped), we don't get a chance to unmap this memory
until after the core has cleaned up its command handling.

Let's keep a refcount within the PCI driver, so we ensure the memory
only gets freed after we've finished unmapping it.

Noticed by KASAN when forcing a reset via:

  echo 1 > /sys/bus/pci/.../reset

The same code path can presumably be exercised in remove() and
shutdown().

[  205.390377] mwifiex_pcie 0000:01:00.0: info: shutdown mwifiex...
[  205.400393] ==================================================================
[  205.407719] BUG: KASAN: use-after-free in mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie] at addr ffffffc0ad471b28
[  205.419040] Read of size 16 by task bash/1913
[  205.423421] =============================================================================
[  205.431625] BUG skbuff_head_cache (Tainted: G    B          ): kasan: bad access detected
[  205.439815] -----------------------------------------------------------------------------
[  205.439815]
[  205.449534] INFO: Allocated in __build_skb+0x48/0x114 age=1311 cpu=4 pid=1913
[  205.456709] 	alloc_debug_processing+0x124/0x178
[  205.461282] 	___slab_alloc.constprop.58+0x528/0x608
[  205.466196] 	__slab_alloc.isra.54.constprop.57+0x44/0x54
[  205.471542] 	kmem_cache_alloc+0xcc/0x278
[  205.475497] 	__build_skb+0x48/0x114
[  205.479019] 	__netdev_alloc_skb+0xe0/0x170
[  205.483244] 	mwifiex_alloc_cmd_buffer+0x68/0xdc [mwifiex]
[  205.488759] 	mwifiex_init_fw+0x40/0x6cc [mwifiex]
[  205.493584] 	_mwifiex_fw_dpc+0x158/0x520 [mwifiex]
[  205.498491] 	mwifiex_reinit_sw+0x2c4/0x398 [mwifiex]
[  205.503510] 	mwifiex_pcie_reset_notify+0x114/0x15c [mwifiex_pcie]
[  205.509643] 	pci_reset_notify+0x5c/0x6c
[  205.513519] 	pci_reset_function+0x6c/0x7c
[  205.517567] 	reset_store+0x68/0x98
[  205.521003] 	dev_attr_store+0x54/0x60
[  205.524705] 	sysfs_kf_write+0x9c/0xb0
[  205.528413] INFO: Freed in __kfree_skb+0xb0/0xbc age=131 cpu=4 pid=1913
[  205.535064] 	free_debug_processing+0x264/0x370
[  205.539550] 	__slab_free+0x84/0x40c
[  205.543075] 	kmem_cache_free+0x1c8/0x2a0
[  205.547030] 	__kfree_skb+0xb0/0xbc
[  205.550465] 	consume_skb+0x164/0x178
[  205.554079] 	__dev_kfree_skb_any+0x58/0x64
[  205.558304] 	mwifiex_free_cmd_buffer+0xa0/0x158 [mwifiex]
[  205.563817] 	mwifiex_shutdown_drv+0x578/0x5c4 [mwifiex]
[  205.569164] 	mwifiex_shutdown_sw+0x178/0x310 [mwifiex]
[  205.574353] 	mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
[  205.580398] 	pci_reset_notify+0x5c/0x6c
[  205.584274] 	pci_dev_save_and_disable+0x24/0x6c
[  205.588837] 	pci_reset_function+0x30/0x7c
[  205.592885] 	reset_store+0x68/0x98
[  205.596324] 	dev_attr_store+0x54/0x60
[  205.600017] 	sysfs_kf_write+0x9c/0xb0
...
[  205.800488] Call trace:
[  205.802980] [<ffffffc00020a69c>] dump_backtrace+0x0/0x190
[  205.808415] [<ffffffc00020a96c>] show_stack+0x20/0x28
[  205.813506] [<ffffffc0005d020c>] dump_stack+0xa4/0xcc
[  205.818598] [<ffffffc0003be44c>] print_trailer+0x158/0x168
[  205.824120] [<ffffffc0003be5f0>] object_err+0x4c/0x5c
[  205.829210] [<ffffffc0003c45bc>] kasan_report+0x334/0x500
[  205.834641] [<ffffffc0003c3994>] check_memory_region+0x20/0x14c
[  205.840593] [<ffffffc0003c3b14>] __asan_loadN+0x14/0x1c
[  205.845879] [<ffffffbffc46171c>] mwifiex_unmap_pci_memory.isra.14+0x4c/0x100 [mwifiex_pcie]
[  205.854282] [<ffffffbffc461864>] mwifiex_pcie_delete_cmdrsp_buf+0x94/0xa8 [mwifiex_pcie]
[  205.862421] [<ffffffbffc462028>] mwifiex_pcie_free_buffers+0x11c/0x158 [mwifiex_pcie]
[  205.870302] [<ffffffbffc4620d4>] mwifiex_pcie_down_dev+0x70/0x80 [mwifiex_pcie]
[  205.877736] [<ffffffbffc1397a8>] mwifiex_shutdown_sw+0x190/0x310 [mwifiex]
[  205.884658] [<ffffffbffc4606b4>] mwifiex_pcie_reset_notify+0xd4/0x15c [mwifiex_pcie]
[  205.892446] [<ffffffc000635f54>] pci_reset_notify+0x5c/0x6c
[  205.898048] [<ffffffc00063a044>] pci_dev_save_and_disable+0x24/0x6c
[  205.904350] [<ffffffc00063cf0c>] pci_reset_function+0x30/0x7c
[  205.910134] [<ffffffc000641118>] reset_store+0x68/0x98
[  205.915312] [<ffffffc000771588>] dev_attr_store+0x54/0x60
[  205.920750] [<ffffffc00046f53c>] sysfs_kf_write+0x9c/0xb0
[  205.926182] [<ffffffc00046dfb0>] kernfs_fop_write+0x184/0x1f8
[  205.931963] [<ffffffc0003d64f4>] __vfs_write+0x6c/0x17c
[  205.937221] [<ffffffc0003d7164>] vfs_write+0xf0/0x1c4
[  205.942310] [<ffffffc0003d7da0>] SyS_write+0x78/0xd8
[  205.947312] [<ffffffc000204634>] el0_svc_naked+0x24/0x28
...
[  205.998268] ==================================================================

This bug has been around in different forms for a while. It was sort of
noticed in commit 955ab095c5 ("mwifiex: Do not kfree cmd buf while
unregistering PCIe"), but it just fixed the double-free, without
acknowledging the potential for use-after-free.

Fixes: fc33146090 ("mwifiex: use pci_alloc/free_consistent APIs for PCIe")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
a32a0c8285 mwifiex: MAC randomization should not be persistent
commit 7e2f18f064 upstream.

nl80211 provides the NL80211_SCAN_FLAG_RANDOM_ADDR for every scan
request that should be randomized; the absence of such a flag means we
should not randomize. However, mwifiex was stashing the latest
randomization request and *always* using it for future scans, even those
that didn't set the flag.

Let's zero out the randomization info whenever we get a scan request
without NL80211_SCAN_FLAG_RANDOM_ADDR. I'd prefer to remove
priv->random_mac entirely (and plumb the randomization MAC properly
through the call sequence), but the spaghetti is a little difficult to
unravel here for me.

Fixes: c2a8f0ff9c ("mwifiex: support random MAC address for scanning")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
cf6df2bcd4 rtlwifi: rtl8821ae: setup 8812ae RFE according to device type
commit 46cfa2148e upstream.

Current channel switch implementation sets 8812ae RFE reg value assuming
that device always has type 2.

Extend possible RFE types set and write corresponding reg values.

Source for new code is
http://dlcdnet.asus.com/pub/ASUS/wireless/PCE-AC51/DR_PCE_AC51_20232801152016.zip

Signed-off-by: Maxim Samoylov <max7255@gmail.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Yan-Hsuan Chuang <yhchuang@realtek.com>
Cc: Pkshih <pkshih@realtek.com>
Cc: Birming Chiu <birming@realtek.com>
Cc: Shaofu <shaofu@realtek.com>
Cc: Steven Ting <steventing@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:13 +02:00
699ae09634 md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop
commit 065e519e71 upstream.

if called md_set_readonly and set MD_CLOSING bit, the mddev cannot
be opened any more due to the MD_CLOING bit wasn't cleared. Thus it
needs to be cleared in md_ioctl after any call to md_set_readonly()
or do_md_stop().

Signed-off-by: NeilBrown <neilb@suse.com>
Fixes: af8d8e6f03 ("md: changes for MD_STILL_CLOSED flag")
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
1800fde309 md: update slab_cache before releasing new stripes when stripes resizing
commit 583da48e38 upstream.

When growing raid5 device on machine with small memory, there is chance that
mdadm will be killed and the following bug report can be observed. The same
bug could also be reproduced in linux-4.10.6.

[57600.075774] BUG: unable to handle kernel NULL pointer dereference at           (null)
[57600.083796] IP: [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.110378] PGD 421cf067 PUD 4442d067 PMD 0
[57600.114678] Oops: 0002 [#1] SMP
[57600.180799] CPU: 1 PID: 25990 Comm: mdadm Tainted: P           O    4.2.8 #1
[57600.187849] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS QV05AR66 03/06/2013
[57600.197490] task: ffff880044e47240 ti: ffff880043070000 task.ti: ffff880043070000
[57600.204963] RIP: 0010:[<ffffffff81a6aa87>]  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.213057] RSP: 0018:ffff880043073810  EFLAGS: 00010046
[57600.218359] RAX: 0000000000000000 RBX: 000000000000000c RCX: ffff88011e296dd0
[57600.225486] RDX: 0000000000000001 RSI: ffffe8ffffcb46c0 RDI: 0000000000000000
[57600.232613] RBP: ffff880043073878 R08: ffff88011e5f8170 R09: 0000000000000282
[57600.239739] R10: 0000000000000005 R11: 28f5c28f5c28f5c3 R12: ffff880043073838
[57600.246872] R13: ffffe8ffffcb46c0 R14: 0000000000000000 R15: ffff8800b9706a00
[57600.253999] FS:  00007f576106c700(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
[57600.262078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[57600.267817] CR2: 0000000000000000 CR3: 00000000428fe000 CR4: 00000000001406e0
[57600.274942] Stack:
[57600.276949]  ffffffff8114ee35 ffff880043073868 0000000000000282 000000000000eb3f
[57600.284383]  ffffffff81119043 ffff880043073838 ffff880043073838 ffff88003e197b98
[57600.291820]  ffffe8ffffcb46c0 ffff88003e197360 0000000000000286 ffff880043073968
[57600.299254] Call Trace:
[57600.301698]  [<ffffffff8114ee35>] ? cache_flusharray+0x35/0xe0
[57600.307523]  [<ffffffff81119043>] ? __page_cache_release+0x23/0x110
[57600.313779]  [<ffffffff8114eb53>] kmem_cache_free+0x63/0xc0
[57600.319344]  [<ffffffff81579942>] drop_one_stripe+0x62/0x90
[57600.324915]  [<ffffffff81579b5b>] raid5_cache_scan+0x8b/0xb0
[57600.330563]  [<ffffffff8111b98a>] shrink_slab.part.36+0x19a/0x250
[57600.336650]  [<ffffffff8111e38c>] shrink_zone+0x23c/0x250
[57600.342039]  [<ffffffff8111e4f3>] do_try_to_free_pages+0x153/0x420
[57600.348210]  [<ffffffff8111e851>] try_to_free_pages+0x91/0xa0
[57600.353959]  [<ffffffff811145b1>] __alloc_pages_nodemask+0x4d1/0x8b0
[57600.360303]  [<ffffffff8157a30b>] check_reshape+0x62b/0x770
[57600.365866]  [<ffffffff8157a4a5>] raid5_check_reshape+0x55/0xa0
[57600.371778]  [<ffffffff81583df7>] update_raid_disks+0xc7/0x110
[57600.377604]  [<ffffffff81592b73>] md_ioctl+0xd83/0x1b10
[57600.382827]  [<ffffffff81385380>] blkdev_ioctl+0x170/0x690
[57600.388307]  [<ffffffff81195238>] block_ioctl+0x38/0x40
[57600.393525]  [<ffffffff811731c5>] do_vfs_ioctl+0x2b5/0x480
[57600.399010]  [<ffffffff8115e07b>] ? vfs_write+0x14b/0x1f0
[57600.404400]  [<ffffffff811733cc>] SyS_ioctl+0x3c/0x70
[57600.409447]  [<ffffffff81a6ad97>] entry_SYSCALL_64_fastpath+0x12/0x6a
[57600.415875] Code: 00 00 00 00 55 48 89 e5 8b 07 85 c0 74 04 31 c0 5d c3 ba 01 00 00 00 f0 0f b1 17 85 c0 75 ef b0 01 5d c3 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 85 d1 63 ff 5d
[57600.435460] RIP  [<ffffffff81a6aa87>] _raw_spin_lock+0x7/0x20
[57600.441208]  RSP <ffff880043073810>
[57600.444690] CR2: 0000000000000000
[57600.448000] ---[ end trace cbc6b5cc4bf9831d ]---

The problem is that resize_stripes() releases new stripe_heads before assigning new
slab cache to conf->slab_cache. If the shrinker function raid5_cache_scan() gets called
after resize_stripes() starting releasing new stripes but right before new slab cache
being assigned, it is possible that these new stripe_heads will be freed with the old
slab_cache which was already been destoryed and that triggers this bug.

Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Fixes: edbe83ab4c ("md/raid5: allow the stripe_cache to grow and shrink.")
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
d7252903e2 dm space map disk: fix some book keeping in the disk space map
commit 0377a07c7a upstream.

When decrementing the reference count for a block, the free count wasn't
being updated if the reference count went to zero.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
f568a10289 dm thin metadata: call precommit before saving the roots
commit 91bcdb92d3 upstream.

These calls were the wrong way round in __write_initial_superblock.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
17cedf3ba1 dm bufio: make the parameter "retain_bytes" unsigned long
commit 13840d3801 upstream.

Change the type of the parameter "retain_bytes" from unsigned to
unsigned long, so that on 64-bit machines the user can set more than
4GiB of data to be retained.

Also, change the type of the variable "count" in the function
"__evict_old_buffers" to unsigned long.  The assignment
"count = c->n_buffers[LIST_CLEAN] + c->n_buffers[LIST_DIRTY];"
could result in unsigned long to unsigned overflow and that could result
in buffers not being freed when they should.

While at it, avoid division in get_retain_buffers().  Division is slow,
we can change it to shift because we have precalculated the log2 of
block size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
2e80638cee dm cache metadata: fail operations if fail_io mode has been established
commit 10add84e27 upstream.

Otherwise it is possible to trigger crashes due to the metadata being
inaccessible yet these methods don't safely account for that possibility
without these checks.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
a5b9afd690 dm mpath: delay requeuing while path initialization is in progress
commit c1d7ecf7ca upstream.

Requeuing a request immediately while path initialization is ongoing
causes high CPU usage, something that is undesired.  Hence delay
requeuing while path initialization is in progress.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
5087ded2bb dm mpath: avoid that path removal can trigger an infinite loop
commit 7083abbbfc upstream.

If blk_get_request() fails, check whether the failure is due to a path
being removed.  If that is the case, fail the path by triggering a call
to fail_path().  This avoids that the following scenario can be
encountered while removing paths:
* CPU usage of a kworker thread jumps to 100%.
* Removing the DM device becomes impossible.

Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK,
and the queue is not dying, because in these cases immediate requeuing
is inappropriate.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:12 +02:00
5a09414335 dm mpath: split and rename activate_path() to prepare for its expanded use
commit 89bfce763e upstream.

activate_path() is renamed to activate_path_work() which now calls
activate_or_offline_path().  activate_or_offline_path() will be used
by the next commit.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
4ff00db718 dm mpath: requeue after a small delay if blk_get_request() fails
commit 06eb061f48 upstream.

If blk_get_request() returns ENODEV then multipath_clone_and_map()
causes a request to be requeued immediately. This can cause a kworker
thread to spend 100% of the CPU time of a single core in
__blk_mq_run_hw_queue() and also can cause device removal to never
finish.

Avoid this by only requeuing after a delay if blk_get_request() fails.
Additionally, reduce the requeue delay.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
7d67cd026d dm bufio: check new buffer allocation watermark every 30 seconds
commit 390020ad2a upstream.

dm-bufio checks a watermark when it allocates a new buffer in
__bufio_new().  However, it doesn't check the watermark when the user
changes /sys/module/dm_bufio/parameters/max_cache_size_bytes.

This may result in a problem - if the watermark is high enough so that
all possible buffers are allocated and if the user lowers the value of
"max_cache_size_bytes", the watermark will never be checked against the
new value because no new buffer would be allocated.

To fix this, change __evict_old_buffers() so that it checks the
watermark.  __evict_old_buffers() is called every 30 seconds, so if the
user reduces "max_cache_size_bytes", dm-bufio will react to this change
within 30 seconds and decrease memory consumption.

Depends-on: 1b0fb5a5b2 ("dm bufio: avoid a possible ABBA deadlock")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
2aac21e001 dm bufio: avoid a possible ABBA deadlock
commit 1b0fb5a5b2 upstream.

__get_memory_limit() tests if dm_bufio_cache_size changed and calls
__cache_size_refresh() if it did.  It takes dm_bufio_clients_lock while
it already holds the client lock.  However, lock ordering is violated
because in cleanup_old_buffers() dm_bufio_clients_lock is taken before
the client lock.

This results in a possible deadlock and lockdep engine warning.

Fix this deadlock by changing mutex_lock() to mutex_trylock().  If the
lock can't be taken, it will be re-checked next time when a new buffer
is allocated.

Also add "unlikely" to the if condition, so that the optimizer assumes
that the condition is false.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
539ee78649 dm raid: select the Kconfig option CONFIG_MD_RAID0
commit 7b81ef8b14 upstream.

Since the commit 0cf4503174 ("dm raid: add support for the MD RAID0
personality"), the dm-raid subsystem can activate a RAID-0 array.
Therefore, add MD_RAID0 to the dependencies of DM_RAID, so that MD_RAID0
will be selected when DM_RAID is selected.

Fixes: 0cf4503174 ("dm raid: add support for the MD RAID0 personality")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
11dfdb1a46 dm btree: fix for dm_btree_find_lowest_key()
commit 7d1fedb6e9 upstream.

dm_btree_find_lowest_key() is giving incorrect results.  find_key()
traverses the btree correctly for finding the highest key, but there is
an error in the way it traverses the btree for retrieving the lowest
key.  dm_btree_find_lowest_key() fetches the first key of the rightmost
block of the btree instead of fetching the first key from the leftmost
block.

Fix this by conditionally passing the correct parameter to value64()
based on the @find_highest flag.

Signed-off-by: Erez Zadok <ezk@fsl.cs.sunysb.edu>
Signed-off-by: Vinothkumar Raja <vinraja@cs.stonybrook.edu>
Signed-off-by: Nidhi Panpalia <npanpalia@cs.stonybrook.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
cd21f4af21 infiniband: call ipv6 route lookup via the stub interface
commit eea40b8f62 upstream.

The infiniband address handle can be triggered to resolve an ipv6
address in response to MAD packets, regardless of the ipv6
module being disabled via the kernel command line argument.

That will cause a call into the ipv6 routing code, which is not
initialized, and a conseguent oops.

This commit addresses the above issue replacing the direct lookup
call with an indirect one via the ipv6 stub, which is properly
initialized according to the ipv6 status (e.g. if ipv6 is
disabled, the routing lookup fails gracefully)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
3992c93b74 mlx5: Fix mlx5_ib_map_mr_sg mr length
commit 0a49f2c31c upstream.

In case we got an initial sg_offset, we need to
account for it in the mr length.

Fixes: ff2ba99365 ("IB/core: Add passing an offset into the SG to ib_map_mr_sg")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:11 +02:00
cca9c04ad4 ASoC: cs4271: configure reset GPIO as output
commit 49b2e27ab9 upstream.

During reset "refactoring" the output configuration was lost.
This commit repairs sound on EDB93XX boards.

Fixes: 9a397f4 ("ASoC: cs4271: add regulator consumer support")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
a7caccf886 tpm: fix handling of the TPM 2.0 event logs
commit fd5c78694f upstream.

When TPM2 log has entries with more than 3 digests, or with digests
not listed in the log header, log gets misparsed, eventually
leading to kernel complaint that code tried to vmalloc 512MB of
memory (I have no idea what would happen on bigger system).

So code should not parse only first 3 digests: both event header
and event itself are already in memory, so we can parse any number
of digests, as long as we do not try to parse whole memory when
given count of 0xFFFFFFFF.

So this change:

* Rejects event entry with more digests than log header describes.
  Digest types should be unique, and all should be described in
  log header, so there cannot be more digests in the event than in
  the header.

* Reject event entry with digest that is not described in the
  log header.  In theory code could hardcode information about
  digest IDs already assigned by TCG, but if firmware authors
  cannot get event log format right, why should anyone believe
  that they got event log content right.

Fixes: 4d23cc323c ("tpm: add securityfs support for TPM 2.0 firmware event log")
Signed-off-by: Petr Vandrovec <petr@vmware.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
455edd3ceb vTPM: Fix missing NULL check
commit 31574d321c upstream.

The current code passes the address of tpm_chip as the argument to
dev_get_drvdata() without prior NULL check in
tpm_ibmvtpm_get_desired_dma.  This resulted an oops during kernel
boot when vTPM is enabled in Power partition configured in active
memory sharing mode.

The vio_driver's get_desired_dma() is called before the probe(), which
for vtpm is tpm_ibmvtpm_probe, and it's this latter function that
initializes the driver and set data.  Attempting to get data before
the probe() caused the problem.

This patch adds a NULL check to the tpm_ibmvtpm_get_desired_dma.

fixes: 9e0d39d8a6 ("tpm: Remove useless priv field in struct tpm_vendor_specific")
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkine <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
b06ad9f0bf tpm_crb: check for bad response size
commit 8569defde8 upstream.

Make sure size of response buffer is at least 6 bytes, or
we will underflow and pass large size_t to memcpy_fromio().
This was encountered while testing earlier version of
locality patchset.

Fixes: 30fc8d138e ("tpm: TPM 2.0 CRB Interface")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
7cb54bfbd5 tpm: add sleep only for retry in i2c_nuvoton_write_status()
commit 0afb7118ae upstream.

Currently, there is an unnecessary 1 msec delay added in
i2c_nuvoton_write_status() for the successful case. This
function is called multiple times during send() and recv(),
which implies adding multiple extra delays for every TPM
operation.

This patch calls usleep_range() only if retry is to be done.

Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
6705f253bc tpm: msleep() delays - replace with usleep_range() in i2c nuvoton driver
commit a233a0289c upstream.

Commit 500462a9de "timers: Switch to a non-cascading wheel" replaced
the 'classic' timer wheel, which aimed for near 'exact' expiry of the
timers.  Their analysis was that the vast majority of timeout timers
are used as safeguards, not as real timers, and are cancelled or
rearmed before expiration.  The only exception noted to this were
networking timers with a small expiry time.

Not included in the analysis was the TPM polling timer, which resulted
in a longer normal delay and, every so often, a very long delay.  The
non-cascading wheel delay is based on CONFIG_HZ.  For a description of
the different rings and their delays, refer to the comments in
kernel/time/timer.c.

Below are the delays given for rings 0 - 2, which explains the longer
"normal" delays and the very, long delays as seen on systems with
CONFIG_HZ 250.

* HZ 1000 steps
 * Level Offset  Granularity            Range
 *  0      0         1 ms                0 ms - 63 ms
 *  1     64         8 ms               64 ms - 511 ms
 *  2    128        64 ms              512 ms - 4095 ms (512ms - ~4s)

* HZ  250
 * Level Offset  Granularity            Range
 *  0      0         4 ms                0 ms - 255 ms
 *  1     64        32 ms              256 ms - 2047 ms (256ms - ~2s)
 *  2    128       256 ms             2048 ms - 16383 ms (~2s - ~16s)

Below is a comparison of extending the TPM with 1000 measurements,
using msleep() vs. usleep_delay() when configured for 1000 hz vs. 250
hz, before and after commit 500462a9de.

linux-4.7 | msleep() usleep_range()
1000 hz: 0m44.628s | 1m34.497s 29.243s
250 hz: 1m28.510s | 4m49.269s 32.386s

linux-4.7  | min-max (msleep)  min-max (usleep_range)
1000 hz: 0:017 - 2:760s | 0:015 - 3:967s    0:014 - 0:418s
250 hz: 0:028 - 1:954s | 0:040 - 4:096s    0:016 - 0:816s

This patch replaces the msleep() with usleep_range() calls in the
i2c nuvoton driver with a consistent max range value.

Signed-of-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.vnet.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
84eef50e82 tpm_tis_spi: Add small delay after last transfer
commit 5cc0101d1f upstream.

Testing the implementation with a Raspberry Pi 2 showed that under some
circumstances its SPI master erroneously releases the CS line before the
transfer is complete, i.e. before the end of the last clock. In this case
the TPM ignores the transfer and misses for example the GO command. The
driver is unable to detect this communication problem and will wait for a
command response that is never going to arrive, timing out eventually.

As a workaround, the small delay ensures that the CS line is held long
enough, even with a faulty SPI master. Other SPI masters are not affected,
except for a negligible performance penalty.

Fixes: 0edbfea537 ("tpm/tpm_tis_spi: Add support for spi phy")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Benoit Houyere <benoit.houyere@st.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
32a6b61947 tpm_tis_spi: Remove limitation of transfers to MAX_SPI_FRAMESIZE bytes
commit 591e48c26c upstream.

Limiting transfers to MAX_SPI_FRAMESIZE was not expected by the upper
layers, as tpm_tis has no such limitation. Add a loop to hide that
limitation.

v2: Moved scope of spi_message to the top as requested by Jarkko
Fixes: 0edbfea537 ("tpm/tpm_tis_spi: Add support for spi phy")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Benoit Houyere <benoit.houyere@st.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:10 +02:00
e2fbfd47e3 tpm_tis_spi: Check correct byte for wait state indicator
commit e110cc69dc upstream.

Wait states are signaled in the last byte received from the TPM in
response to the header, not the first byte. Check rx_buf[3] instead of
rx_buf[0].

Fixes: 0edbfea537 ("tpm/tpm_tis_spi: Add support for spi phy")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Benoit Houyere <benoit.houyere@st.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
6b2a246ad0 tpm_tis_spi: Abort transfer when too many wait states are signaled
commit 975094ddc3 upstream.

Abort the transfer with ETIMEDOUT when the TPM signals more than
TPM_RETRY wait states. Continuing with the transfer in this state
will only lead to arbitrary failures in other parts of the code.

Fixes: 0edbfea537 ("tpm/tpm_tis_spi: Add support for spi phy")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Benoit Houyere <benoit.houyere@st.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
b0e2a93c28 tpm_tis_spi: Use single function to transfer data
commit f848f2143a upstream.

The algorithm for sending data to the TPM is mostly identical to the
algorithm for receiving data from the TPM, so a single function is
sufficient to handle both cases.

This is a prequisite for all the other fixes, so we don't have to fix
everything twice (send/receive)

v2: u16 instead of u8 for the length.
Fixes: 0edbfea537 ("tpm/tpm_tis_spi: Add support for spi phy")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Benoit Houyere <benoit.houyere@st.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
580c656a95 fanotify: don't expose EOPENSTALE to userspace
commit 4ff33aafd3 upstream.

When delivering an event to userspace for a file on an NFS share,
if the file is deleted on server side before user reads the event,
user will not get the event.

If the event queue contained several events, the stale event is
quietly dropped and read() returns to user with events read so far
in the buffer.

If the event queue contains a single stale event or if the stale
event is a permission event, read() returns to user with the kernel
internal error code 518 (EOPENSTALE), which is not a POSIX error code.

Check the internal return value -EOPENSTALE in fanotify_read(), just
the same as it is checked in path_openat() and drop the event in the
cases that it is not already dropped.

This is a reproducer from Marko Rauhamaa:

Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:

    ==============================================================
    NFS Server    NFS Client(1)     NFS Client(2)
    ==============================================================
    # echo foo >/nfsshare/bar.txt
                  # cat /nfsshare/bar.txt
                  foo
                                    # ./fantest /nfsshare
                                    Press enter key to terminate.
                                    Listening for events.
    # rm -f /nfsshare/bar.txt
                  # cat /nfsshare/bar.txt
                                    read: Unknown error 518
                  cat: /nfsshare/bar.txt: Operation not permitted
    ==============================================================

where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.

Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
826ca313fd ALSA: hda: Fix cpu lockup when stopping the cmd dmas
commit 960013762d upstream.

Using jiffies in hdac_wait_for_cmd_dmas() to determine when to time out
when interrupts are off (snd_hdac_bus_stop_cmd_io()/spin_lock_irq())
causes hard lockup so unlock while waiting using jiffies.

---<-snip->---
<0>[ 1211.603046] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
<4>[ 1211.603047] Modules linked in: snd_hda_intel i915 vgem
<4>[ 1211.603053] irq event stamp: 13366
<4>[ 1211.603053] hardirqs last  enabled at (13365):
...
<4>[ 1211.603059] Call Trace:
<4>[ 1211.603059]  ? delay_tsc+0x3d/0xc0
<4>[ 1211.603059]  __delay+0xa/0x10
<4>[ 1211.603060]  __const_udelay+0x31/0x40
<4>[ 1211.603060]  snd_hdac_bus_stop_cmd_io+0x96/0xe0 [snd_hda_core]
<4>[ 1211.603060]  ? azx_dev_disconnect+0x20/0x20 [snd_hda_intel]
<4>[ 1211.603061]  snd_hdac_bus_stop_chip+0xb1/0x100 [snd_hda_core]
<4>[ 1211.603061]  azx_stop_chip+0x9/0x10 [snd_hda_codec]
<4>[ 1211.603061]  azx_suspend+0x72/0x220 [snd_hda_intel]
<4>[ 1211.603061]  pci_pm_suspend+0x71/0x140
<4>[ 1211.603062]  dpm_run_callback+0x6f/0x330
<4>[ 1211.603062]  ? pci_pm_freeze+0xe0/0xe0
<4>[ 1211.603062]  __device_suspend+0xf9/0x370
<4>[ 1211.603062]  ? dpm_watchdog_set+0x60/0x60
<4>[ 1211.603063]  async_suspend+0x1a/0x90
<4>[ 1211.603063]  async_run_entry_fn+0x34/0x160
<4>[ 1211.603063]  process_one_work+0x1f4/0x6d0
<4>[ 1211.603063]  ? process_one_work+0x16e/0x6d0
<4>[ 1211.603064]  worker_thread+0x49/0x4a0
<4>[ 1211.603064]  kthread+0x107/0x140
<4>[ 1211.603064]  ? process_one_work+0x6d0/0x6d0
<4>[ 1211.603065]  ? kthread_create_on_node+0x40/0x40
<4>[ 1211.603065]  ret_from_fork+0x2e/0x40

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100419
Fixes: 38b19ed7f8 ("ALSA: hda: fix to wait for RIRB & CORB DMA to set")
Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jeeja KP <jeeja.kp@intel.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
d2b8a861a6 tpm_tis_core: Choose appropriate timeout for reading burstcount
commit 302a6ad7fc upstream.

TIS v1.3 for TPM 1.2 and PTP for TPM 2.0 disagree about which timeout
value applies to reading a valid burstcount. It is TIMEOUT_D according to
TIS, but TIMEOUT_A according to PTP, so choose the appropriate value
depending on whether we deal with a TPM 1.2 or a TPM 2.0.

This is important since according to the PTP TIMEOUT_D is much smaller
than TIMEOUT_A. So the previous implementation could run into timeouts
with a TPM 2.0, even though the TPM was behaving perfectly fine.

During tpm2_probe TIMEOUT_D will be used even with a TPM 2.0, because
TPM_CHIP_FLAG_TPM2 is not yet set. This is fine, since the timeout values
will only be changed afterwards by tpm_get_timeouts. Until then
TIS_TIMEOUT_D_MAX applies, which is large enough.

Fixes: aec04cbdf7 ("tpm: TPM 2.0 FIFO Interface")
Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
b3bf0bbc1e USB: core: replace %p with %pK
commit 2f964780c0 upstream.

Format specifier %p can leak kernel addresses while not valuing the
kptr_restrict system settings. When kptr_restrict is set to (1), kernel
pointers printed using the %pK format specifier will be replaced with
Zeros. Debugging Note : &pK prints only Zeros as address. If you need
actual address information, write 0 to kptr_restrict.

echo 0 > /proc/sys/kernel/kptr_restrict

[Found by poking around in a random vendor kernel tree, it would be nice
if someone would actually send these types of patches upstream - gkh]

Signed-off-by: Vamsi Krishna Samavedam <vskrishn@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
28c7411cdb char: lp: fix possible integer overflow in lp_setup()
commit 3e21f4af17 upstream.

The lp_setup() code doesn't apply any bounds checking when passing
"lp=none", and only in this case, resulting in an overflow of the
parport_nr[] array. All versions in Git history are affected.

Reported-By: Roee Hay <roee.hay@hcl.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
37f84f8b99 watchdog: pcwd_usb: fix NULL-deref at probe
commit 46c319b848 upstream.

Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:09 +02:00
a646b2974a USB: ene_usb6250: fix DMA to the stack
commit 628c2893d4 upstream.

The ene_usb6250 sub-driver in usb-storage does USB I/O to buffers on
the stack, which doesn't work with vmapped stacks.  This patch fixes
the problem by allocating a separate 512-byte buffer at probe time and
using it for all of the offending I/O operations.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:08 +02:00
dff0ad3b9c usb: misc: legousbtower: Fix memory leak
commit 0bd193d62b upstream.

get_version_reply is not freed if function returns with success.

Fixes: 942a48730f ("usb: misc: legousbtower: Fix buffers on stack")
Reported-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:08 +02:00
cfd3c22c36 usb: misc: legousbtower: Fix buffers on stack
commit 942a48730f upstream.

Allocate buffers on HEAP instead of STACK for local structures
that are to be received using usb_control_msg().

Signed-off-by: Maksim Salau <maksim.salau@gmail.com>
Tested-by: Alfredo Rafael Vicente Boix <alviboi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-25 15:46:08 +02:00
02d8683735 Linux 4.11.2 2017-05-20 14:50:04 +02:00
bbc105f387 pstore: Shut down worker when unregistering
commit 6330d55347 upstream.

When built as a module and running with update_ms >= 0, pstore will Oops
during module unload since the work timer is still running. This makes sure
the worker is stopped before unloading.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
ed8834ea4f pstore: Use dynamic spinlock initializer
commit e9a330c428 upstream.

The per-prz spinlock should be using the dynamic initializer so that
lockdep can correctly track it. Without this, under lockdep, we get a
warning at boot that the lock is in non-static memory.

Fixes: 109704492e ("pstore: Make spinlock per zone instead of global")
Fixes: 76d5692a58 ("pstore: Correctly initialize spinlock and flags")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
f25c78c879 pstore: Fix flags to enable dumps on powerpc
commit 041939c1ec upstream.

After commit c950fd6f20 kernel registers pstore write based on flag set.
Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for
powerpc architecture. On panic, kernel doesn't write message to
/fs/pstore/dmesg*(Entry doesn't gets created at all).

This patch enables pstore write for powerpc architecture by setting
PSTORE_FLAGS_DMESG flag.

Fixes: c950fd6f20 ("pstore: Split pstore fragile flags")
Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
af0eb80e8e libnvdimm, pfn: fix 'npfns' vs section alignment
commit d5483feda8 upstream.

Fix failures to create namespaces due to the vmem_altmap not advertising
enough free space to store the memmap.

 WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
 [..]
 Call Trace:
  dump_stack+0x63/0x83
  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  arch_add_memory+0xde/0xf0
  devm_memremap_pages+0x244/0x440
  pmem_attach_disk+0x37e/0x490 [nd_pmem]
  nd_pmem_probe+0x7e/0xa0 [nd_pmem]
  nvdimm_bus_probe+0x71/0x120 [libnvdimm]
  driver_probe_device+0x2bb/0x460
  bind_store+0x114/0x160
  drv_attr_store+0x25/0x30

In commit 658922e57b "libnvdimm, pfn: fix memmap reservation sizing"
we arranged for the capacity to be allocated, but failed to also update
the 'npfns' parameter. This leads to cases where there is enough
capacity reserved to hold all the allocated sections, but
vmemmap_populate_hugepages() still encounters -ENOMEM from
altmap_alloc_block_buf().

This fix is a stop-gap until we can teach the core memory hotplug
implementation to permit sub-section hotplug.

Fixes: 658922e57b ("libnvdimm, pfn: fix memmap reservation sizing")
Reported-by: Anisha Allada <anisha.allada@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
a3ff3ebdf3 libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
commit 452bae0aed upstream.

A debug patch to turn the standard device_lock() into something that
lockdep can analyze yielded the following:

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0-rc4+ #106 Tainted: G           O
 -------------------------------------------------------
 lt-libndctl/1898 is trying to acquire lock:
  (&dev->nvdimm_mutex/3){+.+.+.}, at: [<ffffffffc023c948>] nd_attach_ndns+0x178/0x1b0 [libnvdimm]

 but task is already holding lock:
  (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffc022e0b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
        lock_acquire+0xf6/0x1f0
        __mutex_lock+0x88/0x980
        mutex_lock_nested+0x1b/0x20
        nvdimm_bus_lock+0x21/0x30 [libnvdimm]
        nvdimm_namespace_capacity+0x1b/0x40 [libnvdimm]
        nvdimm_namespace_common_probe+0x230/0x510 [libnvdimm]
        nd_pmem_probe+0x14/0x180 [nd_pmem]
        nvdimm_bus_probe+0xa9/0x260 [libnvdimm]

 -> #0 (&dev->nvdimm_mutex/3){+.+.+.}:
        __lock_acquire+0x1107/0x1280
        lock_acquire+0xf6/0x1f0
        __mutex_lock+0x88/0x980
        mutex_lock_nested+0x1b/0x20
        nd_attach_ndns+0x178/0x1b0 [libnvdimm]
        nd_namespace_store+0x308/0x3c0 [libnvdimm]
        namespace_store+0x87/0x220 [libnvdimm]

In this case '&dev->nvdimm_mutex/3' mirrors '&dev->mutex'.

Fix this by replacing the use of device_lock() with nvdimm_bus_lock() to protect
nd_{attach,detach}_ndns() operations.

Fixes: 8c2f7e8658 ("libnvdimm: infrastructure for btt devices")
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
de21b800a6 libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
commit b2518c78ce upstream.

The following BUG was observed when nd_pmem_notify() was called
for a BTT device.  The use of a pmem_device pointer is not valid
with BTT.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
 Call Trace:
  nd_device_notify+0x40/0x50
  child_notify+0x10/0x20
  device_for_each_child+0x50/0x90
  nd_region_notify+0x20/0x30
  nd_device_notify+0x40/0x50
  nvdimm_region_notify+0x27/0x30
  acpi_nfit_scrub+0x341/0x590 [nfit]
  process_one_work+0x197/0x450
  worker_thread+0x4e/0x4a0
  kthread+0x109/0x140

Fix nd_pmem_notify() by setting nd_region and badblocks pointers
properly for BTT.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 719994660c ("libnvdimm: async notification support")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
d2572f5b01 libnvdimm, region: fix flush hint detection crash
commit bc042fdfbb upstream.

In the case where a dimm does not have any associated flush hints the
ndrd->flush_wpq array may be uninitialized leading to crashes with the
following signature:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
 IP: region_visible+0x10f/0x160 [libnvdimm]

 Call Trace:
  internal_create_group+0xbe/0x2f0
  sysfs_create_groups+0x40/0x80
  device_add+0x2d8/0x650
  nd_async_device_register+0x12/0x40 [libnvdimm]
  async_run_entry_fn+0x39/0x170
  process_one_work+0x212/0x6c0
  ? process_one_work+0x197/0x6c0
  worker_thread+0x4e/0x4a0
  kthread+0x10c/0x140
  ? process_one_work+0x6c0/0x6c0
  ? kthread_create_on_node+0x60/0x60
  ret_from_fork+0x31/0x40

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Fixes: f284a4f237 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
d517be5133 ipmi: Fix kernel panic at ipmi_ssif_thread()
commit 6de65fcfdb upstream.

msg_written_handler() may set ssif_info->multi_data to NULL
when using ipmitool to write fru.

Before setting ssif_info->multi_data to NULL, add new local
pointer "data_to_send" and store correct i2c data pointer to
it to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos.

Signed-off-by: Joeseph Chang <joechang@codeaurora.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:51 +02:00
8f0fde5bc9 libata: reject passthrough WRITE SAME requests
commit c6ade20f5e upstream.

The WRITE SAME to TRIM translation rewrites the DATA OUT buffer.  While
the SCSI code accomodates for this by passing a read-writable buffer
userspace applications don't cater for this behavior.  In fact it can
be used to rewrite e.g. a readonly file through mmap and should be
considered as a security fix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
a5938c093a cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()
commit a590b90d47 upstream.

cgroup_get() expected to be called only on live cgroups and triggers
warning on a dead cgroup; however, cgroup_sk_alloc() may be called
while cloning a socket which is left in an empty and removed cgroup
and thus may legitimately duplicate its reference on a dead cgroup.
This currently triggers the following warning spuriously.

 WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
 ...
  [<ffffffff8107e123>] __warn+0xd3/0xf0
  [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
  [<ffffffff810ff465>] cgroup_get+0x55/0x60
  [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
  [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
  [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
  [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
  [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
  [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
  [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
  [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
  [<ffffffff81837cb2>] ip6_input+0x32/0xb0
  [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
  [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
  [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
  [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
  [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
  [<ffffffff817787d8>] napi_gro_frags+0x208/0x270
  [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
  [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
  [<ffffffff81778b30>] net_rx_action+0x210/0x350
  [<ffffffff8188c426>] __do_softirq+0x106/0x2c7
  [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
  [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
  [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
  [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
  [<ffffffff8103edd1>] start_secondary+0xf1/0x100

This patch renames the existing cgroup_get() with the dead cgroup
warning to cgroup_get_live() after cgroup_kn_lock_live() and
introduces the new cgroup_get() which doesn't check whether the cgroup
is live or dead.

All existing cgroup_get() users except for cgroup_sk_alloc() are
converted to use cgroup_get_live().

Fixes: d979a39d72 ("cgroup: duplicate cgroup reference when cloning sockets")
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
740f485dae Bluetooth: hci_intel: add missing tty-device sanity check
commit dcb9cfaa5e upstream.

Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.

Fixes: 74cdad37cd ("Bluetooth: hci_intel: Add runtime PM support")
Fixes: 1ab1f239bf ("Bluetooth: hci_intel: Add support for platform driver")
Cc: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
a86af7983d Bluetooth: hci_bcm: add missing tty-device sanity check
commit 95065a61e9 upstream.

Make sure to check the tty-device pointer before looking up the sibling
platform device to avoid dereferencing a NULL-pointer when the tty is
one end of a Unix98 pty.

Fixes: 0395ffc1ee ("Bluetooth: hci_bcm: Add PM for BCM devices")
Cc: Frederic Danis <frederic.danis@linux.intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
ded39dd9f0 Bluetooth: Fix user channel for 32bit userspace on 64bit kernel
commit ab89f0bdd6 upstream.

Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being
defined as 0x80000000. This results in sendmsg failure if used from 32bit
userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT
in flags check in hci_sock_sendmsg.

Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marko Kiiskila <marko@runtime.io>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
01f9853326 tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44
commit 5a0722b898 upstream.

Define a new early console name for Qualcomm Datacenter Technologies
QDF2400 SOCs affected by erratum 44, instead of piggy-backing on "pl011".
Previously, to enable traditional (non-SPCR) earlycon, the documentation
said to specify "earlycon=pl011,<address>,qdf2400_e44", but the code was
broken and this didn't actually work.

So instead, the method for specifying the E44 work-around with traditional
earlycon is "earlycon=qdf2400_e44,<address>".  Both methods of earlycon
are now enabled with the same function.

Fixes: e53e597fd4 ("tty: pl011: fix earlycon work-around for QDF2400 erratum 44")
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
34e01f9207 tty: pty: Fix ldisc flush after userspace become aware of the data already
commit 77dae61344 upstream.

While using emacs, cat or others' commands in konsole with recent
kernels, I have met many times that CTRL-C freeze konsole. After
konsole freeze I can't type anything, then I have to open a new one,
it is very annoying.

See bug report:
https://bugs.kde.org/show_bug.cgi?id=175283

The platform in that bug report is Solaris, but now the pty in linux
has the same problem or the same behavior as Solaris :)

It has high possibility to trigger the problem follow steps below:
Note: In my test, BigFile is a text file whose size is bigger than 1G
1:open konsole
1:cat BigFile
2:CTRL-C

After some digging, I find out the reason is that commit 1d1d14da12
("pty: Fix buffer flush deadlock") changes the behavior of pty_flush_buffer.

Thread A                                 Thread B
--------                                 --------
1:n_tty_poll return POLLIN
                                         2:CTRL-C trigger pty_flush_buffer
                                             tty_buffer_flush
                                               n_tty_flush_buffer
3:attempt to check count of chars:
  ioctl(fd, TIOCINQ, &available)
  available is equal to 0

4:read(fd, buffer, avaiable)
  return 0

5:konsole close fd

Yes, I know we could use the same patch included in the BUG report as
a workaround for linux platform too. But I think the data in ldisc is
belong to application of another side, we shouldn't clear it when we
want to flush write buffer of this side in pty_flush_buffer. So I think
it is better to disable ldisc flush in pty_flush_buffer, because its new
hehavior bring no benefit except that it mess up the behavior between
POLLIN, and TIOCINQ or FIONREAD.

Also I find no flush_buffer function in others' tty driver has the
same behavior as current pty_flush_buffer.

Fixes: 1d1d14da12 ("pty: Fix buffer flush deadlock")
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
aad32798dc serial: omap: suspend device on probe errors
commit 77e6fe7fd2 upstream.

Make sure to actually suspend the device before returning after a failed
(or deferred) probe.

Note that autosuspend must be disabled before runtime pm is disabled in
order to balance the usage count due to a negative autosuspend delay as
well as to make the final put suspend the device synchronously.

Fixes: 388bc26226 ("omap-serial: Fix the error handling in the omap_serial probe")
Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
9e85b5c73a serial: omap: fix runtime-pm handling on unbind
commit 099bd73dc1 upstream.

An unbalanced and misplaced synchronous put was used to suspend the
device on driver unbind, something which with a likewise misplaced
pm_runtime_disable leads to external aborts when an open port is being
removed.

Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa024010
...
[<c046e760>] (serial_omap_set_mctrl) from [<c046a064>] (uart_update_mctrl+0x50/0x60)
[<c046a064>] (uart_update_mctrl) from [<c046a400>] (uart_shutdown+0xbc/0x138)
[<c046a400>] (uart_shutdown) from [<c046bd2c>] (uart_hangup+0x94/0x190)
[<c046bd2c>] (uart_hangup) from [<c045b760>] (__tty_hangup+0x404/0x41c)
[<c045b760>] (__tty_hangup) from [<c045b794>] (tty_vhangup+0x1c/0x20)
[<c045b794>] (tty_vhangup) from [<c046ccc8>] (uart_remove_one_port+0xec/0x260)
[<c046ccc8>] (uart_remove_one_port) from [<c046ef4c>] (serial_omap_remove+0x40/0x60)
[<c046ef4c>] (serial_omap_remove) from [<c04845e8>] (platform_drv_remove+0x34/0x4c)

Fix this up by resuming the device before deregistering the port and by
suspending and disabling runtime pm only after the port has been
removed.

Also make sure to disable autosuspend before disabling runtime pm so
that the usage count is balanced and device actually suspended before
returning.

Note that due to a negative autosuspend delay being set in probe, the
unbalanced put would actually suspend the device on first driver unbind,
while rebinding and again unbinding would result in a negative
power.usage_count.

Fixes: 7e9c8e7dbf ("serial: omap: make sure to suspend device before remove")
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:50 +02:00
cd823aa0bc serial: samsung: Add missing checks for dma_map_single failure
commit 500fcc08a3 upstream.

This patch adds missing checks for dma_map_single() failure and proper error
reporting. Although this issue was harmless on ARM architecture, it is always
good to use the DMA mapping API in a proper way. This patch fixes the following
DMA API debug warning:

WARNING: CPU: 1 PID: 3785 at lib/dma-debug.c:1171 check_unmap+0x8a0/0xf28
dma-pl330 121a0000.pdma: DMA-API: device driver failed to check map error[device address=0x000000006e0f9000] [size=4096 bytes] [mapped as single]
Modules linked in:
CPU: 1 PID: 3785 Comm: (agetty) Tainted: G        W       4.11.0-rc1-00137-g07ca963-dirty #59
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c072a114>] (check_unmap+0x8a0/0xf28)
[<c072a114>] (check_unmap) from [<c072a834>] (debug_dma_unmap_page+0x98/0xc8)
[<c072a834>] (debug_dma_unmap_page) from [<c0803874>] (s3c24xx_serial_shutdown+0x314/0x52c)
[<c0803874>] (s3c24xx_serial_shutdown) from [<c07f5124>] (uart_port_shutdown+0x54/0x88)
[<c07f5124>] (uart_port_shutdown) from [<c07f522c>] (uart_shutdown+0xd4/0x110)
[<c07f522c>] (uart_shutdown) from [<c07f6a8c>] (uart_hangup+0x9c/0x208)
[<c07f6a8c>] (uart_hangup) from [<c07c426c>] (__tty_hangup+0x49c/0x634)
[<c07c426c>] (__tty_hangup) from [<c07c78ac>] (tty_ioctl+0xc88/0x16e4)
[<c07c78ac>] (tty_ioctl) from [<c03b5f2c>] (do_vfs_ioctl+0xc4/0xd10)
[<c03b5f2c>] (do_vfs_ioctl) from [<c03b6bf4>] (SyS_ioctl+0x7c/0x8c)
[<c03b6bf4>] (SyS_ioctl) from [<c010b4a0>] (ret_fast_syscall+0x0/0x3c)

Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Fixes: 62c37eedb7 ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
43a1c1ff12 serial: samsung: Use right device for DMA-mapping calls
commit 768d64f491 upstream.

Driver should provide its own struct device for all DMA-mapping calls instead
of extracting device pointer from DMA engine channel. Although this is harmless
from the driver operation perspective on ARM architecture, it is always good
to use the DMA mapping API in a proper way. This patch fixes following DMA API
debug warning:

WARNING: CPU: 0 PID: 0 at lib/dma-debug.c:1241 check_sync+0x520/0x9f4
samsung-uart 12c20000.serial: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006df0f580] [size=64 bytes]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-00137-g07ca963 #51
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c011aaa4>] (unwind_backtrace) from [<c01127c0>] (show_stack+0x20/0x24)
[<c01127c0>] (show_stack) from [<c06ba5d8>] (dump_stack+0x84/0xa0)
[<c06ba5d8>] (dump_stack) from [<c0139528>] (__warn+0x14c/0x180)
[<c0139528>] (__warn) from [<c01395a4>] (warn_slowpath_fmt+0x48/0x50)
[<c01395a4>] (warn_slowpath_fmt) from [<c0729058>] (check_sync+0x520/0x9f4)
[<c0729058>] (check_sync) from [<c072967c>] (debug_dma_sync_single_for_device+0x88/0xc8)
[<c072967c>] (debug_dma_sync_single_for_device) from [<c0803c10>] (s3c24xx_serial_start_tx_dma+0x100/0x2f8)
[<c0803c10>] (s3c24xx_serial_start_tx_dma) from [<c0804338>] (s3c24xx_serial_tx_chars+0x198/0x33c)

Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Fixes: 62c37eedb7 ("serial: samsung: add dma reqest/release functions")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
ab62f4118b fscrypt: avoid collisions when presenting long encrypted filenames
commit 6b06cdee81 upstream.

When accessing an encrypted directory without the key, userspace must
operate on filenames derived from the ciphertext names, which contain
arbitrary bytes.  Since we must support filenames as long as NAME_MAX,
we can't always just base64-encode the ciphertext, since that may make
it too long.  Currently, this is solved by presenting long names in an
abbreviated form containing any needed filesystem-specific hashes (e.g.
to identify a directory block), then the last 16 bytes of ciphertext.
This needs to be sufficient to identify the actual name on lookup.

However, there is a bug.  It seems to have been assumed that due to the
use of a CBC (ciphertext block chaining)-based encryption mode, the last
16 bytes (i.e. the AES block size) of ciphertext would depend on the
full plaintext, preventing collisions.  However, we actually use CBC
with ciphertext stealing (CTS), which handles the last two blocks
specially, causing them to appear "flipped".  Thus, it's actually the
second-to-last block which depends on the full plaintext.

This caused long filenames that differ only near the end of their
plaintexts to, when observed without the key, point to the wrong inode
and be undeletable.  For example, with ext4:

    # echo pass | e4crypt add_key -p 16 edir/
    # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
    # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
    100000
    # sync
    # echo 3 > /proc/sys/vm/drop_caches
    # keyctl new_session
    # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l
    2004
    # rm -rf edir/
    rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning
    ...

To fix this, when presenting long encrypted filenames, encode the
second-to-last block of ciphertext rather than the last 16 bytes.

Although it would be nice to solve this without depending on a specific
encryption mode, that would mean doing a cryptographic hash like SHA-256
which would be much less efficient.  This way is sufficient for now, and
it's still compatible with encryption modes like HEH which are strong
pseudorandom permutations.  Also, changing the presented names is still
allowed at any time because they are only provided to allow applications
to do things like delete encrypted directories.  They're not designed to
be used to persistently identify files --- which would be hard to do
anyway, given that they're encrypted after all.

For ease of backports, this patch only makes the minimal fix to both
ext4 and f2fs.  It leaves ubifs as-is, since ubifs doesn't compare the
ciphertext block yet.  Follow-on patches will clean things up properly
and make the filesystems use a shared helper function.

Fixes: 5de0b4d0cd ("ext4 crypto: simplify and speed up filename encryption")
Reported-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
eb86b4c68b fscrypt: fix context consistency check when key(s) unavailable
commit 272f98f684 upstream.

To mitigate some types of offline attacks, filesystem encryption is
designed to enforce that all files in an encrypted directory tree use
the same encryption policy (i.e. the same encryption context excluding
the nonce).  However, the fscrypt_has_permitted_context() function which
enforces this relies on comparing struct fscrypt_info's, which are only
available when we have the encryption keys.  This can cause two
incorrect behaviors:

1. If we have the parent directory's key but not the child's key, or
   vice versa, then fscrypt_has_permitted_context() returned false,
   causing applications to see EPERM or ENOKEY.  This is incorrect if
   the encryption contexts are in fact consistent.  Although we'd
   normally have either both keys or neither key in that case since the
   master_key_descriptors would be the same, this is not guaranteed
   because keys can be added or removed from keyrings at any time.

2. If we have neither the parent's key nor the child's key, then
   fscrypt_has_permitted_context() returned true, causing applications
   to see no error (or else an error for some other reason).  This is
   incorrect if the encryption contexts are in fact inconsistent, since
   in that case we should deny access.

To fix this, retrieve and compare the fscrypt_contexts if we are unable
to set up both fscrypt_infos.

While this slightly hurts performance when accessing an encrypted
directory tree without the key, this isn't a case we really need to be
optimizing for; access *with* the key is much more important.
Furthermore, the performance hit is barely noticeable given that we are
already retrieving the fscrypt_context and doing two keyring searches in
fscrypt_get_encryption_info().  If we ever actually wanted to optimize
this case we might start by caching the fscrypt_contexts.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
602f8911c7 initramfs: avoid "label at end of compound statement" error
commit 394e4f5d58 upstream.

Commit 17a9be3174 ("initramfs: Always do fput() and load modules after
rootfs populate") introduced an error for the

    CONFIG_BLK_DEV_RAM=y

case, because even though the code looks fine, the compiler really wants
a statement after a label, or you'll get complaints:

  init/initramfs.c: In function 'populate_rootfs':
  init/initramfs.c:644:2: error: label at end of compound statement

That commit moved the subsequent statements to outside the compound
statement, leaving the label without any associated statements.

Reported-by: Jörg Otte <jrg.otte@gmail.com>
Fixes: 17a9be3174 ("initramfs: Always do fput() and load modules after rootfs populate")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
e54af78e1a initramfs: Always do fput() and load modules after rootfs populate
commit 17a9be3174 upstream.

In OpenRISC we do not have a bootloader passed initrd, but the built in
initramfs does contain the /init and other binaries, including modules.
The previous commit 0886551480 ("initramfs: finish fput() before
accessing any binary from initramfs") made a change to only call fput()
if the bootloader initrd was available, this caused intermittent crashes
for OpenRISC.

This patch changes the fput() to happen unconditionally if any rootfs is
loaded. Also, I added some comments to make it a bit more clear why we
call unpack_to_rootfs() multiple times.

Fixes: 0886551480 ("initramfs: finish fput() before accessing any binary from initramfs")
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
9b3be66cb4 f2fs: Make flush bios explicitely sync
commit 3adc5fcb7e upstream.

Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65a
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
bd3dfe5049 f2fs: check entire encrypted bigname when finding a dentry
commit 6332cd32c8 upstream.

If user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checks its hash value with
first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances to check entire dentry bytes likewise ext4.

Eric reported how to reproduce this issue by:

 # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
 # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
 # sync
 # echo 3 > /proc/sys/vm/drop_caches
 # keyctl new_session
 # find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999

Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
3a625468bd f2fs: fix multiple f2fs_add_link() having same name for inline dentry
commit d3bb910c15 upstream.

Commit 88c5c13a50 (f2fs: fix multiple f2fs_add_link() calls having
same name) does not cover the scenario where inline dentry is enabled.
In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will
lookup dentries one more time.

This patch fixes it by moving the assigment of current task to a upper
level to cover both normal and inline dentry.

Fixes: 88c5c13a50 (f2fs: fix multiple f2fs_add_link() calls having same name)
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
e71f099677 f2fs: fix fs corruption due to zero inode page
commit 9bb02c3627 upstream.

This patch fixes the following scenario.

- f2fs_create/f2fs_mkdir             - write_checkpoint
 - f2fs_mark_inode_dirty_sync         - block_operations
                                       - f2fs_lock_all
                                       - f2fs_sync_inode_meta
                                        - f2fs_unlock_all
                                        - sync_inode_metadata
 - f2fs_lock_op
                                         - f2fs_write_inode
                                          - update_inode_page
                                           - get_node_page
                                             return -ENOENT
 - new_inode_page
  - fill_node_footer
 - f2fs_mark_inode_dirty_sync
 - ...
 - f2fs_unlock_op
                                          - f2fs_inode_synced
                                       - f2fs_lock_all
                                       - do_checkpoint

In this checkpoint, we can get an inode page which contains zeros having valid
node footer only.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:49 +02:00
ed4d26a1e4 Revert "f2fs: put allocate_segment after refresh_sit_entry"
commit c6f82fe90d upstream.

This reverts commit 3436c4bdb3.

This makes a leak to register dirty segments. I reproduced the issue by
modified postmark which injects a lot of file create/delete/update and
finally triggers huge number of SSR allocations.

[Jaegeuk Kim: Change missing incorrect comment]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
33cbcc2556 f2fs: fix wrong max cost initialization
commit c541a51b8c upstream.

This patch fixes missing increased max cost caused by a patch that we increased
cose of data segments in greedy algorithm.

Fixes: b9cd20619 "f2fs: node segment is prior to data segment selected victim"
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
03606c8ecc dax: fix PMD data corruption when fault races with write
commit 876f29460c upstream.

This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.

Currently DAX PMD read fault can race with write(2) in the following
way:

CPU1 - write(2)                 CPU2 - read fault
                                dax_iomap_pmd_fault()
                                  ->iomap_begin() - sees hole

dax_iomap_rw()
  iomap_apply()
    ->iomap_begin - allocates blocks
    dax_iomap_actor()
      invalidate_inode_pages2_range()
        - there's nothing to invalidate

                                  grab_mapping_entry()
				  - we add huge zero page to the radix tree
				    and map it to page tables

The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.

Fix the problem by locking exception entry before mapping blocks for the
fault.  That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).

Fixes: 9f141d6ef6 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
5a3651b4a9 ext4: return to starting transaction in ext4_dax_huge_fault()
commit fb26a1cbed upstream.

DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes.  To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().

Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
85f353dbd0 mm: fix data corruption due to stale mmap reads
commit cd656375f9 upstream.

Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX.  That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:

 - open an mmap over a 2MiB hole

 - read from a 2MiB hole, faulting in a 2MiB zero page

 - write to the hole with write(3p). The write succeeds but we
   incorrectly leave the 2MiB zero page mapping intact.

 - via the mmap, read the data that was just written. Since the zero
   page mapping is still intact we read back zeroes instead of the new
   data.

Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.

Fixes: c6dcf52c23
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
06305f51ef dax: prevent invalidation of mapped DAX entries
commit 4636e70bb0 upstream.

Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.

This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).

The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.

This patch (of 4):

dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked.  This is done via:

  invalidate_mapping_pages()
    invalidate_exceptional_entry()
      dax_invalidate_mapping_entry()

However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped.  This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().

For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry.  This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.

We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.

Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().

Fixes: c6dcf52c23 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
55dbfd7dcd device-dax: fix sysfs attribute deadlock
commit 565851c972 upstream.

Usage of device_lock() for dax_region attributes is unnecessary and
deadlock prone. It's unnecessary because the order of registration /
un-registration guarantees that drvdata is always valid. It's deadlock
prone because it sets up this situation:

 ndctl           D    0  2170   2082 0x00000000
 Call Trace:
  __schedule+0x31f/0x980
  schedule+0x3d/0x90
  schedule_preempt_disabled+0x15/0x20
  __mutex_lock+0x402/0x980
  ? __mutex_lock+0x158/0x980
  ? align_show+0x2b/0x80 [dax]
  ? kernfs_seq_start+0x2f/0x90
  mutex_lock_nested+0x1b/0x20
  align_show+0x2b/0x80 [dax]
  dev_attr_show+0x20/0x50

 ndctl           D    0  2186   2079 0x00000000
 Call Trace:
  __schedule+0x31f/0x980
  schedule+0x3d/0x90
  __kernfs_remove+0x1f6/0x340
  ? kernfs_remove_by_name_ns+0x45/0xa0
  ? remove_wait_queue+0x70/0x70
  kernfs_remove_by_name_ns+0x45/0xa0
  remove_files.isra.1+0x35/0x70
  sysfs_remove_group+0x44/0x90
  sysfs_remove_groups+0x2e/0x50
  dax_region_unregister+0x25/0x40 [dax]
  devm_action_release+0xf/0x20
  release_nodes+0x16d/0x2b0
  devres_release_all+0x3c/0x60
  device_release_driver_internal+0x17d/0x220
  device_release_driver+0x12/0x20
  unbind_store+0x112/0x160

ndctl/2170 is trying to acquire the device_lock() to read an attribute,
and ndctl/2186 is holding the device_lock() while trying to drain all
active attribute readers.

Thanks to Yi Zhang for the reproduction script.

Fixes: d7fe1a67f6 ("dax: add region 'id', 'size', and 'align' attributes")
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
8931477790 device-dax: fix cdev leak
commit ed01e50acd upstream.

If device_add() fails, cleanup the cdev. Otherwise, we leak a kobj_map()
with a stale device number.

As Jason points out, there is a small possibility that userspace has
opened and mapped the device in the time between cdev_add() and the
device_add() failure. We need a new kill_dax_dev() helper to invalidate
any established mappings.

Fixes: ba09c01d2f ("dax: convert to the cdev api")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
9bcb9cc96d md/raid1: avoid reusing a resync bio after error handling.
commit 0c9d5b127f upstream.

fix_sync_read_error() modifies a bio on a newly faulty
device by setting bi_end_io to end_sync_write.
This ensure that put_buf() will still call rdev_dec_pending()
as required, but makes sure that subsequent code in
fix_sync_read_error() doesn't try to read from the device.

Unfortunately this interacts badly with sync_request_write()
which assumes that any bio with bi_end_io set to non-NULL
other than end_sync_read is safe to write to.

As the device is now faulty it doesn't make sense to write.
As the bio was recently used for a read, it is "dirty"
and not suitable for immediate submission.
In particular, ->bi_next might be non-NULL, which will cause
generic_make_request() to complain.

Break this interaction by refusing to write to devices
which are marked as Faulty.

Reported-and-tested-by: Michael Wang <yun.wang@profitbricks.com>
Fixes: 2e52d449bc ("md/raid1: add failfast handling for reads.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:48 +02:00
7d70808547 padata: free correct variable
commit 07a77929ba upstream.

The author meant to free the variable that was just allocated, instead
of the one that failed to be allocated, but made a simple typo. This
patch rectifies that.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
f33f17b28d ovl: do not set overlay.opaque on non-dir create
commit 4a99f3c83d upstream.

The optimization for opaque dir create was wrongly being applied
also to non-dir create.

Fixes: 97c684cc91 ("ovl: create directories inside merged parent opaque")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
6a1baa3a16 CIFS: add misssing SFM mapping for doublequote
commit 85435d7a15 upstream.

SFM is mapping doublequote to 0xF020

Without this patch creating files with doublequote fails to Windows/Mac

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
c4f35cc8dc cifs: fix CIFS_IOC_GET_MNT_INFO oops
commit d8a6e505d6 upstream.

An open directory may have a NULL private_data pointer prior to readdir.

Fixes: 0de1f4c6f6 ("Add way to query server fs info for smb3")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
33fa011644 CIFS: fix oplock break deadlocks
commit 3998e6b87d upstream.

When the final cifsFileInfo_put() is called from cifsiod and an oplock
break work is queued, lockdep complains loudly:

 =============================================
 [ INFO: possible recursive locking detected ]
 4.11.0+ #21 Not tainted
 ---------------------------------------------
 kworker/0:2/78 is trying to acquire lock:
  ("cifsiod"){++++.+}, at: flush_work+0x215/0x350

 but task is already holding lock:
  ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock("cifsiod");
   lock("cifsiod");

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by kworker/0:2/78:
  #0:  ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
  #1:  ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0

 stack backtrace:
 CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
 Workqueue: cifsiod cifs_writev_complete
 Call Trace:
  dump_stack+0x85/0xc2
  __lock_acquire+0x17dd/0x2260
  ? match_held_lock+0x20/0x2b0
  ? trace_hardirqs_off_caller+0x86/0x130
  ? mark_lock+0xa6/0x920
  lock_acquire+0xcc/0x260
  ? lock_acquire+0xcc/0x260
  ? flush_work+0x215/0x350
  flush_work+0x236/0x350
  ? flush_work+0x215/0x350
  ? destroy_worker+0x170/0x170
  __cancel_work_timer+0x17d/0x210
  ? ___preempt_schedule+0x16/0x18
  cancel_work_sync+0x10/0x20
  cifsFileInfo_put+0x338/0x7f0
  cifs_writedata_release+0x2a/0x40
  ? cifs_writedata_release+0x2a/0x40
  cifs_writev_complete+0x29d/0x850
  ? preempt_count_sub+0x18/0xd0
  process_one_work+0x304/0x8e0
  worker_thread+0x9b/0x6a0
  kthread+0x1b2/0x200
  ? process_one_work+0x8e0/0x8e0
  ? kthread_create_on_node+0x40/0x40
  ret_from_fork+0x31/0x40

This is a real warning.  Since the oplock is queued on the same
workqueue this can deadlock if there is only one worker thread active
for the workqueue (which will be the case during memory pressure when
the rescuer thread is handling it).

Furthermore, there is at least one other kind of hang possible due to
the oplock break handling if there is only worker.  (This can be
reproduced without introducing memory pressure by having passing 1 for
the max_active parameter of cifsiod.) cifs_oplock_break() can wait
indefintely in the filemap_fdatawait() while the cifs_writev_complete()
work is blocked:

 sysrq: SysRq : Show Blocked State
   task                        PC stack   pid father
 kworker/0:1     D    0    16      2 0x00000000
 Workqueue: cifsiod cifs_oplock_break
 Call Trace:
  __schedule+0x562/0xf40
  ? mark_held_locks+0x4a/0xb0
  schedule+0x57/0xe0
  io_schedule+0x21/0x50
  wait_on_page_bit+0x143/0x190
  ? add_to_page_cache_lru+0x150/0x150
  __filemap_fdatawait_range+0x134/0x190
  ? do_writepages+0x51/0x70
  filemap_fdatawait_range+0x14/0x30
  filemap_fdatawait+0x3b/0x40
  cifs_oplock_break+0x651/0x710
  ? preempt_count_sub+0x18/0xd0
  process_one_work+0x304/0x8e0
  worker_thread+0x9b/0x6a0
  kthread+0x1b2/0x200
  ? process_one_work+0x8e0/0x8e0
  ? kthread_create_on_node+0x40/0x40
  ret_from_fork+0x31/0x40
 dd              D    0   683    171 0x00000000
 Call Trace:
  __schedule+0x562/0xf40
  ? mark_held_locks+0x29/0xb0
  schedule+0x57/0xe0
  io_schedule+0x21/0x50
  wait_on_page_bit+0x143/0x190
  ? add_to_page_cache_lru+0x150/0x150
  __filemap_fdatawait_range+0x134/0x190
  ? do_writepages+0x51/0x70
  filemap_fdatawait_range+0x14/0x30
  filemap_fdatawait+0x3b/0x40
  filemap_write_and_wait+0x4e/0x70
  cifs_flush+0x6a/0xb0
  filp_close+0x52/0xa0
  __close_fd+0xdc/0x150
  SyS_close+0x33/0x60
  entry_SYSCALL_64_fastpath+0x1f/0xbe

 Showing all locks held in the system:
 2 locks held by kworker/0:1/16:
  #0:  ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
  #1:  ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0

 Showing busy workqueues and worker pools:
 workqueue cifsiod: flags=0xc
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
     in-flight: 16:cifs_oplock_break
     delayed: cifs_writev_complete, cifs_echo_request
 pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3

Fix these problems by creating a a new workqueue (with a rescuer) for
the oplock break work.

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
076c404c12 cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
commit 6026685de3 upstream.

As with 618763958b, an open directory may have a NULL private_data
pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
before dereference.

Fixes: 834170c859 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
63d86cd970 cifs: fix leak in FSCTL_ENUM_SNAPS response handling
commit 0e5c795592 upstream.

The server may respond with success, and an output buffer less than
sizeof(struct smb_snapshot_array) in length. Do not leak the output
buffer in this case.

Fixes: 834170c859 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
160239dfbf CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
commit b704e70b7c upstream.

- trailing space maps to 0xF028
- trailing period maps to 0xF029

This fix corrects the mapping of file names which have a trailing character
that would otherwise be illegal (period or space) but is allowed by POSIX.

Signed-off-by: Bjoern Jacke <bjacke@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
6f95f1925a SMB3: Work around mount failure when using SMB3 dialect to Macs
commit 7db0a6efdc upstream.

Macs send the maximum buffer size in response on ioctl to validate
negotiate security information, which causes us to fail the mount
as the response buffer is larger than the expected response.

Changed ioctl response processing to allow for padding of validate
negotiate ioctl response and limit the maximum response size to
maximum buffer size.

Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
49f70d15b5 Set unicode flag on cifs echo request to avoid Mac error
commit 26c9cb668c upstream.

Mac requires the unicode flag to be set for cifs, even for the smb
echo request (which doesn't have strings).

Without this Mac rejects the periodic echo requests (when mounting
with cifs) that we use to check if server is down

Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:47 +02:00
fdf150ee55 Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE
commit 7d0c234fd2 upstream.

commit 620d8745b3 ("Introduce cifs_copy_file_range()") changes the
behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of
successful writes, it now returns the number of bytes written. This
return value is treated as an error by the xfstest cifs/001. Depending
on the errno set at that time, this may or may not result in the test
failing.

The patch fixes this by setting the return value to 0 in case of
successful writes.

Fixes: commit 620d8745b3 ("Introduce cifs_copy_file_range()")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
ecf9bb0fd7 Fix match_prepath()
commit cd8c42968e upstream.

Incorrect return value for shares not using the prefix path means that
we will never match superblocks for these shares.

Fixes: commit c1d8b24d18 ("Compare prepaths when comparing superblocks")
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
d322fd67ca mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
commit 62be1511b1 upstream.

Patch series "more robust PF_MEMALLOC handling"

This series aims to unify the setting and clearing of PF_MEMALLOC, which
prevents recursive reclaim.  There are some places that clear the flag
unconditionally from current->flags, which may result in clearing a
pre-existing flag.  This already resulted in a bug report that Patch 1
fixes (without the new helpers, to make backporting easier).  Patch 2
introduces the new helpers, modelled after existing memalloc_noio_* and
memalloc_nofs_* helpers, and converts mm core to use them.  Patches 3
and 4 convert non-mm code.

This patch (of 4):

__alloc_pages_direct_compact() sets PF_MEMALLOC to prevent deadlock
during page migration by lock_page() (see the comment in
__unmap_and_move()).  Then it unconditionally clears the flag, which can
clear a pre-existing PF_MEMALLOC flag and result in recursive reclaim.
This was not a problem until commit a8161d1ed6 ("mm, page_alloc:
restructure direct compaction handling in slowpath"), because direct
compation was called only after direct reclaim, which was skipped when
PF_MEMALLOC flag was set.

Even now it's only a theoretical issue, as the new callsite of
__alloc_pages_direct_compact() is reached only for costly orders and
when gfp_pfmemalloc_allowed() is true, which means either
__GFP_NOMEMALLOC is in gfp_flags or in_interrupt() is true.  There is no
such known context, but let's play it safe and make
__alloc_pages_direct_compact() robust for cases where PF_MEMALLOC is
already set.

Fixes: a8161d1ed6 ("mm, page_alloc: restructure direct compaction handling in slowpath")
Link: http://lkml.kernel.org/r/20170405074700.29871-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
41a68ab851 mm: vmscan: fix IO/refault regression in cache workingset transition
commit 2a2e48854d upstream.

Since commit 59dc76b0d4 ("mm: vmscan: reduce size of inactive file
list") we noticed bigger IO spikes during changes in cache access
patterns.

The patch in question shrunk the inactive list size to leave more room
for the current workingset in the presence of streaming IO.  However,
workingset transitions that previously happened on the inactive list are
now pushed out of memory and incur more refaults to complete.

This patch disables active list protection when refaults are being
observed.  This accelerates workingset transitions, and allows more of
the new set to establish itself from memory, without eating into the
ability to protect the established workingset during stable periods.

The workloads that were measurably affected for us were hit pretty bad
by it, with refault/majfault rates doubling and tripling during cache
transitions, and the machines sustaining half-hour periods of 100% IO
utilization, where they'd previously have sub-minute peaks at 60-90%.

Stateful services that handle user data tend to be more conservative
with kernel upgrades.  As a result we hit most page cache issues with
some delay, as was the case here.

The severity seemed to warrant a stable tag.

Fixes: 59dc76b0d4 ("mm: vmscan: reduce size of inactive file list")
Link: http://lkml.kernel.org/r/20170404220052.27593-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
f7bccf3f16 fs/block_dev: always invalidate cleancache in invalidate_bdev()
commit a5f6a6a9c7 upstream.

invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0
which doen't make any sense.

Make sure that invalidate_bdev() always calls cleancache_invalidate_inode()
regardless of mapping->nrpages value.

Fixes: c515e1fd36 ("mm/fs: add hooks to support cleancache")
Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
601462864d fs: fix data invalidation in the cleancache during direct IO
commit 55635ba76e upstream.

Patch series "Properly invalidate data in the cleancache", v2.

We've noticed that after direct IO write, buffered read sometimes gets
stale data which is coming from the cleancache.  The reason for this is
that some direct write hooks call call invalidate_inode_pages2[_range]()
conditionally iff mapping->nrpages is not zero, so we may not invalidate
data in the cleancache.

Another odd thing is that we check only for ->nrpages and don't check
for ->nrexceptional, but invalidate_inode_pages2[_range] also
invalidates exceptional entries as well.  So we invalidate exceptional
entries only if ->nrpages != 0? This doesn't feel right.

 - Patch 1 fixes direct IO writes by removing ->nrpages check.
 - Patch 2 fixes similar case in invalidate_bdev().
     Note: I only fixed conditional cleancache_invalidate_inode() here.
       Do we also need to add ->nrexceptional check in into invalidate_bdev()?

 - Patches 3-4: some optimizations.

This patch (of 4):

Some direct IO write fs hooks call invalidate_inode_pages2[_range]()
conditionally iff mapping->nrpages is not zero.  This can't be right,
because invalidate_inode_pages2[_range]() also invalidate data in the
cleancache via cleancache_invalidate_inode() call.  So if page cache is
empty but there is some data in the cleancache, buffered read after
direct IO write would get stale data from the cleancache.

Also it doesn't feel right to check only for ->nrpages because
invalidate_inode_pages2[_range] invalidates exceptional entries as well.

Fix this by calling invalidate_inode_pages2[_range]() regardless of
nrpages state.

Note: nfs,cifs,9p doesn't need similar fix because the never call
cleancache_get_page() (nor directly, nor via mpage_readpage[s]()), so
they are not affected by this bug.

Fixes: c515e1fd36 ("mm/fs: add hooks to support cleancache")
Link: http://lkml.kernel.org/r/20170424164135.22350-2-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
a4c34c20fb ceph: fix memory leak in __ceph_setxattr()
commit eeca958dce upstream.

The ceph_inode_xattr needs to be released when removing an xattr.  Easily
reproducible running the 'generic/020' test from xfstests or simply by
doing:

  attr -s attr0 -V 0 /mnt/test && attr -r attr0 /mnt/test

While there, also fix the error path.

Here's the kmemleak splat:

unreferenced object 0xffff88001f86fbc0 (size 64):
  comm "attr", pid 244, jiffies 4294904246 (age 98.464s)
  hex dump (first 32 bytes):
    40 fa 86 1f 00 88 ff ff 80 32 38 1f 00 88 ff ff  @........28.....
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
  backtrace:
    [<ffffffff81560199>] kmemleak_alloc+0x49/0xa0
    [<ffffffff810f3e5b>] kmem_cache_alloc+0x9b/0xf0
    [<ffffffff812b157e>] __ceph_setxattr+0x17e/0x820
    [<ffffffff812b1c57>] ceph_set_xattr_handler+0x37/0x40
    [<ffffffff8111fb4b>] __vfs_removexattr+0x4b/0x60
    [<ffffffff8111fd37>] vfs_removexattr+0x77/0xd0
    [<ffffffff8111fdd1>] removexattr+0x41/0x60
    [<ffffffff8111fe65>] path_removexattr+0x75/0xa0
    [<ffffffff81120aeb>] SyS_lremovexattr+0xb/0x10
    [<ffffffff81564b20>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
2cfc744069 fs/xattr.c: zero out memory copied to userspace in getxattr
commit 81be3dee96 upstream.

getxattr uses vmalloc to allocate memory if kzalloc fails.  This is
filled by vfs_getxattr and then copied to the userspace.  vmalloc,
however, doesn't zero out the memory so if the specific implementation
of the xattr handler is sloppy we can theoretically expose a kernel
memory.  There is no real sign this is really the case but let's make
sure this will not happen and use vzalloc instead.

Fixes: 779302e678 ("fs/xattr.c:getxattr(): improve handling of allocation failures")
Link: http://lkml.kernel.org/r/20170306103327.2766-1-mhocko@kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
8bd8322308 orangefs: do not check possibly stale size on truncate
commit 53950ef541 upstream.

Let the server figure this out because our size might be out of date or
not present.

The bug was that

	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
	echo "Test" > /mnt/foo
	xfs_io -f -t -c "pread -v 0 100" /mnt/foo

fails because the second truncate did not happen if nothing had
requested the size after the write in echo.  Thus i_size was zero (not
present) and the orangefs_setattr though i_size was zero and there was
nothing to do.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:46 +02:00
66f1885bea orangefs: do not set getattr_time on orangefs_lookup
commit 17930b252c upstream.

Since orangefs_lookup calls orangefs_iget which calls
orangefs_inode_getattr, getattr_time will get set.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
a6d3c33255 orangefs: clean up oversize xattr validation
commit e675c5ec51 upstream.

Also don't check flags as this has been validated by the VFS already.

Fix an off-by-one error in the max size checking.

Stop logging just because userspace wants to write attributes which do
not fit.

This and the previous commit fix xfstests generic/020.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
7adb15036a orangefs: fix bounds check for listxattr
commit a956af337b upstream.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
1567c17063 ext4: evict inline data when writing to memory map
commit 7b4cc9787f upstream.

Currently the case of writing via mmap to a file with inline data is not
handled.  This is maybe a rare case since it requires a writable memory
map of a very small file, but it is trivial to trigger with on
inline_data filesystem, and it causes the
'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in
ext4_writepages() to be hit:

    mkfs.ext4 -O inline_data /dev/vdb
    mount /dev/vdb /mnt
    xfs_io -f /mnt/file \
	-c 'pwrite 0 1' \
	-c 'mmap -w 0 1m' \
	-c 'mwrite 0 1' \
	-c 'fsync'

	kernel BUG at fs/ext4/inode.c:2723!
	invalid opcode: 0000 [#1] SMP
	CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
	task: ffff88003d3a8040 task.stack: ffffc90000300000
	RIP: 0010:ext4_writepages+0xc89/0xf8a
	RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283
	RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc
	RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246
	RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5
	R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698
	R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff
	FS:  00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0
	Call Trace:
	 ? _raw_spin_unlock+0x27/0x2a
	 ? kvm_clock_read+0x1e/0x20
	 do_writepages+0x23/0x2c
	 ? do_writepages+0x23/0x2c
	 __filemap_fdatawrite_range+0x80/0x87
	 filemap_write_and_wait_range+0x67/0x8c
	 ext4_sync_file+0x20e/0x472
	 vfs_fsync_range+0x8e/0x9f
	 ? syscall_trace_enter+0x25b/0x2d0
	 vfs_fsync+0x1c/0x1e
	 do_fsync+0x31/0x4a
	 SyS_fsync+0x10/0x14
	 do_syscall_64+0x69/0x131
	 entry_SYSCALL64_slow_path+0x25/0x25

We could try to be smart and keep the inline data in this case, or at
least support delayed allocation when allocating the block, but these
solutions would be more complicated and don't seem worthwhile given how
rare this case seems to be.  So just fix the bug by calling
ext4_convert_inline_data() when we're asked to make a page writable, so
that any inline data gets evicted, with the block allocated immediately.

Reported-by: Nick Alcock <nick.alcock@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
7621cb7966 jbd2: fix dbench4 performance regression for 'nobarrier' mounts
commit 5052b069ac upstream.

Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
filesystem is mounted with nobarrier mount option, journal superblock
writes ended up being async writes after this patch and that caused
heavy performance regression for dbench4 benchmark with high number of
processes. In my test setup with HP RAID array with non-volatile write
cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.

Fix the problem by making sure journal superblock writes are always
treated as synchronous since they generally block progress of the
journalling machinery and thus the whole filesystem.

Fixes: b685d3d65a
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
7a17cbb4b8 perf annotate s390: Implement jump types for perf annotate
commit d9f8dfa9ba upstream.

Implement simple detection for all kind of jumps and branches.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Link: http://lkml.kernel.org/r/1491465112-45819-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
b09eace76e perf annotate s390: Fix perf annotate error -95 (4.10 regression)
commit e77852b32d upstream.

since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b5184 ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Fixes: 786c1b5184 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
08b0b5bd54 perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()
commit c3a0bbc7ad upstream.

Address filtering with kernel symbols incorrectly resulted in the error
"Cannot determine size of symbol" because the no_size logic was the wrong
way around.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
2d58452241 IB/hfi1: Prevent kernel QP post send hard lockups
commit b6eac931b9 upstream.

The driver progress routines can call cond_resched() when
a timeslice is exhausted and irqs are enabled.

If the ULP had been holding a spin lock without disabling irqs and
the post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock
on that same lock.

Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread.   If the routine is not
running in a thread, avoid calling cond_resched().

Fixes: Commit 831464ce4b ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:45 +02:00
7a48c89a3c IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level
commit fb7a91746a upstream.

A warning message during SRIOV multicast cleanup should have actually been
a debug level message. The condition generating the warning does no harm
and can fill the message log.

In some cases, during testing, some tests were so intense as to swamp the
message log with these warning messages, causing a stall in the console
message log output task. This stall caused an NMI to be sent to all CPUs
(so that they all dumped their stacks into the message log).
Aside from the message flood causing an NMI, the tests all passed.

Once the message flood which caused the NMI is removed (by reducing the
warning message to debug level), the NMI no longer occurs.

Sample message log (console log) output illustrating the flood and
resultant NMI (snippets with comments and modified with ... instead
of hex digits, to satisfy checkpatch.pl):

 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 *** About 4000 almost identical lines in less than one second ***
 <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
 INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
 *** { 17} above indicates that CPU 17 was the one that stalled ***
 sending NMI to all CPUs:
 ...
 NMI backtrace for cpu 17
 CPU: 17 PID: 45909 Comm: kworker/17:2
 Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
 Workqueue: events fb_flashcursor
 task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
 RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
 RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
 RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
 RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
 RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
 R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
 R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
 FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
 CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
 DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
 DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
 Stack:
 ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
 ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
 ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
 Call Trace:
[<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
[<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
[<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
[<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
[<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
[<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
[<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
[<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
[<ffffffff81355c30>] ? bit_clear+0x120/0x120
[<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[<ffffffff810a5aef>] kthread+0xcf/0xe0
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81645858>] ret_from_fork+0x58/0x90
[<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6

As indicated in the stack trace above, the console output task got swamped.

Fixes: b9c5d6a643 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
295a7aab35 IB/mlx4: Fix ib device initialization error flow
commit 99e68909d5 upstream.

In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.

However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
called to free the allocated EQs.

Fixes: e605b743f3 ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
f98af463f8 IB/IPoIB: ibX: failed to create mcg debug file
commit 771a525840 upstream.

When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.

Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.

Fixes: 1732b0ef3b ([IPoIB] add path record information in debugfs)
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
88ded1f18b IB/core: For multicast functions, verify that LIDs are multicast LIDs
commit 8561eae60f upstream.

The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5).  Currently the MLID value is not
validated.

Add check to verify that the MLID value is in the correct address
range.

Fixes: 0c33aeedb2 ("[IB] Add checks to multicast attach and detach")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
82f0feccee IB/core: Fix kernel crash during fail to initialize device
commit 4be3a4fa51 upstream.

This patch fixes the kernel crash that occurs during ib_dealloc_device()
called due to provider driver fails with an error after
ib_alloc_device() and before it can register using ib_register_device().

This crashed seen in tha lab as below which can occur with any IB device
which fails to perform its device initialization before invoking
ib_register_device().

This patch avoids touching cache and port immutable structures if device
is not yet initialized.
It also releases related memory when cache and port immutable data
structure initialization fails during register_device() state.

[81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
[81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
[81416.576222] PGD 78da66067
[81416.576223] PUD 7f2d7c067
[81416.579484] PMD 0
[81416.582720]
[81416.587242] Oops: 0000 [#1] SMP
[81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
[81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
[81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
[81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
[81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
[81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
[81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
[81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
[81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
[81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[81416.820950] Call Trace:
[81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
[81416.829858]  device_release+0x32/0xa0
[81416.834370]  kobject_cleanup+0x63/0x170
[81416.839058]  kobject_put+0x25/0x50
[81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
[81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
[81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
[81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
[81416.866587]  ? 0xffffffffa09e9000
[81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
[81416.876094]  do_one_initcall+0x51/0x1b0
[81416.880861]  ? __vunmap+0x85/0xd0
[81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
[81416.890768]  ? vfree+0x2e/0x70
[81416.894762]  do_init_module+0x60/0x1fa
[81416.899441]  load_module+0x15f6/0x1af0
[81416.904114]  ? __symbol_put+0x60/0x60
[81416.908709]  ? ima_post_read_file+0x3d/0x80
[81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
[81416.920006]  SYSC_finit_module+0xa6/0xf0
[81416.924888]  SyS_finit_module+0xe/0x10
[81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[81416.935089] RIP: 0033:0x7fd879494949
[81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
[81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
[81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
[81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
[81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
[81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
[81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
[81417.016045] CR2: 0000000000000000

Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
Fixes: 7738613e7c ("IB/core: Add per port immutable struct to ib_device")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
5c7f1dfa7b IB/core: Fix sysfs registration error flow
commit b312be3d87 upstream.

The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.

As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).

However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.

The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.

The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes
ib_dealloc_device.

Fixes: 55aeed0654 ("IB/core: Make ib_alloc_device init the kobject")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
09944c2660 iov_iter: don't revert iov buffer if csum error
commit a6a5993243 upstream.

The patch 3278682123 (make skb_copy_datagram_msg() et.al. preserve
->msg_iter on error) will revert the iov buffer if copy to iter
failed, but it didn't copy any datagram if the skb_checksum_complete
error, so no need to revert any data at this place.

v2: Sabrina notice that return -EFAULT when checksum error is not correct
    here, it would confuse the caller about the return value, so fix it.

Fixes: 3278682123 ("make skb_copy_datagram_msg() et.al. preserve->msg_iter on error")
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
1e0b0a9ef0 vfio/type1: Remove locked page accounting workqueue
commit 0cfef2b741 upstream.

If the mmap_sem is contented then the vfio type1 IOMMU backend will
defer locked page accounting updates to a workqueue task.  This has a
few problems and depending on which side the user tries to play, they
might be over-penalized for unmaps that haven't yet been accounted or
race the workqueue to enter more mappings than they're allowed.  The
original intent of this workqueue mechanism seems to be focused on
reducing latency through the ioctl, but we cannot do so at the cost
of correctness.  Remove this workqueue mechanism and update the
callers to allow for failure.  We can also now recheck the limit under
write lock to make sure we don't exceed it.

vfio_pin_pages_remote() also now necessarily includes an unwind path
which we can jump to directly if the consecutive page pinning finds
that we're exceeding the user's memory limits.  This avoids the
current lazy approach which does accounting and mapping up to the
fault, only to return an error on the next iteration to unwind the
entire vfio_dma.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
34224e0e1c dm thin: fix a memory leak when passing discard bio down
commit 948f581a53 upstream.

dm-thin does not free the discard_parent bio after all chained sub
bios finished. The following kmemleak report could be observed after
pool with discard_passdown option processes discard bios in
linux v4.11-rc7. To fix this, we drop the discard_parent bio reference
when its endio (passdown_endio) called.

unreferenced object 0xffff8803d6b29700 (size 256):
  comm "kworker/u8:0", pid 30349, jiffies 4379504020 (age 143002.776s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    01 00 00 00 00 00 00 f0 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81a5efd9>] kmemleak_alloc+0x49/0xa0
    [<ffffffff8114ec34>] kmem_cache_alloc+0xb4/0x100
    [<ffffffff8110eec0>] mempool_alloc_slab+0x10/0x20
    [<ffffffff8110efa5>] mempool_alloc+0x55/0x150
    [<ffffffff81374939>] bio_alloc_bioset+0xb9/0x260
    [<ffffffffa018fd20>] process_prepared_discard_passdown_pt1+0x40/0x1c0 [dm_thin_pool]
    [<ffffffffa018b409>] break_up_discard_bio+0x1a9/0x200 [dm_thin_pool]
    [<ffffffffa018b484>] process_discard_cell_passdown+0x24/0x40 [dm_thin_pool]
    [<ffffffffa018b24d>] process_discard_bio+0xdd/0xf0 [dm_thin_pool]
    [<ffffffffa018ecf6>] do_worker+0xa76/0xd50 [dm_thin_pool]
    [<ffffffff81086239>] process_one_work+0x139/0x370
    [<ffffffff810867b1>] worker_thread+0x61/0x450
    [<ffffffff8108b316>] kthread+0xd6/0xf0
    [<ffffffff81a6cd1f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:44 +02:00
899750e7d9 dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
commit 23a6012489 upstream.

Otherwise the request-based DM blk-mq request_queue will be put into
service without being properly exported via sysfs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
ee68aa113f dm era: save spacemap metadata root after the pre-commit
commit 117aceb030 upstream.

When committing era metadata to disk, it doesn't always save the latest
spacemap metadata root in superblock. Due to this, metadata is getting
corrupted sometimes when reopening the device. The correct order of update
should be, pre-commit (shadows spacemap root), save the spacemap root
(newly shadowed block) to in-core superblock and then the final commit.

Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
f38faf569e dm crypt: rewrite (wipe) key in crypto layer using random data
commit c82feeec9a upstream.

The message "key wipe" used to wipe real key stored in crypto layer by
rewriting it with zeroes.  Since commit 28856a9 ("crypto: xts -
consolidate sanity check for keys") this no longer works in FIPS mode
for XTS.

While running in FIPS mode the crypto key part has to differ from the
tweak key.

Fixes: 28856a9 ("crypto: xts - consolidate sanity check for keys")
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
6ab60d2b26 crypto: ccp - Change ISR handler method for a v5 CCP
commit 6263b51eb3 upstream.

The CCP has the ability to perform several operations simultaneously,
but only one interrupt.  When implemented as a PCI device and using
MSI-X/MSI interrupts, use a tasklet model to service interrupts. By
disabling and enabling interrupts from the CCP, coupled with the
queuing that tasklets provide, we can ensure that all events
(occurring on the device) are recognized and serviced.

This change fixes a problem wherein 2 or more busy queues can cause
notification bits to change state while a (CCP) interrupt is being
serviced, but after the queue state has been evaluated. This results
in the event being 'lost' and the queue hanging, waiting to be
serviced. Since the status bits are never fully de-asserted, the
CCP never generates another interrupt (all bits zero -> one or more
bits one), and no further CCP operations will be executed.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
747cbf4753 crypto: ccp - Change ISR handler method for a v3 CCP
commit 7b537b24e7 upstream.

The CCP has the ability to perform several operations simultaneously,
but only one interrupt.  When implemented as a PCI device and using
MSI-X/MSI interrupts, use a tasklet model to service interrupts. By
disabling and enabling interrupts from the CCP, coupled with the
queuing that tasklets provide, we can ensure that all events
(occurring on the device) are recognized and serviced.

This change fixes a problem wherein 2 or more busy queues can cause
notification bits to change state while a (CCP) interrupt is being
serviced, but after the queue state has been evaluated. This results
in the event being 'lost' and the queue hanging, waiting to be
serviced. Since the status bits are never fully de-asserted, the
CCP never generates another interrupt (all bits zero -> one or more
bits one), and no further CCP operations will be executed.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
511b79fb94 crypto: ccp - Disable interrupts early on unload
commit 116591fe3e upstream.

Ensure that we disable interrupts first when shutting down
the driver.

Signed-off-by: Gary R Hook <ghook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
c133daf731 crypto: ccp - Use only the relevant interrupt bits
commit 56467cb11c upstream.

Each CCP queue can product interrupts for 4 conditions:
operation complete, queue empty, error, and queue stopped.
This driver only works with completion and error events.

Signed-off-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
b576fed7c8 crypto: algif_aead - Require setkey before accept(2)
commit 2a2a251f11 upstream.

Some cipher implementations will crash if you try to use them
without calling setkey first.  This patch adds a check so that
the accept(2) call will fail with -ENOKEY if setkey hasn't been
done on the socket yet.

Fixes: 400c40cf78 ("crypto: algif - add AEAD support")
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
72a03cf8e3 crypto: s5p-sss - Close possible race for completed requests
commit 42d5c176b7 upstream.

Driver is capable of handling only one request at a time and it stores
it in its state container struct s5p_aes_dev.  This stored request must be
protected between concurrent invocations (e.g. completing current
request and scheduling new one).  Combination of lock and "busy" field
is used for that purpose.

When "busy" field is true, the driver will not accept new request thus
it will not overwrite currently handled data.

However commit 28b62b1458 ("crypto: s5p-sss - Fix spinlock recursion
on LRW(AES)") moved some of the write to "busy" field out of a lock
protected critical section.  This might lead to potential race between
completing current request and scheduling a new one.  Effectively the
request completion might try to operate on new crypto request.

Fixes: 28b62b1458 ("crypto: s5p-sss - Fix spinlock recursion on LRW(AES)")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
d9ae27b661 block: fix blk_integrity_register to use template's interval_exp if not 0
commit 2859323e35 upstream.

When registering an integrity profile: if the template's interval_exp is
not 0 use it, otherwise use the ilog2() of logical block size of the
provided gendisk.

This fixes a long-standing DM linear target bug where it cannot pass
integrity data to the underlying device if its logical block size
conflicts with the underlying device's logical block size.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:43 +02:00
d978303275 arm64: KVM: Fix decoding of Rt/Rt2 when trapping AArch32 CP accesses
commit c667186f1c upstream.

Our 32bit CP14/15 handling inherited some of the ARMv7 code for handling
the trapped system registers, completely missing the fact that the
fields for Rt and Rt2 are now 5 bit wide, and not 4...

Let's fix it, and provide an accessor for the most common Rt case.

Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
95121cc98c KVM: arm/arm64: fix races in kvm_psci_vcpu_on
commit 6c7a5dce22 upstream.

Fix potential races in kvm_psci_vcpu_on() by taking the kvm->lock
mutex.  In general, it's a bad idea to allow more than one PSCI_CPU_ON
to process the same target VCPU at the same time.  One such problem
that may arise is that one PSCI_CPU_ON could be resetting the target
vcpu, which fills the entire sys_regs array with a temporary value
including the MPIDR register, while another looks up the VCPU based
on the MPIDR value, resulting in no target VCPU found.  Resolves both
races found with the kvm-unit-tests/arm/psci unit test.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reported-by: Levente Kurusa <lkurusa@redhat.com>
Suggested-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
d1daa545bb Revert "KVM: Support vCPU-based gfn->hva cache"
commit 4e335d9e7d upstream.

This reverts commit bbd6411513.

I've been sitting on this revert for too long and it unfortunately
missed 4.11.  It's also the reason why I haven't merged ring-based
dirty tracking for 4.12.

Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and
kvm_vcpu_write_guest_offset_cached means that the MSR value can
now be used to access SMRAM, simply by making it point to an SMRAM
physical address.  This is problematic because it lets the guest
OS overwrite memory that it shouldn't be able to touch.

Fixes: bbd6411513
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
2d271c9508 KVM: x86: fix user triggerable warning in kvm_apic_accept_events()
commit 28bf288879 upstream.

If we already entered/are about to enter SMM, don't allow switching to
INIT/SIPI_RECEIVED, otherwise the next call to kvm_apic_accept_events()
will report a warning.

Same applies if we are already in MP state INIT_RECEIVED and SMM is
requested to be turned on. Refuse to set the VCPU events in this case.

Fixes: cd7764fe9f ("KVM: x86: latch INITs while in system management mode")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
d85d4c871a perf/x86: Fix Broadwell-EP DRAM RAPL events
commit 33b88e708e upstream.

It appears as though the Broadwell-EP DRAM units share the special
units quirk with Haswell-EP/KNL.

Without this patch, you get really high results (a single DRAM using 20W
of power).

The powercap driver in drivers/powercap/intel_rapl.c already has this
change.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
9e74819678 um: Fix PTRACE_POKEUSER on x86_64
commit 9abc74a22d upstream.

This is broken since ever but sadly nobody noticed.
Recent versions of GDB set DR_CONTROL unconditionally and
UML dies due to a heap corruption. It turns out that
the PTRACE_POKEUSER was copy&pasted from i386 and assumes
that addresses are 4 bytes long.

Fix that by using 8 as address size in the calculation.

Reported-by: jie cao <cj3054@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
f25c69bd5e x86, pmem: Fix cache flushing for iovec write < 8 bytes
commit 8376efd31d upstream.

Commit 11e63f6d92 added cache flushing for unaligned writes from an
iovec, covering the first and last cache line of a >= 8 byte write and
the first cache line of a < 8 byte write.  But an unaligned write of
2-7 bytes can still cover two cache lines, so make sure we flush both
in that case.

Fixes: 11e63f6d92 ("x86, pmem: fix broken __copy_user_nocache ...")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
20fd61dbb1 selftests/x86/ldt_gdt_32: Work around a glibc sigaction() bug
commit 65973dd3fd upstream.

i386 glibc is buggy and calls the sigaction syscall incorrectly.

This is asymptomatic for normal programs, but it blows up on
programs that do evil things with segmentation.  The ldt_gdt
self-test is an example of such an evil program.

This doesn't appear to be a regression -- I think I just got lucky
with the uninitialized memory that glibc threw at the kernel when I
wrote the test.

This hackish fix manually issues sigaction(2) syscalls to undo the
damage.  Without the fix, ldt_gdt_32 segfaults; with the fix, it
passes for me.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=21269

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aaab0f9f93c9af25396f01232608c163a760a668.1490218061.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
0dc0e26fee x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup
commit d594aa0277 upstream.

The minimum size for a new stack (512 bytes) setup for arch/x86/boot components
when the bootloader does not setup/provide a stack for the early boot components
is not "enough".

The setup code executing as part of early kernel startup code, uses the stack
beyond 512 bytes and accidentally overwrites and corrupts part of the BSS
section. This is exposed mostly in the early video setup code, where
it was corrupting BSS variables like force_x, force_y, which in-turn affected
kernel parameters such as screen_info (screen_info.orig_video_cols) and
later caused an exception/panic in console_init().

Most recent boot loaders setup the stack for early boot components, so this
stack overwriting into BSS section issue has not been exposed.

Signed-off-by: Ashish Kalra <ashish@bluestacks.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170419152015.10011-1-ashishkalra@Ashishs-MacBook-Pro.local
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:42 +02:00
12001f3a45 usb: hub: Do not attempt to autosuspend disconnected devices
commit f5cccf4942 upstream.

While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
the following crash was observed.

Unable to handle kernel NULL pointer dereference at virtual address 00000218
pgd = ffffffc00165f000
[00000218] *pgd=000000000174f003, *pud=000000000174f003,
				*pmd=0000000001750003, *pte=00e8000001751713
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
ppp_generic slhc tun
CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
Hardware name: Google Kevin (DT)
Workqueue: pm pm_runtime_work
task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
PC is at autosuspend_check+0x74/0x174
LR is at autosuspend_check+0x70/0x174
...
Call trace:
[<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
[<ffffffc000810500>] usb_runtime_idle+0x20/0x40
[<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
[<ffffffc000786af0>] rpm_idle+0x1e8/0x498
[<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40

Source:

(gdb) l *0xffffffc00080dcc0
0xffffffc00080dcc0 is in autosuspend_check
(drivers/usb/core/driver.c:1778).
1773		/* We don't need to check interfaces that are
1774		 * disabled for runtime PM.  Either they are unbound
1775		 * or else their drivers don't support autosuspend
1776		 * and so they are permanently active.
1777		 */
1778		if (intf->dev.power.disable_depth)
1779			continue;
1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
1781			return -EBUSY;
1782		w |= intf->needs_remote_wakeup;

Code analysis shows that intf is set to NULL in usb_disable_device() prior
to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
lock the usb device, and neither does any of the functions in the
traceback. This means that there is no protection against a race condition
where usb_disable_device() is removing dev->actconfig->interface[] pointers
while those are being accessed from autosuspend_check().

To solve the problem, synchronize and validate device state between
autosuspend_check() and usb_disconnect().

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
90d4e2a659 usb: hub: Fix error loop seen after hub communication errors
commit 245b2eecee upstream.

While stress testing a usb controller using a bind/unbind looop, the
following error loop was observed.

usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
usb 7-1.2: hub failed to enable device, error -108
usb 7-1-port2: cannot disable (err = -22)
usb 7-1-port2: couldn't allocate usb_device
usb 7-1-port2: cannot disable (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22)

This continues forever. After adding tracebacks into the code,
the call sequence leading to this is found to be as follows.

[<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
[<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
[<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
[<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
[<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
[<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
[<ffffffc00078217c>] rpm_callback+0xa8/0xd4
[<ffffffc000786234>] rpm_suspend+0x84/0x758
[<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
[<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
[<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
[<ffffffc000803798>] hub_event+0x10ac/0x12ac
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40

kick_hub_wq() is called from hub_activate() even after failures to
communicate with the hub. This results in an endless sequence of
hub event -> hub activate -> wq trigger -> hub event -> ...

Provide two solutions for the problem.

- Only trigger the hub event queue if communication with the hub
  is successful.
- After a suspend failure, only resume already suspended interfaces
  if the communication with the device is still possible.

Each of the changes fixes the observed problem. Use both to improve
robustness.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
8e3b0f66f4 usb: Make sure usb/phy/of gets built-in
commit 3d6159640d upstream.

DWC3 driver uses of_usb_get_phy_mode() which is
implemented in drivers/usb/phy/of.c and in bare minimal
configuration it might not be pulled in kernel binary.

In case of ARC or ARM this could be easily reproduced with
"allnodefconfig" +CONFIG_USB=m +CONFIG_USB_DWC3=m.

On building all ends-up with:
---------------------->8------------------
  Kernel: arch/arm/boot/Image is ready
  Kernel: arch/arm/boot/zImage is ready
  Building modules, stage 2.
  MODPOST 5 modules
ERROR: "of_usb_get_phy_mode" [drivers/usb/dwc3/dwc3.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
---------------------->8------------------

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
6a9fc06acf usb: gadget: legacy gadgets are optional
commit 6e253d0fbc upstream.

With commit bc49d1d17d ("usb: gadget: don't couple configfs to legacy
gadgets"),it is possible to build a modular kernel with both built-in
configfs support and modular legacy gadget drivers.

But when building a kernel without modules, it is also necessary to be
able to build with configfs but without any legacy gadget driver. This
was a possible configuration when the USB_CONFIGFS was a part of the
choice options, but not anymore.

Mark the choice for legacy gadget drivers as optional restores this.

Fixes: bc49d1d17d ("usb: gadget: don't couple configfs to legacy gadgets")
Signed-off-by: Romain Izard <romain.izard.pro@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
84759debb0 usb: misc: add missing continue in switch
commit 2c930e3d0a upstream.

Add missing continue in switch.

Addresses-Coverity-ID: 1248733
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
69f8945b4f staging: comedi: jr3_pci: cope with jiffies wraparound
commit 8ec04a4918 upstream.

The timer expiry routine `jr3_pci_poll_dev()` checks for expiry by
checking whether the absolute value of `jiffies` (stored in local
variable `now`) is greater than the expected expiry time in jiffy units.
This will fail when `jiffies` wraps around.  Also, it seems to make
sense to handle the expiry one jiffy earlier than the current test.  Use
`time_after_eq()` to check for expiry.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
6e4c973ac6 staging: comedi: jr3_pci: fix possible null pointer dereference
commit 45292be0b3 upstream.

For some reason, the driver does not consider allocation of the
subdevice private data to be a fatal error when attaching the COMEDI
device.  It tests the subdevice private data pointer for validity at
certain points, but omits some crucial tests.  In particular,
`jr3_pci_auto_attach()` calls `jr3_pci_alloc_spriv()` to allocate and
initialize the subdevice private data, but the same function
subsequently dereferences the pointer to access the `next_time_min` and
`next_time_max` members without checking it first.  The other missing
test is in the timer expiry routine `jr3_pci_poll_dev()`, but it will
crash before it gets that far.

Fix the bug by returning `-ENOMEM` from `jr3_pci_auto_attach()` as soon
as one of the calls to `jr3_pci_alloc_spriv()` returns `NULL`.  The
COMEDI core will subsequently call `jr3_pci_detach()` to clean up.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
9918523578 staging: sir: fill in missing fields and fix probe
commit cf9ed9aa5b upstream.

Some fields are left blank.

Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
89cb8fccea staging: wilc1000: Fix problem with wrong vif index
commit 0e490657c7 upstream.

The vif->idx value is always 0 for two interfaces.

wl->vif_num = 0;

loop {
     ...

     vif->idx = wl->vif_num;
     ...
     wl->vif_num = i;
      ....
     i++;
     ...
}

At present, vif->idx is assigned the value of wl->vif_num
at the beginning of this block and device is initialized
based on this index value.
In the next iteration, wl->vif_num is still 0 as it is only updated
later but gets assigned to vif->idx in the beginning. This causes problems
later when we try to reference a particular interface and also while
configuring the firmware.

This patch moves the assignment to vif->idx from the beginning
of the block to after wl->vif_num is updated with latest value of i.

Fixes: commit 735bb39ca3 ("staging: wilc1000: simplify vif[i]->ndev accesses")
Signed-off-by: Aditya Shankar <aditya.shankar@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:41 +02:00
22d8767df9 staging: gdm724x: gdm_mux: fix use-after-free on module unload
commit b58f45c8fc upstream.

Make sure to deregister the USB driver before releasing the tty driver
to avoid use-after-free in the USB disconnect callback where the tty
devices are deregistered.

Fixes: 61e1210476 ("staging: gdm7240: adding LTE USB driver")
Cc: Won Kang <wkang77@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
af685eefa2 staging: vt6656: use off stack for out buffer USB transfers.
commit 12ecd24ef9 upstream.

Since 4.9 mandated USB buffers be heap allocated this causes the driver
to fail.

Since there is a wide range of buffer sizes use kmemdup to create
allocated buffer.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
5ba4fd334f staging: vt6656: use off stack for in buffer USB transfers.
commit 05c0cf88be upstream.

Since 4.9 mandated USB buffers to be heap allocated. This causes
the driver to fail.

Create buffer for USB transfers.

Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
f49d36b7b3 USB: Revert "cdc-wdm: fix "out-of-sync" due to missing notifications"
commit 1944581699 upstream.

This reverts commit 833415a3e7 ("cdc-wdm: fix "out-of-sync" due to
missing notifications")

There have been several reports of wdm_read returning unexpected EIO
errors with QMI devices using the qmi_wwan driver. The reporters
confirm that reverting prevents these errors. I have been unable to
reproduce the bug myself, and have no explanation to offer either. But
reverting is the safe choice here, given that the commit was an
attempt to work around a firmware problem.  Living with a firmware
problem is still better than adding driver bugs.

Reported-by: Kasper Holtze <kasper@holtze.dk>
Reported-by: Aleksander Morgado <aleksander@aleksander.es>
Reported-by: Daniele Palmas <dnlplm@gmail.com>
Fixes: 833415a3e7 ("cdc-wdm: fix "out-of-sync" due to missing notifications")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
2c33341c96 USB: Proper handling of Race Condition when two USB class drivers try to call init_usb_class simultaneously
commit 2f86a96be0 upstream.

There is race condition when two USB class drivers try to call
init_usb_class at the same time and leads to crash.
code path: probe->usb_register_dev->init_usb_class

To solve this, mutex locking has been added in init_usb_class() and
destroy_usb_class().

As pointed by Alan, removed "if (usb_class)" test from destroy_usb_class()
because usb_class can never be NULL there.

Signed-off-by: Ajay Kaher <ajay.kaher@samsung.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
2436236ee3 USB: serial: ftdi_sio: add device ID for Microsemi/Arrow SF2PLUS Dev Kit
commit 31c5d1922b upstream.

This development kit has an FT4232 on it with a custom USB VID/PID.
The FT4232 provides four UARTs, but only two are used. The UART 0
is used by the FlashPro5 programmer and UART 2 is connected to the
SmartFusion2 CortexM3 SoC UART port.

Note that the USB VID is registered to Actel according to Linux USB
VID database, but that was acquired by Microsemi.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
f34b103cd5 usb: host: xhci: print correct command ring address
commit 6fc091fb04 upstream.

Print correct command ring address using 'val_64'.

Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
c5379cf67f usb: xhci: bInterval quirk for TI TUSB73x0
commit 69307ccb9a upstream.

As per [1] issue #4,
"The periodic EP scheduler always tries to schedule the EPs
that have large intervals (interval equal to or greater than
128 microframes) into different microframes. So it maintains
an internal counter and increments for each large interval
EP added. When the counter is greater than 128, the scheduler
rejects the new EP. So when the hub re-enumerated 128 times,
it triggers this condition."

This results in Bandwidth error when devices with periodic
endpoints (ISO/INT) having bInterval > 7 are plugged and
unplugged several times on a TUSB73x0 XHCI host.

Workaround this issue by limiting the bInterval to 7
(i.e. interval to 6) for High-speed or faster periodic endpoints.

[1] - http://www.ti.com/lit/er/sllz076/sllz076.pdf

Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
f9a45058a3 iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
commit 197b806ae5 upstream.

While testing modification of per se_node_acl queue_depth forcing
session reinstatement via lio_target_nacl_cmdsn_depth_store() ->
core_tpg_set_initiator_node_queue_depth(), a hung task bug triggered
when changing cmdsn_depth invoked session reinstatement while an iscsi
login was already waiting for session reinstatement to complete.

This can happen when an outstanding se_cmd descriptor is taking a
long time to complete, and session reinstatement from iscsi login
or cmdsn_depth change occurs concurrently.

To address this bug, explicitly set session_fall_back_to_erl0 = 1
when forcing session reinstatement, so session reinstatement is
not attempted if an active session is already being shutdown.

This patch has been tested with two scenarios.  The first when
iscsi login is blocked waiting for iscsi session reinstatement
to complete followed by queue_depth change via configfs, and
second when queue_depth change via configfs us blocked followed
by a iscsi login driven session reinstatement.

Note this patch depends on commit d36ad77f70 to handle multiple
sessions per se_node_acl when changing cmdsn_depth, and for
pre v4.5 kernels will need to be included for stable as well.

Reported-by: Gary Guo <ghg@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Gary Guo <ghg@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:40 +02:00
14a890020f target/fileio: Fix zero-length READ and WRITE handling
commit 59ac9c0781 upstream.

This patch fixes zero-length READ and WRITE handling in target/FILEIO,
which was broken a long time back by:

Since:

  commit d81cb44726
  Author: Paolo Bonzini <pbonzini@redhat.com>
  Date:   Mon Sep 17 16:36:11 2012 -0700

      target: go through normal processing for all zero-length commands

which moved zero-length READ and WRITE completion out of target-core,
to doing submission into backend driver code.

To address this, go ahead and invoke target_complete_cmd() for any
non negative return value in fd_do_rw().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:39 +02:00
25ed85889d target: Fix compare_and_write_callback handling for non GOOD status
commit a71a5dc7f8 upstream.

Following the bugfix for handling non SAM_STAT_GOOD COMPARE_AND_WRITE
status during COMMIT phase in commit 9b2792c3da, the same bug exists
for the READ phase as well.

This would manifest first as a lost SCSI response, and eventual
hung task during fabric driver logout or re-login, as existing
shutdown logic waited for the COMPARE_AND_WRITE se_cmd->cmd_kref
to reach zero.

To address this bug, compare_and_write_callback() has been changed
to set post_ret = 1 and return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE
as necessary to signal failure status.

Reported-by: Bill Borsari <wgb@datera.io>
Cc: Bill Borsari <wgb@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Gary Guo <ghg@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:39 +02:00
1ec6f0815a xen: adjust early dom0 p2m handling to xen hypervisor behavior
commit 69861e0a52 upstream.

When booted as pv-guest the p2m list presented by the Xen is already
mapped to virtual addresses. In dom0 case the hypervisor might make use
of 2M- or 1G-pages for this mapping. Unfortunately while being properly
aligned in virtual and machine address space, those pages might not be
aligned properly in guest physical address space.

So when trying to obtain the guest physical address of such a page
pud_pfn() and pmd_pfn() must be avoided as those will mask away guest
physical address bits not being zero in this special case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:49:39 +02:00
4c71e91a04 Linux 4.11.1 2017-05-14 14:06:16 +02:00
0873ab0224 block: get rid of blk_integrity_revalidate()
commit 19b7ccf865 upstream.

Commit 25520d55cd ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi->profile)
            disk->queue->backing_dev_info->capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk->queue->backing_dev_info->capabilities &=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cd ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:03 +02:00
3ed024e274 xen: Revert commits da72ff5bfc and 72a9b18629
commit 84d582d236 upstream.

Recent discussion (http://marc.info/?l=xen-devel&m=149192184523741)
established that commit 72a9b18629 ("xen: Remove event channel
notification through Xen PCI platform device") (and thus commit
da72ff5bfc ("partially revert "xen: Remove event channel
notification through Xen PCI platform device"")) are unnecessary and,
in fact, prevent HVM guests from booting on Xen releases prior to 4.0

Therefore we revert both of those commits.

The summary of that discussion is below:

  Here is the brief summary of the current situation:

  Before the offending commit (72a9b18629):

  1) INTx does not work because of the reset_watches path.
  2) The reset_watches path is only taken if you have Xen > 4.0
  3) The Linux Kernel by default will use vector inject if the hypervisor
     support. So even INTx does not work no body running the kernel with
     Xen > 4.0 would notice. Unless he explicitly disabled this feature
     either in the kernel or in Xen (and this can only be disabled by
     modifying the code, not user-supported way to do it).

  After the offending commit (+ partial revert):

  1) INTx is no longer support for HVM (only for PV guests).
  2) Any HVM guest The kernel will not boot on Xen < 4.0 which does
     not have vector injection support. Since the only other mode
     supported is INTx which.

  So based on this summary, I think before commit (72a9b18629) we were
  in much better position from a user point of view.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: xen-devel@lists.xenproject.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:03 +02:00
5e25479ea9 xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..."
commit e058632670 upstream.

The following commit:

  commit 815dd18788
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Fri Jan 20 13:04:04 2017 -0800

      treewide: Consolidate get_dma_ops() implementations

rearranges get_dma_ops in a way that xen_dma_ops are not returned when
running on Xen anymore, dev->dma_ops is returned instead (see
arch/arm/include/asm/dma-mapping.h:get_arch_dma_ops and
include/linux/dma-mapping.h:get_dma_ops).

Fix the problem by storing dev->dma_ops in dev_archdata, and setting
dev->dma_ops to xen_dma_ops. This way, xen_dma_ops is returned naturally
by get_dma_ops. The Xen code can retrieve the original dev->dma_ops from
dev_archdata when needed. It also allows us to remove __generic_dma_ops
from common headers.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
CC: linux@armlinux.org.uk
CC: catalin.marinas@arm.com
CC: will.deacon@arm.com
CC: boris.ostrovsky@oracle.com
CC: jgross@suse.com
CC: Julien Grall <julien.grall@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
c7f765b5d6 f2fs: sanity check segment count
commit b9dd46188e upstream.

F2FS uses 4 bytes to represent block address. As a result, supported
size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.

Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
275e2edbb8 net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
[ Upstream commit 922c60e89d ]

If an error is encountered in mdio_mux_init(), the error path will call
mdiobus_free().  Since mdiobus_register() has been called prior to
mdio_mux_init(), the bus->state will not be MDIOBUS_UNREGISTERED.  This
causes a BUG_ON() in mdiobus_free().  To correct this issue, add an
error path for mdio_mux_init() which calls mdiobus_unregister() prior to
mdiobus_free().

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Fixes: 98bc865a1e ("net: mdio-mux: Add MDIO mux driver for iProc SoCs")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
ced12308e5 bpf: don't let ldimm64 leak map addresses on unprivileged
[ Upstream commit 0d0e57697f ]

The patch fixes two things at once:

1) It checks the env->allow_ptr_leaks and only prints the map address to
   the log if we have the privileges to do so, otherwise it just dumps 0
   as we would when kptr_restrict is enabled on %pK. Given the latter is
   off by default and not every distro sets it, I don't want to rely on
   this, hence the 0 by default for unprivileged.

2) Printing of ldimm64 in the verifier log is currently broken in that
   we don't print the full immediate, but only the 32 bit part of the
   first insn part for ldimm64. Thus, fix this up as well; it's okay to
   access, since we verified all ldimm64 earlier already (including just
   constants) through replace_map_fd_with_map_ptr().

Fixes: 1be7f75d16 ("bpf: enable non-root eBPF programs")
Fixes: cbd3570086 ("bpf: verifier (add ability to receive verification log)")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
7a0a483ec5 bnxt_en: allocate enough space for ->ntp_fltr_bmap
[ Upstream commit ac45bd93a5 ]

We have the number of longs, but we need to calculate the number of
bytes required.

Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
f8e3892f9f tcp: randomize timestamps on syncookies
[ Upstream commit 84b114b984 ]

Whole point of randomization was to hide server uptime, but an attacker
can simply start a syn flood and TCP generates 'old style' timestamps,
directly revealing server jiffies value.

Also, TSval sent by the server to a particular remote address vary
depending on syncookies being sent or not, potentially triggering PAWS
drops for innocent clients.

Lets implement proper randomization, including for SYNcookies.

Also we do not need to export sysctl_tcp_timestamps, since it is not
used from a module.

In v2, I added Florian feedback and contribution, adding tsoff to
tcp_get_cookie_sock().

v3 removed one unused variable in tcp_v4_connect() as Florian spotted.

Fixes: 95a22caee3 ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
3960afa2e0 ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
[ Upstream commit 242d3a49a2 ]

For each netns (except init_net), we initialize its null entry
in 3 places:

1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
   loopback registers

Unfortunately the last one still happens in a wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, thus we have to do that after we add
idev to loopback. However, this notifier has priority == 0 same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier so it is called actually after
ip6_route_dev_notifier. This is similar to commit 2f460933f5
("ipv6: initialize route null entry in addrconf_init()") which
fixes init_net.

Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.

Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
5ee83127ac ipv6: initialize route null entry in addrconf_init()
[ Upstream commit 2f460933f5 ]

Andrey reported a crash on init_net.ipv6.ip6_null_entry->rt6i_idev
since it is always NULL.

This is clearly wrong, we have code to initialize it to loopback_dev,
unfortunately the order is still not correct.

loopback_dev is registered very early during boot, we lose a chance
to re-initialize it in notifier. addrconf_init() is called after
ip6_route_init(), which means we have no chance to correct it.

Fix it by moving this initialization explicitly after
ipv6_add_dev(init_net.loopback_dev) in addrconf_init().

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
0913e57331 rtnetlink: NUL-terminate IFLA_PHYS_PORT_NAME string
[ Upstream commit 77ef033b68 ]

IFLA_PHYS_PORT_NAME is a string attribute, so terminate it with \0.
Otherwise libnl3 fails to validate netlink messages with this attribute.
"ip -detail a" assumes too that the attribute is NUL-terminated when
printing it. It often was, due to padding.

I noticed this as libvirtd failing to start on a system with sfc driver
after upgrading it to Linux 4.11, i.e. when sfc added support for
phys_port_name.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
165a007e51 ipv4, ipv6: ensure raw socket message is big enough to hold an IP header
[ Upstream commit 86f4c90a1c ]

raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied
from the userspace contains the IPv4/IPv6 header, so if too few bytes are
copied, parts of the header may remain uninitialized.

This bug has been detected with KMSAN.

For the record, the KMSAN report:

==================================================================
BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0
inter: 0
CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x143/0x1b0 lib/dump_stack.c:52
 kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
 __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
 nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577
 ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn ./include/linux/netfilter.h:102
 nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310
 nf_hook ./include/linux/netfilter.h:212
 NF_HOOK ./include/linux/netfilter.h:255
 rawv6_send_hdrinc net/ipv6/raw.c:673
 rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919
 inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
 SyS_sendto+0xbc/0xe0 net/socket.c:1664
 do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x436e03
RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000
origin: 00000000d9400053
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362
 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
 kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
 slab_alloc_node mm/slub.c:2735
 __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
 __kmalloc_reserve net/core/skbuff.c:138
 __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
 alloc_skb ./include/linux/skbuff.h:933
 alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
 sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
 sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
 rawv6_send_hdrinc net/ipv6/raw.c:638
 rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919
 inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
 SyS_sendto+0xbc/0xe0 net/socket.c:1664
 do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================

, triggered by the following syscalls:
  socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
  sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM

A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket
instead of a PF_INET6 one.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:02 +02:00
86a9a0884d tcp: do not inherit fastopen_req from parent
[ Upstream commit 8b485ce698 ]

Under fuzzer stress, it is possible that a child gets a non NULL
fastopen_req pointer from its parent at accept() time, when/if parent
morphs from listener to active session.

We need to make sure this can not happen, by clearing the field after
socket cloning.

BUG: Double free or freeing an invalid pointer
Unexpected shadow byte: 0xFB
CPU: 3 PID: 20933 Comm: syz-executor3 Not tainted 4.11.0+ #306
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x292/0x395 lib/dump_stack.c:52
 kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
 kasan_report_double_free+0x5c/0x70 mm/kasan/report.c:185
 kasan_slab_free+0x9d/0xc0 mm/kasan/kasan.c:580
 slab_free_hook mm/slub.c:1357 [inline]
 slab_free_freelist_hook mm/slub.c:1379 [inline]
 slab_free mm/slub.c:2961 [inline]
 kfree+0xe8/0x2b0 mm/slub.c:3882
 tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
 tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
 inet_child_forget+0xb8/0x600 net/ipv4/inet_connection_sock.c:898
 inet_csk_reqsk_queue_add+0x1e7/0x250
net/ipv4/inet_connection_sock.c:928
 tcp_get_cookie_sock+0x21a/0x510 net/ipv4/syncookies.c:217
 cookie_v4_check+0x1a19/0x28b0 net/ipv4/syncookies.c:384
 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1384 [inline]
 tcp_v4_do_rcv+0x731/0x940 net/ipv4/tcp_ipv4.c:1421
 tcp_v4_rcv+0x2dc0/0x31c0 net/ipv4/tcp_ipv4.c:1715
 ip_local_deliver_finish+0x4cc/0xc20 net/ipv4/ip_input.c:216
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ip_local_deliver+0x1ce/0x700 net/ipv4/ip_input.c:257
 dst_input include/net/dst.h:492 [inline]
 ip_rcv_finish+0xb1d/0x20b0 net/ipv4/ip_input.c:396
 NF_HOOK include/linux/netfilter.h:257 [inline]
 ip_rcv+0xd8c/0x19c0 net/ipv4/ip_input.c:487
 __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4210
 __netif_receive_skb+0x2a/0x1a0 net/core/dev.c:4248
 process_backlog+0xe5/0x6c0 net/core/dev.c:4868
 napi_poll net/core/dev.c:5270 [inline]
 net_rx_action+0xe70/0x18e0 net/core/dev.c:5335
 __do_softirq+0x2fb/0xb99 kernel/softirq.c:284
 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:899
 </IRQ>
 do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
 do_softirq kernel/softirq.c:176 [inline]
 __local_bh_enable_ip+0x1cf/0x1e0 kernel/softirq.c:181
 local_bh_enable include/linux/bottom_half.h:31 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:931 [inline]
 ip_finish_output2+0x9ab/0x15e0 net/ipv4/ip_output.c:230
 ip_finish_output+0xa35/0xdf0 net/ipv4/ip_output.c:316
 NF_HOOK_COND include/linux/netfilter.h:246 [inline]
 ip_output+0x1f6/0x7b0 net/ipv4/ip_output.c:404
 dst_output include/net/dst.h:486 [inline]
 ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x9a8/0x1a10 net/ipv4/ip_output.c:503
 tcp_transmit_skb+0x1ade/0x3470 net/ipv4/tcp_output.c:1057
 tcp_write_xmit+0x79e/0x55b0 net/ipv4/tcp_output.c:2265
 __tcp_push_pending_frames+0xfa/0x3a0 net/ipv4/tcp_output.c:2450
 tcp_push+0x4ee/0x780 net/ipv4/tcp.c:683
 tcp_sendmsg+0x128d/0x39b0 net/ipv4/tcp.c:1342
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x446059
RSP: 002b:00007faa6761fb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 0000000000446059
RDX: 0000000000000001 RSI: 0000000020ba3fcd RDI: 0000000000000017
RBP: 00000000006e40a0 R08: 0000000020ba4ff0 R09: 0000000000000010
R10: 0000000020000000 R11: 0000000000000282 R12: 0000000000708150
R13: 0000000000000000 R14: 00007faa676209c0 R15: 00007faa67620700
Object at ffff88003b5bbcb8, in cache kmalloc-64 size: 64
Allocated:
PID = 20909
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
 kmalloc include/linux/slab.h:490 [inline]
 kzalloc include/linux/slab.h:663 [inline]
 tcp_sendmsg_fastopen net/ipv4/tcp.c:1094 [inline]
 tcp_sendmsg+0x221a/0x39b0 net/ipv4/tcp.c:1139
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed:
PID = 20909
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
 save_stack+0x43/0xd0 mm/kasan/kasan.c:513
 set_track mm/kasan/kasan.c:525 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
 slab_free_hook mm/slub.c:1357 [inline]
 slab_free_freelist_hook mm/slub.c:1379 [inline]
 slab_free mm/slub.c:2961 [inline]
 kfree+0xe8/0x2b0 mm/slub.c:3882
 tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
 tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
 __inet_stream_connect+0x20c/0xf90 net/ipv4/af_inet.c:593
 tcp_sendmsg_fastopen net/ipv4/tcp.c:1111 [inline]
 tcp_sendmsg+0x23a8/0x39b0 net/ipv4/tcp.c:1139
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x660/0x810 net/socket.c:1696
 SyS_sendto+0x40/0x50 net/socket.c:1664
 entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 7db92362d2 ("tcp: fix potential double free issue for fastopen_req")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
0b1d889cb0 net: usb: qmi_wwan: add Telit ME910 support
[ Upstream commit 4c54dc0277 ]

This patch adds support for Telit ME910 PID 0x1100.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
23e761f37b net: ipv6: Do not duplicate DAD on link up
[ Upstream commit 6d717134a1 ]

Andrey reported a warning triggered by the rcu code:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 5911 at lib/debugobjects.c:289
debug_print_object+0x175/0x210
ODEBUG: activate active (active state 1) object type: rcu_head hint:
        (null)
Modules linked in:
CPU: 1 PID: 5911 Comm: a.out Not tainted 4.11.0-rc8+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x192/0x22d lib/dump_stack.c:52
 __warn+0x19f/0x1e0 kernel/panic.c:549
 warn_slowpath_fmt+0xe0/0x120 kernel/panic.c:564
 debug_print_object+0x175/0x210 lib/debugobjects.c:286
 debug_object_activate+0x574/0x7e0 lib/debugobjects.c:442
 debug_rcu_head_queue kernel/rcu/rcu.h:75
 __call_rcu.constprop.76+0xff/0x9c0 kernel/rcu/tree.c:3229
 call_rcu_sched+0x12/0x20 kernel/rcu/tree.c:3288
 rt6_rcu_free net/ipv6/ip6_fib.c:158
 rt6_release+0x1ea/0x290 net/ipv6/ip6_fib.c:188
 fib6_del_route net/ipv6/ip6_fib.c:1461
 fib6_del+0xa42/0xdc0 net/ipv6/ip6_fib.c:1500
 __ip6_del_rt+0x100/0x160 net/ipv6/route.c:2174
 ip6_del_rt+0x140/0x1b0 net/ipv6/route.c:2187
 __ipv6_ifa_notify+0x269/0x780 net/ipv6/addrconf.c:5520
 addrconf_ifdown+0xe60/0x1a20 net/ipv6/addrconf.c:3672
...

Andrey's reproducer program runs in a very tight loop, calling
'unshare -n' and then spawning 2 sets of 14 threads running random ioctl
calls. The relevant networking sequence:

1. New network namespace created via unshare -n
- ip6tnl0 device is created in down state

2. address added to ip6tnl0
- equivalent to ip -6 addr add dev ip6tnl0 fd00::bb/1
- DAD is started on the address and when it completes the host
  route is inserted into the FIB

3. ip6tnl0 is brought up
- the new fixup_permanent_addr function restarts DAD on the address

4. exit namespace
- teardown / cleanup sequence starts
- once in a blue moon, lo teardown appears to happen BEFORE teardown
  of ip6tunl0
  + down on 'lo' removes the host route from the FIB since the dst->dev
    for the route is loobback
  + host route added to rcu callback list
    * rcu callback has not run yet, so rt is NOT on the gc list so it has
      NOT been marked obsolete

5. in parallel to 4. worker_thread runs addrconf_dad_completed
- DAD on the address on ip6tnl0 completes
- calls ipv6_ifa_notify which inserts the host route

All of that happens very quickly. The result is that a host route that
has been deleted from the IPv6 FIB and added to the RCU list is re-inserted
into the FIB.

The exit namespace eventually gets to cleaning up ip6tnl0 which removes the
host route from the FIB again, calls the rcu function for cleanup -- and
triggers the double rcu trace.

The root cause is duplicate DAD on the address -- steps 2 and 3. Arguably,
DAD should not be started in step 2. The interface is in the down state,
so it can not really send out requests for the address which makes starting
DAD pointless.

Since the second DAD was introduced by a recent change, seems appropriate
to use it for the Fixes tag and have the fixup function only start DAD for
addresses in the PREDAD state which occurs in addrconf_ifdown if the
address is retained.

Big thanks to Andrey for isolating a reliable reproducer for this problem.
Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
a1baca92af tcp: fix wraparound issue in tcp_lp
[ Upstream commit a9f11f963a ]

Be careful when comparing tcp_time_stamp to some u32 quantity,
otherwise result can be surprising.

Fixes: 7c106d7e78 ("[TCP]: TCP Low Priority congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
f94cc32ccb bpf, arm64: fix jit branch offset related to ldimm64
[ Upstream commit ddc665a4bb ]

When the instruction right before the branch destination is
a 64 bit load immediate, we currently calculate the wrong
jump offset in the ctx->offset[] array as we only account
one instruction slot for the 64 bit load immediate although
it uses two BPF instructions. Fix it up by setting the offset
into the right slot after we incremented the index.

Before (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  54ffff82  b.cs 0x00000020
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

After (ldimm64 test 1):

  [...]
  00000020:  52800007  mov w7, #0x0 // #0
  00000024:  d2800060  mov x0, #0x3 // #3
  00000028:  d2800041  mov x1, #0x2 // #2
  0000002c:  eb01001f  cmp x0, x1
  00000030:  540000a2  b.cs 0x00000044
  00000034:  d29fffe7  mov x7, #0xffff // #65535
  00000038:  f2bfffe7  movk x7, #0xffff, lsl #16
  0000003c:  f2dfffe7  movk x7, #0xffff, lsl #32
  00000040:  f2ffffe7  movk x7, #0xffff, lsl #48
  00000044:  d29dddc7  mov x7, #0xeeee // #61166
  00000048:  f2bdddc7  movk x7, #0xeeee, lsl #16
  0000004c:  f2ddddc7  movk x7, #0xeeee, lsl #32
  00000050:  f2fdddc7  movk x7, #0xeeee, lsl #48
  [...]

Also, add a couple of test cases to make sure JITs pass
this test. Tested on Cavium ThunderX ARMv8. The added
test cases all pass after the fix.

Fixes: 8eee539dde ("arm64: bpf: fix out-of-bounds read in bpf2a64_offset()")
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
34acfbf08c bpf: enhance verifier to understand stack pointer arithmetic
[ Upstream commit 332270fdc8 ]

llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.

Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
5a5b34164f geneve: fix incorrect setting of UDP checksum flag
[ Upstream commit 5e0740c445 ]

Creating a geneve link with 'udpcsum' set results in a creation of link
for which UDP checksum will NOT be computed on outbound packets, as can
be seen below.

11: gen0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether c2:85:27:b6:b4:15 brd ff:ff:ff:ff:ff:ff promiscuity 0
    geneve id 200 remote 192.168.13.1 dstport 6081 noudpcsum

Similarly, creating a link with 'noudpcsum' set results in a creation
of link for which UDP checksum will be computed on outbound packets.

Fixes: 9b4437a5b8 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
011fbfb296 tcp: fix access to sk->sk_state in tcp_poll()
[ Upstream commit d68be71ea1 ]

avoid direct access to sk->sk_state when tcp_poll() is called on a socket
using active TCP fastopen with deferred connect. Use local variable
'state', which stores the result of sk_state_load(), like it was done in
commit 00fd38d938 ("tcp: ensure proper barriers in lockless contexts").

Fixes: 19f6d3f3c8 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
c0e5b847c7 net: macb: fix phy interrupt parsing
[ Upstream commit ae3696c167 ]

Since 83a77e9ec4, the phydev irq is explicitly set to PHY_POLL when
there is no pdata. It doesn't work on DT enabled platforms because the
phydev irq is already set by libphy before.

Fixes: 83a77e9ec4 ("net: macb: Added PCI wrapper for Platform Driver.")
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
acc7c954b0 refcount: change EXPORT_SYMBOL markings
commit d557d1b58b upstream.

Now that kref is using the refcount apis, the _GPL markings are getting
exported to places that it previously wasn't.  Now kref.h is GPLv2
licensed, so any non-GPL code using it better be talking to some
lawyers, but changing api markings isn't considered "nice", so let's fix
this up.

Cc: Philip Müller <philm@manjaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-14 14:06:01 +02:00
e2b8851f1f sparc64: fix fault handling in NGbzero.S and GENbzero.S
commit 3c7f622120 upstream.

When any of the functions contained in NGbzero.S and GENbzero.S
vector through *bzero_from_clear_user, we may end up taking a
fault when executing one of the store alternate address space
instructions. If this happens, the exception handler does not
restore the %asi register.

This commit fixes the issue by introducing a new exception
handler that ensures the %asi register is restored when
a fault is handled.

Orabug: 25577560

Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:01 +02:00
c98f47c1f6 brcmfmac: Make skb header writable before use
commit 9cc4b7cb86 upstream.

The driver was making changes to the skb_header without
ensuring it was writable (i.e. uncloned).
This patch also removes some boiler plate header size
checking/adjustment code as that is also handled by the
skb_cow_header function used to make header writable.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:00 +02:00
7f073b7e4d brcmfmac: Ensure pointer correctly set if skb data location changes
commit 455a1eb465 upstream.

The incoming skb header may be resized if header space is
insufficient, which might change the data adddress in the skb.
Ensure that a cached pointer to that data is correctly set by
moving assignment to after any possible changes.

Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:00 +02:00
b998524b6c power: supply: lp8788: prevent out of bounds array access
commit bdd9968d35 upstream.

val might become 7 in which case stime[7] (array of length 7) would be
accessed during the scnprintf call later and that will cause issues.
Obviously, string concatenation is not intended here so just a comma needs
to be added to fix the issue.

Fixes: 98a2766493 ("power_supply: Add new lp8788 charger driver")
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@gmail.com>
Acked-by: Milo Kim <milo.kim@ti.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:00 +02:00
648ada88cb drm/sti: fix GDP size to support up to UHD resolution
commit 2f410f88c0 upstream.

On stih407-410 chip family the GDP layers are able to support up to UHD
resolution (3840 x 2160).

Signed-off-by: Vincent Abriou <vincent.abriou@st.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Tested-by: Lee Jones <lee.jones@linaro.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1490280292-30466-1-git-send-email-vincent.abriou@st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:00 +02:00
40a1937317 dm ioctl: prevent stack leak in dm ioctl call
commit 4617f564c0 upstream.

When calling a dm ioctl that doesn't process any data
(IOCTL_FLAGS_NO_PARAMS), the contents of the data field in struct
dm_ioctl are left initialized.  Current code is incorrectly extending
the size of data copied back to user, causing the contents of kernel
stack to be leaked to user.  Fix by only copying contents before data
and allow the functions processing the ioctl to override.

Signed-off-by: Adrian Salido <salidoa@google.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-14 14:06:00 +02:00
657 changed files with 6341 additions and 3232 deletions

View File

@ -59,20 +59,28 @@ button driver uses the following 3 modes in order not to trigger issues.
If the userspace hasn't been prepared to ignore the unreliable "opened"
events and the unreliable initial state notification, Linux users can use
the following kernel parameters to handle the possible issues:
A. button.lid_init_state=open:
A. button.lid_init_state=method:
When this option is specified, the ACPI button driver reports the
initial lid state using the returning value of the _LID control method
and whether the "opened"/"closed" events are paired fully relies on the
firmware implementation.
This option can be used to fix some platforms where the returning value
of the _LID control method is reliable but the initial lid state
notification is missing.
This option is the default behavior during the period the userspace
isn't ready to handle the buggy AML tables.
B. button.lid_init_state=open:
When this option is specified, the ACPI button driver always reports the
initial lid state as "opened" and whether the "opened"/"closed" events
are paired fully relies on the firmware implementation.
This may fix some platforms where the returning value of the _LID
control method is not reliable and the initial lid state notification is
missing.
This option is the default behavior during the period the userspace
isn't ready to handle the buggy AML tables.
If the userspace has been prepared to ignore the unreliable "opened" events
and the unreliable initial state notification, Linux users should always
use the following kernel parameter:
B. button.lid_init_state=ignore:
C. button.lid_init_state=ignore:
When this option is specified, the ACPI button driver never reports the
initial lid state and there is a compensation mechanism implemented to
ensure that the reliable "closed" notifications can always be delievered

View File

@ -11,24 +11,56 @@ in AArch64 Linux.
The kernel configures the translation tables so that translations made
via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of
the virtual address ignored by the translation hardware. This frees up
this byte for application use, with the following caveats:
this byte for application use.
(1) The kernel requires that all user addresses passed to EL1
are tagged with tag 0x00. This means that any syscall
parameters containing user virtual addresses *must* have
their top byte cleared before trapping to the kernel.
(2) Non-zero tags are not preserved when delivering signals.
This means that signal handlers in applications making use
of tags cannot rely on the tag information for user virtual
addresses being maintained for fields inside siginfo_t.
One exception to this rule is for signals raised in response
to watchpoint debug exceptions, where the tag information
will be preserved.
Passing tagged addresses to the kernel
--------------------------------------
(3) Special care should be taken when using tagged pointers,
since it is likely that C compilers will not hazard two
virtual addresses differing only in the upper byte.
All interpretation of userspace memory addresses by the kernel assumes
an address tag of 0x00.
This includes, but is not limited to, addresses found in:
- pointer arguments to system calls, including pointers in structures
passed to system calls,
- the stack pointer (sp), e.g. when interpreting it to deliver a
signal,
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
Using non-zero address tags in any of these locations may result in an
error code being returned, a (fatal) signal being raised, or other modes
of failure.
For these reasons, passing non-zero address tags to the kernel via
system calls is forbidden, and using a non-zero address tag for sp is
strongly discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
visibility.
Preserving tags
---------------
Non-zero tags are not preserved when delivering signals. This means that
signal handlers in applications making use of tags cannot rely on the
tag information for user virtual addresses being maintained for fields
inside siginfo_t. One exception to this rule is for signals raised in
response to watchpoint debug exceptions, where the tag information will
be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
Other considerations
--------------------
Special care should be taken when using tagged pointers, since it is
likely that C compilers will not hazard two virtual addresses differing
only in the upper byte.

View File

@ -1,6 +1,6 @@
VERSION = 4
PATCHLEVEL = 11
SUBLEVEL = 0
SUBLEVEL = 6
EXTRAVERSION =
NAME = Fearless Coyote

View File

@ -1199,8 +1199,10 @@ SYSCALL_DEFINE4(osf_wait4, pid_t, pid, int __user *, ustatus, int, options,
if (!access_ok(VERIFY_WRITE, ur, sizeof(*ur)))
return -EFAULT;
err = 0;
err |= put_user(status, ustatus);
err = put_user(status, ustatus);
if (ret < 0)
return err ? err : ret;
err |= __put_user(r.ru_utime.tv_sec, &ur->ru_utime.tv_sec);
err |= __put_user(r.ru_utime.tv_usec, &ur->ru_utime.tv_usec);
err |= __put_user(r.ru_stime.tv_sec, &ur->ru_stime.tv_sec);

View File

@ -162,9 +162,10 @@
};
adc0: adc@f8018000 {
atmel,adc-vref = <3300>;
atmel,adc-channels-used = <0xfe>;
pinctrl-0 = <
&pinctrl_adc0_adtrg
&pinctrl_adc0_ad0
&pinctrl_adc0_ad1
&pinctrl_adc0_ad2
&pinctrl_adc0_ad3
@ -172,8 +173,6 @@
&pinctrl_adc0_ad5
&pinctrl_adc0_ad6
&pinctrl_adc0_ad7
&pinctrl_adc0_ad8
&pinctrl_adc0_ad9
>;
status = "okay";
};

View File

@ -12,23 +12,6 @@
model = "Freescale i.MX6 SoloX SDB RevB Board";
};
&cpu0 {
operating-points = <
/* kHz uV */
996000 1250000
792000 1175000
396000 1175000
198000 1175000
>;
fsl,soc-operating-points = <
/* ARM kHz SOC uV */
996000 1250000
792000 1175000
396000 1175000
198000 1175000
>;
};
&i2c1 {
clock-frequency = <100000>;
pinctrl-names = "default";

View File

@ -137,8 +137,8 @@ netcp: netcp@26000000 {
/* NetCP address range */
ranges = <0 0x26000000 0x1000000>;
clocks = <&clkpa>, <&clkcpgmac>, <&chipclk12>, <&clkosr>;
clock-names = "pa_clk", "ethss_clk", "cpts", "osr_clk";
clocks = <&clkpa>, <&clkcpgmac>, <&chipclk12>;
clock-names = "pa_clk", "ethss_clk", "cpts";
dma-coherent;
ti,navigator-dmas = <&dma_gbe 0>,

View File

@ -232,6 +232,14 @@
};
};
osr: sram@70000000 {
compatible = "mmio-sram";
reg = <0x70000000 0x10000>;
#address-cells = <1>;
#size-cells = <1>;
clocks = <&clkosr>;
};
dspgpio0: keystone_dsp_gpio@02620240 {
compatible = "ti,keystone-dsp-gpio";
gpio-controller;

View File

@ -15,6 +15,9 @@ struct dev_archdata {
#endif
#ifdef CONFIG_ARM_DMA_USE_IOMMU
struct dma_iommu_mapping *mapping;
#endif
#ifdef CONFIG_XEN
const struct dma_map_ops *dev_dma_ops;
#endif
bool dma_coherent;
};

View File

@ -16,19 +16,9 @@
extern const struct dma_map_ops arm_dma_ops;
extern const struct dma_map_ops arm_coherent_dma_ops;
static inline const struct dma_map_ops *__generic_dma_ops(struct device *dev)
{
if (dev && dev->dma_ops)
return dev->dma_ops;
return &arm_dma_ops;
}
static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
if (xen_initial_domain())
return xen_dma_ops;
else
return __generic_dma_ops(NULL);
return &arm_dma_ops;
}
#define HAVE_ARCH_DMA_SUPPORTED 1

View File

@ -41,7 +41,7 @@ static const enum fixed_addresses __end_of_fixed_addresses =
#define FIXMAP_PAGE_COMMON (L_PTE_YOUNG | L_PTE_PRESENT | L_PTE_XN | L_PTE_DIRTY)
#define FIXMAP_PAGE_NORMAL (FIXMAP_PAGE_COMMON | L_PTE_MT_WRITEBACK)
#define FIXMAP_PAGE_NORMAL (pgprot_kernel | L_PTE_XN)
#define FIXMAP_PAGE_RO (FIXMAP_PAGE_NORMAL | L_PTE_RDONLY)
/* Used by set_fixmap_(io|nocache), both meant for mapping a device */

View File

@ -31,7 +31,8 @@ void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table);
int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);

View File

@ -18,13 +18,18 @@ enum {
};
#endif
struct mod_plt_sec {
struct elf32_shdr *plt;
int plt_count;
};
struct mod_arch_specific {
#ifdef CONFIG_ARM_UNWIND
struct unwind_table *unwind[ARM_SEC_MAX];
#endif
#ifdef CONFIG_ARM_MODULE_PLTS
struct elf32_shdr *plt;
int plt_count;
struct mod_plt_sec core;
struct mod_plt_sec init;
#endif
};

View File

@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
* Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@ -31,9 +31,17 @@ struct plt_entries {
u32 lit[PLT_ENT_COUNT];
};
static bool in_init(const struct module *mod, unsigned long loc)
{
return loc - (u32)mod->init_layout.base < mod->init_layout.size;
}
u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
{
struct plt_entries *plt = (struct plt_entries *)mod->arch.plt->sh_addr;
struct mod_plt_sec *pltsec = !in_init(mod, loc) ? &mod->arch.core :
&mod->arch.init;
struct plt_entries *plt = (struct plt_entries *)pltsec->plt->sh_addr;
int idx = 0;
/*
@ -41,9 +49,9 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
* relocations are sorted, this will be the last entry we allocated.
* (if one exists).
*/
if (mod->arch.plt_count > 0) {
plt += (mod->arch.plt_count - 1) / PLT_ENT_COUNT;
idx = (mod->arch.plt_count - 1) % PLT_ENT_COUNT;
if (pltsec->plt_count > 0) {
plt += (pltsec->plt_count - 1) / PLT_ENT_COUNT;
idx = (pltsec->plt_count - 1) % PLT_ENT_COUNT;
if (plt->lit[idx] == val)
return (u32)&plt->ldr[idx];
@ -53,8 +61,8 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
plt++;
}
mod->arch.plt_count++;
BUG_ON(mod->arch.plt_count * PLT_ENT_SIZE > mod->arch.plt->sh_size);
pltsec->plt_count++;
BUG_ON(pltsec->plt_count * PLT_ENT_SIZE > pltsec->plt->sh_size);
if (!idx)
/* Populate a new set of entries */
@ -129,7 +137,7 @@ static bool duplicate_rel(Elf32_Addr base, const Elf32_Rel *rel, int num)
/* Count how many PLT entries we may need */
static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
const Elf32_Rel *rel, int num)
const Elf32_Rel *rel, int num, Elf32_Word dstidx)
{
unsigned int ret = 0;
const Elf32_Sym *s;
@ -144,13 +152,17 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
case R_ARM_THM_JUMP24:
/*
* We only have to consider branch targets that resolve
* to undefined symbols. This is not simply a heuristic,
* it is a fundamental limitation, since the PLT itself
* is part of the module, and needs to be within range
* as well, so modules can never grow beyond that limit.
* to symbols that are defined in a different section.
* This is not simply a heuristic, it is a fundamental
* limitation, since there is no guaranteed way to emit
* PLT entries sufficiently close to the branch if the
* section size exceeds the range of a branch
* instruction. So ignore relocations against defined
* symbols if they live in the same section as the
* relocation target.
*/
s = syms + ELF32_R_SYM(rel[i].r_info);
if (s->st_shndx != SHN_UNDEF)
if (s->st_shndx == dstidx)
break;
/*
@ -161,7 +173,12 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
* So we need to support them, but there is no need to
* take them into consideration when trying to optimize
* this code. So let's only check for duplicates when
* the addend is zero.
* the addend is zero. (Note that calls into the core
* module via init PLT entries could involve section
* relative symbol references with non-zero addends, for
* which we may end up emitting duplicates, but the init
* PLT is released along with the rest of the .init
* region as soon as module loading completes.)
*/
if (!is_zero_addend_relocation(base, rel + i) ||
!duplicate_rel(base, rel, i))
@ -174,7 +191,8 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
char *secstrings, struct module *mod)
{
unsigned long plts = 0;
unsigned long core_plts = 0;
unsigned long init_plts = 0;
Elf32_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
Elf32_Sym *syms = NULL;
@ -184,13 +202,15 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
*/
for (s = sechdrs; s < sechdrs_end; ++s) {
if (strcmp(".plt", secstrings + s->sh_name) == 0)
mod->arch.plt = s;
mod->arch.core.plt = s;
else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
mod->arch.init.plt = s;
else if (s->sh_type == SHT_SYMTAB)
syms = (Elf32_Sym *)s->sh_addr;
}
if (!mod->arch.plt) {
pr_err("%s: module PLT section missing\n", mod->name);
if (!mod->arch.core.plt || !mod->arch.init.plt) {
pr_err("%s: module PLT section(s) missing\n", mod->name);
return -ENOEXEC;
}
if (!syms) {
@ -213,16 +233,29 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
/* sort by type and symbol index */
sort(rels, numrels, sizeof(Elf32_Rel), cmp_rel, NULL);
plts += count_plts(syms, dstsec->sh_addr, rels, numrels);
if (strncmp(secstrings + dstsec->sh_name, ".init", 5) != 0)
core_plts += count_plts(syms, dstsec->sh_addr, rels,
numrels, s->sh_info);
else
init_plts += count_plts(syms, dstsec->sh_addr, rels,
numrels, s->sh_info);
}
mod->arch.plt->sh_type = SHT_NOBITS;
mod->arch.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
mod->arch.plt->sh_addralign = L1_CACHE_BYTES;
mod->arch.plt->sh_size = round_up(plts * PLT_ENT_SIZE,
sizeof(struct plt_entries));
mod->arch.plt_count = 0;
mod->arch.core.plt->sh_type = SHT_NOBITS;
mod->arch.core.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
mod->arch.core.plt->sh_addralign = L1_CACHE_BYTES;
mod->arch.core.plt->sh_size = round_up(core_plts * PLT_ENT_SIZE,
sizeof(struct plt_entries));
mod->arch.core.plt_count = 0;
pr_debug("%s: plt=%x\n", __func__, mod->arch.plt->sh_size);
mod->arch.init.plt->sh_type = SHT_NOBITS;
mod->arch.init.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
mod->arch.init.plt->sh_addralign = L1_CACHE_BYTES;
mod->arch.init.plt->sh_size = round_up(init_plts * PLT_ENT_SIZE,
sizeof(struct plt_entries));
mod->arch.init.plt_count = 0;
pr_debug("%s: plt=%x, init.plt=%x\n", __func__,
mod->arch.core.plt->sh_size, mod->arch.init.plt->sh_size);
return 0;
}

View File

@ -1,3 +1,4 @@
SECTIONS {
.plt : { BYTE(0) }
.init.plt : { BYTE(0) }
}

View File

@ -80,7 +80,7 @@ __setup("fpe=", fpe_setup);
extern void init_default_cache_policy(unsigned long);
extern void paging_init(const struct machine_desc *desc);
extern void early_paging_init(const struct machine_desc *);
extern void early_mm_init(const struct machine_desc *);
extern void adjust_lowmem_bounds(void);
extern enum reboot_mode reboot_mode;
extern void setup_dma_zone(const struct machine_desc *desc);
@ -1088,7 +1088,7 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
#ifdef CONFIG_MMU
early_paging_init(mdesc);
early_mm_init(mdesc);
#endif
setup_dma_zone(mdesc);
xen_early_init();

View File

@ -93,12 +93,6 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 1;
}
int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
kvm_inject_undefined(vcpu);
return 1;
}
static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
{
/*
@ -514,12 +508,7 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
return 1;
}
/**
* kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
static struct coproc_params decode_64bit_hsr(struct kvm_vcpu *vcpu)
{
struct coproc_params params;
@ -533,9 +522,38 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
params.CRm = 0;
return params;
}
/**
* kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct coproc_params params = decode_64bit_hsr(vcpu);
return emulate_cp15(vcpu, &params);
}
/**
* kvm_handle_cp14_64 -- handles a mrrc/mcrr trap on a guest CP14 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct coproc_params params = decode_64bit_hsr(vcpu);
/* raz_wi cp14 */
pm_fake(vcpu, &params, NULL);
/* handled */
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
return 1;
}
static void reset_coproc_regs(struct kvm_vcpu *vcpu,
const struct coproc_reg *table, size_t num)
{
@ -546,12 +564,7 @@ static void reset_coproc_regs(struct kvm_vcpu *vcpu,
table[i].reset(vcpu, &table[i]);
}
/**
* kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
static struct coproc_params decode_32bit_hsr(struct kvm_vcpu *vcpu)
{
struct coproc_params params;
@ -565,9 +578,37 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
params.Op2 = (kvm_vcpu_get_hsr(vcpu) >> 17) & 0x7;
params.Rt2 = 0;
return params;
}
/**
* kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct coproc_params params = decode_32bit_hsr(vcpu);
return emulate_cp15(vcpu, &params);
}
/**
* kvm_handle_cp14_32 -- handles a mrc/mcr trap on a guest CP14 access
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct coproc_params params = decode_32bit_hsr(vcpu);
/* raz_wi cp14 */
pm_fake(vcpu, &params, NULL);
/* handled */
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
return 1;
}
/******************************************************************************
* Userspace API
*****************************************************************************/

View File

@ -95,9 +95,9 @@ static exit_handle_fn arm_exit_handlers[] = {
[HSR_EC_WFI] = kvm_handle_wfx,
[HSR_EC_CP15_32] = kvm_handle_cp15_32,
[HSR_EC_CP15_64] = kvm_handle_cp15_64,
[HSR_EC_CP14_MR] = kvm_handle_cp14_access,
[HSR_EC_CP14_MR] = kvm_handle_cp14_32,
[HSR_EC_CP14_LS] = kvm_handle_cp14_load_store,
[HSR_EC_CP14_64] = kvm_handle_cp14_access,
[HSR_EC_CP14_64] = kvm_handle_cp14_64,
[HSR_EC_CP_0_13] = kvm_handle_cp_0_13_access,
[HSR_EC_CP10_ID] = kvm_handle_cp10_id,
[HSR_EC_HVC] = handle_hvc,

View File

@ -2,6 +2,8 @@
# Makefile for Kernel-based Virtual Machine module, HYP part
#
ccflags-y += -fno-stack-protector
KVM=../../../../virt/kvm
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o

View File

@ -48,7 +48,9 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu, u32 *fpexc_host)
write_sysreg(HSTR_T(15), HSTR);
write_sysreg(HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11), HCPTR);
val = read_sysreg(HDCR);
write_sysreg(val | HDCR_TPM | HDCR_TPMCR, HDCR);
val |= HDCR_TPM | HDCR_TPMCR; /* trap performance monitors */
val |= HDCR_TDRA | HDCR_TDOSA | HDCR_TDA; /* trap debug regs */
write_sysreg(val, HDCR);
}
static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)

View File

@ -95,7 +95,6 @@ __do_hyp_init:
@ - Write permission implies XN: disabled
@ - Instruction cache: enabled
@ - Data/Unified cache: enabled
@ - Memory alignment checks: enabled
@ - MMU: enabled (this code must be run from an identity mapping)
mrc p15, 4, r0, c1, c0, 0 @ HSCR
ldr r2, =HSCTLR_MASK
@ -103,8 +102,8 @@ __do_hyp_init:
mrc p15, 0, r1, c1, c0, 0 @ SCTLR
ldr r2, =(HSCTLR_EE | HSCTLR_FI | HSCTLR_I | HSCTLR_C)
and r1, r1, r2
ARM( ldr r2, =(HSCTLR_M | HSCTLR_A) )
THUMB( ldr r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_TE) )
ARM( ldr r2, =(HSCTLR_M) )
THUMB( ldr r2, =(HSCTLR_M | HSCTLR_TE) )
orr r1, r1, r2
orr r0, r0, r1
mcr p15, 4, r0, c1, c0, 0 @ HSCR

View File

@ -295,6 +295,13 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
assert_spin_locked(&kvm->mmu_lock);
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
/*
* Make sure the page table is still active, as another thread
* could have possibly freed the page table, while we released
* the lock.
*/
if (!READ_ONCE(kvm->arch.pgd))
break;
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
unmap_stage2_puds(kvm, pgd, addr, next);
@ -829,22 +836,22 @@ void stage2_unmap_vm(struct kvm *kvm)
* Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
* underlying level-2 and level-3 tables before freeing the actual level-1 table
* and setting the struct pointer to NULL.
*
* Note we don't need locking here as this is only called when the VM is
* destroyed, which can only be done once.
*/
void kvm_free_stage2_pgd(struct kvm *kvm)
{
if (kvm->arch.pgd == NULL)
return;
void *pgd = NULL;
spin_lock(&kvm->mmu_lock);
unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
if (kvm->arch.pgd) {
unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
pgd = READ_ONCE(kvm->arch.pgd);
kvm->arch.pgd = NULL;
}
spin_unlock(&kvm->mmu_lock);
/* Free the HW pgd, one page at a time */
free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
kvm->arch.pgd = NULL;
if (pgd)
free_pages_exact(pgd, S2_PGD_SIZE);
}
static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
@ -872,6 +879,9 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
pmd_t *pmd;
pud = stage2_get_pud(kvm, cache, addr);
if (!pud)
return NULL;
if (stage2_pud_none(*pud)) {
if (!cache)
return NULL;
@ -1170,11 +1180,13 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
* large. Otherwise, we may see kernel panics with
* CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
* CONFIG_LOCKDEP. Additionally, holding the lock too long
* will also starve other vCPUs.
* will also starve other vCPUs. We have to also make sure
* that the page tables are not freed while we released
* the lock.
*/
if (need_resched() || spin_needbreak(&kvm->mmu_lock))
cond_resched_lock(&kvm->mmu_lock);
cond_resched_lock(&kvm->mmu_lock);
if (!READ_ONCE(kvm->arch.pgd))
break;
next = stage2_pgd_addr_end(addr, end);
if (stage2_pgd_present(*pgd))
stage2_wp_puds(pgd, addr, next);

View File

@ -208,9 +208,10 @@ int kvm_psci_version(struct kvm_vcpu *vcpu)
static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
{
int ret = 1;
struct kvm *kvm = vcpu->kvm;
unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
int ret = 1;
switch (psci_fn) {
case PSCI_0_2_FN_PSCI_VERSION:
@ -230,7 +231,9 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
break;
case PSCI_0_2_FN_CPU_ON:
case PSCI_0_2_FN64_CPU_ON:
mutex_lock(&kvm->lock);
val = kvm_psci_vcpu_on(vcpu);
mutex_unlock(&kvm->lock);
break;
case PSCI_0_2_FN_AFFINITY_INFO:
case PSCI_0_2_FN64_AFFINITY_INFO:
@ -279,6 +282,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
@ -288,7 +292,9 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
val = PSCI_RET_SUCCESS;
break;
case KVM_PSCI_FN_CPU_ON:
mutex_lock(&kvm->lock);
val = kvm_psci_vcpu_on(vcpu);
mutex_unlock(&kvm->lock);
break;
default:
val = PSCI_RET_NOT_SUPPORTED;

View File

@ -2414,6 +2414,13 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dma_ops = arm_get_dma_map_ops(coherent);
set_dma_ops(dev, dma_ops);
#ifdef CONFIG_XEN
if (xen_initial_domain()) {
dev->archdata.dev_dma_ops = dev->dma_ops;
dev->dma_ops = xen_dma_ops;
}
#endif
}
void arch_teardown_dma_ops(struct device *dev)

View File

@ -414,6 +414,11 @@ void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot)
FIXADDR_END);
BUG_ON(idx >= __end_of_fixed_addresses);
/* we only support device mappings until pgprot_kernel has been set */
if (WARN_ON(pgprot_val(prot) != pgprot_val(FIXMAP_PAGE_IO) &&
pgprot_val(pgprot_kernel) == 0))
return;
if (pgprot_val(prot))
set_pte_at(NULL, vaddr, pte,
pfn_pte(phys >> PAGE_SHIFT, prot));
@ -1492,7 +1497,7 @@ pgtables_remap lpae_pgtables_remap_asm;
* early_paging_init() recreates boot time page table setup, allowing machines
* to switch over to a high (>4G) address space on LPAE systems
*/
void __init early_paging_init(const struct machine_desc *mdesc)
static void __init early_paging_init(const struct machine_desc *mdesc)
{
pgtables_remap *lpae_pgtables_remap;
unsigned long pa_pgd;
@ -1560,7 +1565,7 @@ void __init early_paging_init(const struct machine_desc *mdesc)
#else
void __init early_paging_init(const struct machine_desc *mdesc)
static void __init early_paging_init(const struct machine_desc *mdesc)
{
long long offset;
@ -1616,7 +1621,6 @@ void __init paging_init(const struct machine_desc *mdesc)
{
void *zero_page;
build_mem_type_table();
prepare_page_table();
map_lowmem();
memblock_set_current_limit(arm_lowmem_limit);
@ -1636,3 +1640,9 @@ void __init paging_init(const struct machine_desc *mdesc)
empty_zero_page = virt_to_page(zero_page);
__flush_dcache_page(NULL, empty_zero_page);
}
void __init early_mm_init(const struct machine_desc *mdesc)
{
build_mem_type_table();
early_paging_init(mdesc);
}

View File

@ -147,10 +147,10 @@ __v7m_setup_cont:
@ Configure caches (if implemented)
teq r8, #0
stmneia r12, {r0-r6, lr} @ v7m_invalidate_l1 touches r0-r6
stmneia sp, {r0-r6, lr} @ v7m_invalidate_l1 touches r0-r6
blne v7m_invalidate_l1
teq r8, #0 @ re-evalutae condition
ldmneia r12, {r0-r6, lr}
ldmneia sp, {r0-r6, lr}
@ Configure the System Control Register to ensure 8-byte stack alignment
@ Note the STKALIGN bit is either RW or RAO.

View File

@ -774,6 +774,7 @@
clocks = <&sys_ctrl 2>, <&sys_ctrl 1>;
clock-names = "ciu", "biu";
resets = <&sys_ctrl PERIPH_RSTDIS0_MMC0>;
reset-names = "reset";
bus-width = <0x8>;
vmmc-supply = <&ldo19>;
pinctrl-names = "default";
@ -797,6 +798,7 @@
clocks = <&sys_ctrl 4>, <&sys_ctrl 3>;
clock-names = "ciu", "biu";
resets = <&sys_ctrl PERIPH_RSTDIS0_MMC1>;
reset-names = "reset";
vqmmc-supply = <&ldo7>;
vmmc-supply = <&ldo10>;
bus-width = <0x4>;
@ -815,6 +817,7 @@
clocks = <&sys_ctrl HI6220_MMC2_CIUCLK>, <&sys_ctrl HI6220_MMC2_CLK>;
clock-names = "ciu", "biu";
resets = <&sys_ctrl PERIPH_RSTDIS0_MMC2>;
reset-names = "reset";
bus-width = <0x4>;
broken-cd;
pinctrl-names = "default", "idle";

View File

@ -62,4 +62,13 @@ alternative_if ARM64_ALT_PAN_NOT_UAO
alternative_else_nop_endif
.endm
/*
* Remove the address tag from a virtual address, if present.
*/
.macro clear_address_tag, dst, addr
tst \addr, #(1 << 55)
bic \dst, \addr, #(0xff << 56)
csel \dst, \dst, \addr, eq
.endm
#endif

View File

@ -42,25 +42,35 @@
#define __smp_rmb() dmb(ishld)
#define __smp_wmb() dmb(ishst)
#define __smp_store_release(p, v) \
#define __smp_store_release(p, v) \
do { \
union { typeof(*p) __val; char __c[1]; } __u = \
{ .__val = (__force typeof(*p)) (v) }; \
compiletime_assert_atomic_type(*p); \
switch (sizeof(*p)) { \
case 1: \
asm volatile ("stlrb %w1, %0" \
: "=Q" (*p) : "r" (v) : "memory"); \
: "=Q" (*p) \
: "r" (*(__u8 *)__u.__c) \
: "memory"); \
break; \
case 2: \
asm volatile ("stlrh %w1, %0" \
: "=Q" (*p) : "r" (v) : "memory"); \
: "=Q" (*p) \
: "r" (*(__u16 *)__u.__c) \
: "memory"); \
break; \
case 4: \
asm volatile ("stlr %w1, %0" \
: "=Q" (*p) : "r" (v) : "memory"); \
: "=Q" (*p) \
: "r" (*(__u32 *)__u.__c) \
: "memory"); \
break; \
case 8: \
asm volatile ("stlr %1, %0" \
: "=Q" (*p) : "r" (v) : "memory"); \
: "=Q" (*p) \
: "r" (*(__u64 *)__u.__c) \
: "memory"); \
break; \
} \
} while (0)

View File

@ -46,7 +46,7 @@ static inline unsigned long __xchg_case_##name(unsigned long x, \
" swp" #acq_lse #rel #sz "\t%" #w "3, %" #w "0, %2\n" \
__nops(3) \
" " #nop_lse) \
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr) \
: "=&r" (ret), "=&r" (tmp), "+Q" (*(unsigned long *)ptr) \
: "r" (x) \
: cl); \
\

View File

@ -19,6 +19,9 @@
struct dev_archdata {
#ifdef CONFIG_IOMMU_API
void *iommu; /* private IOMMU data */
#endif
#ifdef CONFIG_XEN
const struct dma_map_ops *dev_dma_ops;
#endif
bool dma_coherent;
};

View File

@ -27,11 +27,8 @@
#define DMA_ERROR_CODE (~(dma_addr_t)0)
extern const struct dma_map_ops dummy_dma_ops;
static inline const struct dma_map_ops *__generic_dma_ops(struct device *dev)
static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
if (dev && dev->dma_ops)
return dev->dma_ops;
/*
* We expect no ISA devices, and all other DMA masters are expected to
* have someone call arch_setup_dma_ops at device creation time.
@ -39,14 +36,6 @@ static inline const struct dma_map_ops *__generic_dma_ops(struct device *dev)
return &dummy_dma_ops;
}
static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
{
if (xen_initial_domain())
return xen_dma_ops;
else
return __generic_dma_ops(NULL);
}
void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent);
#define arch_setup_dma_ops arch_setup_dma_ops

View File

@ -240,6 +240,12 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_FSC_TYPE;
}
static inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
{
u32 esr = kvm_vcpu_get_hsr(vcpu);
return (esr & ESR_ELx_SYS64_ISS_RT_MASK) >> ESR_ELx_SYS64_ISS_RT_SHIFT;
}
static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
{
return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;

View File

@ -138,6 +138,10 @@
#define SCTLR_ELx_A (1 << 1)
#define SCTLR_ELx_M 1
#define SCTLR_EL2_RES1 ((1 << 4) | (1 << 5) | (1 << 11) | (1 << 16) | \
(1 << 16) | (1 << 18) | (1 << 22) | (1 << 23) | \
(1 << 28) | (1 << 29))
#define SCTLR_ELx_FLAGS (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | \
SCTLR_ELx_SA | SCTLR_ELx_I)

View File

@ -95,20 +95,21 @@ static inline void set_fs(mm_segment_t fs)
*/
#define __range_ok(addr, size) \
({ \
unsigned long __addr = (unsigned long __force)(addr); \
unsigned long flag, roksum; \
__chk_user_ptr(addr); \
asm("adds %1, %1, %3; ccmp %1, %4, #2, cc; cset %0, ls" \
: "=&r" (flag), "=&r" (roksum) \
: "1" (addr), "Ir" (size), \
: "1" (__addr), "Ir" (size), \
"r" (current_thread_info()->addr_limit) \
: "cc"); \
flag; \
})
/*
* When dealing with data aborts or instruction traps we may end up with
* a tagged userland pointer. Clear the tag to get a sane pointer to pass
* on to access_ok(), for instance.
* When dealing with data aborts, watchpoints, or instruction traps we may end
* up with a tagged userland pointer. Clear the tag to get a sane pointer to
* pass on to access_ok(), for instance.
*/
#define untagged_addr(addr) sign_extend64(addr, 55)

View File

@ -306,7 +306,8 @@ do { \
_ASM_EXTABLE(0b, 4b) \
_ASM_EXTABLE(1b, 4b) \
: "=&r" (res), "+r" (data), "=&r" (temp), "=&r" (temp2) \
: "r" (addr), "i" (-EAGAIN), "i" (-EFAULT), \
: "r" ((unsigned long)addr), "i" (-EAGAIN), \
"i" (-EFAULT), \
"i" (__SWP_LL_SC_LOOPS) \
: "memory"); \
uaccess_disable(); \

View File

@ -428,12 +428,13 @@ el1_da:
/*
* Data abort handling
*/
mrs x0, far_el1
mrs x3, far_el1
enable_dbg
// re-enable interrupts if they were enabled in the aborted context
tbnz x23, #7, 1f // PSR_I_BIT
enable_irq
1:
clear_address_tag x0, x3
mov x2, sp // struct pt_regs
bl do_mem_abort
@ -594,7 +595,7 @@ el0_da:
// enable interrupts before calling the main handler
enable_dbg_and_irq
ct_user_exit
bic x0, x26, #(0xff << 56)
clear_address_tag x0, x26
mov x1, x25
mov x2, sp
bl do_mem_abort

View File

@ -36,6 +36,7 @@
#include <asm/traps.h>
#include <asm/cputype.h>
#include <asm/system_misc.h>
#include <asm/uaccess.h>
/* Breakpoint currently in use for each BRP. */
static DEFINE_PER_CPU(struct perf_event *, bp_on_reg[ARM_MAX_BRP]);
@ -721,6 +722,8 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
u64 wp_low, wp_high;
u32 lens, lene;
addr = untagged_addr(addr);
lens = __ffs(ctrl->len);
lene = __fls(ctrl->len);

View File

@ -443,7 +443,7 @@ int cpu_enable_cache_maint_trap(void *__unused)
}
#define __user_cache_maint(insn, address, res) \
if (untagged_addr(address) >= user_addr_max()) { \
if (address >= user_addr_max()) { \
res = -EFAULT; \
} else { \
uaccess_ttbr0_enable(); \
@ -469,7 +469,7 @@ static void user_cache_maint_handler(unsigned int esr, struct pt_regs *regs)
int crm = (esr & ESR_ELx_SYS64_ISS_CRM_MASK) >> ESR_ELx_SYS64_ISS_CRM_SHIFT;
int ret = 0;
address = pt_regs_read_reg(regs, rt);
address = untagged_addr(pt_regs_read_reg(regs, rt));
switch (crm) {
case ESR_ELx_SYS64_ISS_CRM_DC_CVAU: /* DC CVAU, gets promoted */

View File

@ -102,10 +102,13 @@ __do_hyp_init:
tlbi alle2
dsb sy
mrs x4, sctlr_el2
and x4, x4, #SCTLR_ELx_EE // preserve endianness of EL2
ldr x5, =SCTLR_ELx_FLAGS
orr x4, x4, x5
/*
* Preserve all the RES1 bits while setting the default flags,
* as well as the EE bit on BE. Drop the A flag since the compiler
* is allowed to generate unaligned accesses.
*/
ldr x4, =(SCTLR_EL2_RES1 | (SCTLR_ELx_FLAGS & ~SCTLR_ELx_A))
CPU_BE( orr x4, x4, #SCTLR_ELx_EE)
msr sctlr_el2, x4
isb

View File

@ -2,6 +2,8 @@
# Makefile for Kernel-based Virtual Machine module, HYP part
#
ccflags-y += -fno-stack-protector
KVM=../../../../virt/kvm
obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o

View File

@ -1638,8 +1638,8 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
{
struct sys_reg_params params;
u32 hsr = kvm_vcpu_get_hsr(vcpu);
int Rt = (hsr >> 5) & 0xf;
int Rt2 = (hsr >> 10) & 0xf;
int Rt = kvm_vcpu_sys_get_rt(vcpu);
int Rt2 = (hsr >> 10) & 0x1f;
params.is_aarch32 = true;
params.is_32bit = false;
@ -1690,7 +1690,7 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
{
struct sys_reg_params params;
u32 hsr = kvm_vcpu_get_hsr(vcpu);
int Rt = (hsr >> 5) & 0xf;
int Rt = kvm_vcpu_sys_get_rt(vcpu);
params.is_aarch32 = true;
params.is_32bit = true;
@ -1805,7 +1805,7 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct sys_reg_params params;
unsigned long esr = kvm_vcpu_get_hsr(vcpu);
int Rt = (esr >> 5) & 0x1f;
int Rt = kvm_vcpu_sys_get_rt(vcpu);
int ret;
trace_kvm_handle_sys_reg(esr);

View File

@ -977,4 +977,11 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
dev->archdata.dma_coherent = coherent;
__iommu_setup_dma_ops(dev, dma_base, size, iommu);
#ifdef CONFIG_XEN
if (xen_initial_domain()) {
dev->archdata.dev_dma_ops = dev->dma_ops;
dev->dma_ops = xen_dma_ops;
}
#endif
}

View File

@ -252,8 +252,9 @@ static int emit_bpf_tail_call(struct jit_ctx *ctx)
*/
off = offsetof(struct bpf_array, ptrs);
emit_a64_mov_i64(tmp, off, ctx);
emit(A64_LDR64(tmp, r2, tmp), ctx);
emit(A64_LDR64(prg, tmp, r3), ctx);
emit(A64_ADD(1, tmp, r2, tmp), ctx);
emit(A64_LSL(1, prg, r3, 3), ctx);
emit(A64_LDR64(prg, tmp, prg), ctx);
emit(A64_CBZ(1, prg, jmp_offset), ctx);
/* goto *(prog->bpf_func + prologue_size); */
@ -779,14 +780,14 @@ static int build_body(struct jit_ctx *ctx)
int ret;
ret = build_insn(insn, ctx);
if (ctx->image == NULL)
ctx->offset[i] = ctx->idx;
if (ret > 0) {
i++;
if (ctx->image == NULL)
ctx->offset[i] = ctx->idx;
continue;
}
if (ctx->image == NULL)
ctx->offset[i] = ctx->idx;
if (ret)
return ret;
}

View File

@ -28,24 +28,32 @@
#define segment_eq(a, b) ((a).seg == (b).seg)
#define __kernel_ok (segment_eq(get_fs(), KERNEL_DS))
/*
* Explicitly allow NULL pointers here. Parts of the kernel such
* as readv/writev use access_ok to validate pointers, but want
* to allow NULL pointers for various reasons. NULL pointers are
* safe to allow through because the first page is not mappable on
* Meta.
*
* We also wish to avoid letting user code access the system area
* and the kernel half of the address space.
*/
#define __user_bad(addr, size) (((addr) > 0 && (addr) < META_MEMORY_BASE) || \
((addr) > PAGE_OFFSET && \
(addr) < LINCORE_BASE))
static inline int __access_ok(unsigned long addr, unsigned long size)
{
return __kernel_ok || !__user_bad(addr, size);
/*
* Allow access to the user mapped memory area, but not the system area
* before it. The check extends to the top of the address space when
* kernel access is allowed (there's no real reason to user copy to the
* system area in any case).
*/
if (likely(addr >= META_MEMORY_BASE && addr < get_fs().seg &&
size <= get_fs().seg - addr))
return true;
/*
* Explicitly allow NULL pointers here. Parts of the kernel such
* as readv/writev use access_ok to validate pointers, but want
* to allow NULL pointers for various reasons. NULL pointers are
* safe to allow through because the first page is not mappable on
* Meta.
*/
if (!addr)
return true;
/* Allow access to core code memory area... */
if (addr >= LINCORE_CODE_BASE && addr <= LINCORE_CODE_LIMIT &&
size <= LINCORE_CODE_LIMIT + 1 - addr)
return true;
/* ... but no other areas. */
return false;
}
#define access_ok(type, addr, size) __access_ok((unsigned long)(addr), \
@ -186,8 +194,13 @@ do { \
extern long __must_check __strncpy_from_user(char *dst, const char __user *src,
long count);
#define strncpy_from_user(dst, src, count) __strncpy_from_user(dst, src, count)
static inline long
strncpy_from_user(char *dst, const char __user *src, long count)
{
if (!access_ok(VERIFY_READ, src, 1))
return -EFAULT;
return __strncpy_from_user(dst, src, count);
}
/*
* Return the size of a string (including the ending 0)
*

View File

@ -1373,6 +1373,7 @@ config CPU_LOONGSON3
select WEAK_ORDERING
select WEAK_REORDERING_BEYOND_LLSC
select MIPS_PGD_C0_CONTEXT
select MIPS_L1_CACHE_SHIFT_6
select GPIOLIB
help
The Loongson 3 processor implements the MIPS64R2 instruction

View File

@ -120,7 +120,6 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
struct thread_info *ti = task_thread_info(p);
struct pt_regs *childregs, *regs = current_pt_regs();
unsigned long childksp;
p->set_child_tid = p->clear_child_tid = NULL;
childksp = (unsigned long)task_stack_page(p) + THREAD_SIZE - 32;

View File

@ -167,8 +167,6 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
top_of_kernel_stack = sp;
p->set_child_tid = p->clear_child_tid = NULL;
/* Locate userspace context on stack... */
sp -= STACK_FRAME_OVERHEAD; /* redzone */
sp -= sizeof(struct pt_regs);

View File

@ -70,8 +70,9 @@ extern void drop_cop(unsigned long acop, struct mm_struct *mm);
* switch_mm is the entry point called from the architecture independent
* code in kernel/sched/core.c
*/
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
struct task_struct *tsk)
static inline void switch_mm_irqs_off(struct mm_struct *prev,
struct mm_struct *next,
struct task_struct *tsk)
{
/* Mark this context has been used on the new CPU */
if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next)))
@ -110,6 +111,18 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
switch_mmu_context(prev, next, tsk);
}
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
struct task_struct *tsk)
{
unsigned long flags;
local_irq_save(flags);
switch_mm_irqs_off(prev, next, tsk);
local_irq_restore(flags);
}
#define switch_mm_irqs_off switch_mm_irqs_off
#define deactivate_mm(tsk,mm) do { } while (0)
/*

View File

@ -44,8 +44,22 @@ extern void __init dump_numa_cpu_topology(void);
extern int sysfs_add_device_to_node(struct device *dev, int nid);
extern void sysfs_remove_device_from_node(struct device *dev, int nid);
static inline int early_cpu_to_node(int cpu)
{
int nid;
nid = numa_cpu_lookup_table[cpu];
/*
* Fall back to node 0 if nid is unset (it should be, except bugs).
* This allows callers to safely do NODE_DATA(early_cpu_to_node(cpu)).
*/
return (nid < 0) ? 0 : nid;
}
#else
static inline int early_cpu_to_node(int cpu) { return 0; }
static inline void dump_numa_cpu_topology(void) {}
static inline int sysfs_add_device_to_node(struct device *dev, int nid)

View File

@ -724,7 +724,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
*/
#define MAX_WAIT_FOR_RECOVERY 300
static void eeh_handle_normal_event(struct eeh_pe *pe)
static bool eeh_handle_normal_event(struct eeh_pe *pe)
{
struct pci_bus *frozen_bus;
struct eeh_dev *edev, *tmp;
@ -736,7 +736,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
if (!frozen_bus) {
pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
__func__, pe->phb->global_number, pe->addr);
return;
return false;
}
eeh_pe_update_time_stamp(pe);
@ -870,7 +870,7 @@ static void eeh_handle_normal_event(struct eeh_pe *pe)
pr_info("EEH: Notify device driver to resume\n");
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
return;
return false;
excess_failures:
/*
@ -915,8 +915,12 @@ perm_error:
pci_lock_rescan_remove();
pci_hp_remove_devices(frozen_bus);
pci_unlock_rescan_remove();
/* The passed PE should no longer be used */
return true;
}
}
return false;
}
static void eeh_handle_special_event(void)
@ -982,7 +986,14 @@ static void eeh_handle_special_event(void)
*/
if (rc == EEH_NEXT_ERR_FROZEN_PE ||
rc == EEH_NEXT_ERR_FENCED_PHB) {
eeh_handle_normal_event(pe);
/*
* eeh_handle_normal_event() can make the PE stale if it
* determines that the PE cannot possibly be recovered.
* Don't modify the PE state if that's the case.
*/
if (eeh_handle_normal_event(pe))
continue;
eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
} else {
pci_lock_rescan_remove();

View File

@ -735,8 +735,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
andis. r15,r14,(DBSR_IC|DBSR_BT)@h
beq+ 1f
#ifdef CONFIG_RELOCATABLE
ld r15,PACATOC(r13)
ld r14,interrupt_base_book3e@got(r15)
ld r15,__end_interrupts@got(r15)
#else
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
LOAD_REG_IMMEDIATE(r15,__end_interrupts)
#endif
cmpld cr0,r10,r14
cmpld cr1,r10,r15
blt+ cr0,1f
@ -799,8 +805,14 @@ kernel_dbg_exc:
andis. r15,r14,(DBSR_IC|DBSR_BT)@h
beq+ 1f
#ifdef CONFIG_RELOCATABLE
ld r15,PACATOC(r13)
ld r14,interrupt_base_book3e@got(r15)
ld r15,__end_interrupts@got(r15)
#else
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
LOAD_REG_IMMEDIATE(r15,__end_interrupts)
#endif
cmpld cr0,r10,r14
cmpld cr1,r10,r15
blt+ cr0,1f

View File

@ -221,6 +221,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
{
int index;
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
/*
* For now just print it to console.
* TODO: log this error event to FSP or nvram.

View File

@ -561,6 +561,7 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
static struct pstore_info nvram_pstore_info = {
.owner = THIS_MODULE,
.name = "nvram",
.flags = PSTORE_FLAGS_DMESG,
.open = nvram_pstore_open,
.read = nvram_pstore_read,
.write = nvram_pstore_write,

View File

@ -864,6 +864,25 @@ static void tm_reclaim_thread(struct thread_struct *thr,
if (!MSR_TM_SUSPENDED(mfmsr()))
return;
/*
* If we are in a transaction and FP is off then we can't have
* used FP inside that transaction. Hence the checkpointed
* state is the same as the live state. We need to copy the
* live state to the checkpointed state so that when the
* transaction is restored, the checkpointed state is correct
* and the aborted transaction sees the correct state. We use
* ckpt_regs.msr here as that's what tm_reclaim will use to
* determine if it's going to write the checkpointed state or
* not. So either this will write the checkpointed registers,
* or reclaim will. Similarly for VMX.
*/
if ((thr->ckpt_regs.msr & MSR_FP) == 0)
memcpy(&thr->ckfp_state, &thr->fp_state,
sizeof(struct thread_fp_state));
if ((thr->ckpt_regs.msr & MSR_VEC) == 0)
memcpy(&thr->ckvr_state, &thr->vr_state,
sizeof(struct thread_vr_state));
giveup_all(container_of(thr, struct task_struct, thread));
tm_reclaim(thr, thr->ckpt_regs.msr, cause);
@ -1647,6 +1666,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
#ifdef CONFIG_VSX
current->thread.used_vsr = 0;
#endif
current->thread.load_fp = 0;
memset(&current->thread.fp_state, 0, sizeof(current->thread.fp_state));
current->thread.fp_save_area = NULL;
#ifdef CONFIG_ALTIVEC
@ -1655,6 +1675,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.vr_save_area = NULL;
current->thread.vrsave = 0;
current->thread.used_vr = 0;
current->thread.load_vec = 0;
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_SPE
memset(current->thread.evr, 0, sizeof(current->thread.evr));
@ -1666,6 +1687,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
current->thread.tm_tfhar = 0;
current->thread.tm_texasr = 0;
current->thread.tm_tfiar = 0;
current->thread.load_tm = 0;
#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
}
EXPORT_SYMBOL(start_thread);

View File

@ -161,7 +161,9 @@ static struct ibm_pa_feature {
{ .pabyte = 0, .pabit = 3, .cpu_features = CPU_FTR_CTRL },
{ .pabyte = 0, .pabit = 6, .cpu_features = CPU_FTR_NOEXECUTE },
{ .pabyte = 1, .pabit = 2, .mmu_features = MMU_FTR_CI_LARGE_PAGE },
#ifdef CONFIG_PPC_RADIX_MMU
{ .pabyte = 40, .pabit = 0, .mmu_features = MMU_FTR_TYPE_RADIX },
#endif
{ .pabyte = 1, .pabit = 1, .invert = 1, .cpu_features = CPU_FTR_NODSISRALIGN },
{ .pabyte = 5, .pabit = 0, .cpu_features = CPU_FTR_REAL_LE,
.cpu_user_ftrs = PPC_FEATURE_TRUE_LE },

View File

@ -650,7 +650,7 @@ void __init emergency_stack_init(void)
static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
{
return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
__pa(MAX_DMA_ADDRESS));
}
@ -661,7 +661,7 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
static int pcpu_cpu_distance(unsigned int from, unsigned int to)
{
if (cpu_to_node(from) == cpu_to_node(to))
if (early_cpu_to_node(from) == early_cpu_to_node(to))
return LOCAL_DISTANCE;
else
return REMOTE_DISTANCE;

View File

@ -710,6 +710,10 @@ static int register_cpu_online(unsigned int cpu)
struct device_attribute *attrs, *pmc_attrs;
int i, nattrs;
/* For cpus present at boot a reference was already grabbed in register_cpu() */
if (!s->of_node)
s->of_node = of_get_cpu_node(cpu, NULL);
#ifdef CONFIG_PPC64
if (cpu_has_feature(CPU_FTR_SMT))
device_create_file(s, &dev_attr_smt_snooze_delay);
@ -864,6 +868,8 @@ static int unregister_cpu_online(unsigned int cpu)
}
#endif
cacheinfo_cpu_offline(cpu);
of_node_put(s->of_node);
s->of_node = NULL;
#endif /* CONFIG_HOTPLUG_CPU */
return 0;
}

View File

@ -306,8 +306,6 @@ long machine_check_early(struct pt_regs *regs)
__this_cpu_inc(irq_stat.mce_exceptions);
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
handled = cur_cpu_spec->machine_check_early(regs);
return handled;
@ -741,6 +739,8 @@ void machine_check_exception(struct pt_regs *regs)
__this_cpu_inc(irq_stat.mce_exceptions);
add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
/* See if any machine dependent calls. In theory, we would want
* to call the CPU first, and call the ppc_md. one if the CPU
* one returns a positive number. However there is existing code

View File

@ -16,6 +16,7 @@
*/
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/hugetlb.h>
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/sched.h>
@ -331,7 +332,7 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
addr = start + i * PMD_SIZE;
if (!pmd_none(*pmd))
if (!pmd_none(*pmd) && !pmd_huge(*pmd))
/* pmd exists */
walk_pte(st, pmd, addr);
else
@ -347,7 +348,7 @@ static void walk_pud(struct pg_state *st, pgd_t *pgd, unsigned long start)
for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
addr = start + i * PUD_SIZE;
if (!pud_none(*pud))
if (!pud_none(*pud) && !pud_huge(*pud))
/* pud exists */
walk_pmd(st, pud, addr);
else
@ -367,7 +368,7 @@ static void walk_pagetables(struct pg_state *st)
*/
for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
addr = KERN_VIRT_START + i * PGDIR_SIZE;
if (!pgd_none(*pgd))
if (!pgd_none(*pgd) && !pgd_huge(*pgd))
/* pgd exists */
walk_pud(st, pgd, addr);
else

View File

@ -81,7 +81,7 @@ struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
gfp_t gfp_mask = GFP_USER;
struct page *new_page;
if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
if (PageCompound(page))
return NULL;
if (PageHighMem(page))
@ -100,7 +100,7 @@ static int mm_iommu_move_page_from_cma(struct page *page)
LIST_HEAD(cma_migrate_pages);
/* Ignore huge pages for now */
if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
if (PageCompound(page))
return -EBUSY;
lru_add_drain();

View File

@ -197,7 +197,9 @@ static int __spu_trap_data_map(struct spu *spu, unsigned long ea, u64 dsisr)
(REGION_ID(ea) != USER_REGION_ID)) {
spin_unlock(&spu->register_lock);
ret = hash_page(ea, _PAGE_PRESENT | _PAGE_READ, 0x300, dsisr);
ret = hash_page(ea,
_PAGE_PRESENT | _PAGE_READ | _PAGE_PRIVILEGED,
0x300, dsisr);
spin_lock(&spu->register_lock);
if (!ret) {

View File

@ -180,7 +180,7 @@ long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
pe_err(npe, "Failed to configure TCE table, err %lld\n", rc);
return rc;
}
pnv_pci_phb3_tce_invalidate_entire(phb, false);
pnv_pci_ioda2_tce_invalidate_entire(phb, false);
/* Add the table to the list so its TCE cache will get invalidated */
pnv_pci_link_table_and_group(phb->hose->node, num,
@ -204,7 +204,7 @@ long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num)
pe_err(npe, "Unmapping failed, ret = %lld\n", rc);
return rc;
}
pnv_pci_phb3_tce_invalidate_entire(phb, false);
pnv_pci_ioda2_tce_invalidate_entire(phb, false);
pnv_pci_unlink_table_and_group(npe->table_group.tables[num],
&npe->table_group);
@ -270,7 +270,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
0 /* bypass base */, top);
if (rc == OPAL_SUCCESS)
pnv_pci_phb3_tce_invalidate_entire(phb, false);
pnv_pci_ioda2_tce_invalidate_entire(phb, false);
return rc;
}
@ -334,7 +334,7 @@ void pnv_npu_take_ownership(struct pnv_ioda_pe *npe)
pe_err(npe, "Failed to disable bypass, err %lld\n", rc);
return;
}
pnv_pci_phb3_tce_invalidate_entire(npe->phb, false);
pnv_pci_ioda2_tce_invalidate_entire(npe->phb, false);
}
struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)

View File

@ -1883,7 +1883,7 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
#define PHB3_TCE_KILL_INVAL_PE PPC_BIT(1)
#define PHB3_TCE_KILL_INVAL_ONE PPC_BIT(2)
void pnv_pci_phb3_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
static void pnv_pci_phb3_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
{
__be64 __iomem *invalidate = pnv_ioda_get_inval_reg(phb, rm);
const unsigned long val = PHB3_TCE_KILL_INVAL_ALL;
@ -1979,6 +1979,14 @@ static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
}
}
void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
{
if (phb->model == PNV_PHB_MODEL_NPU || phb->model == PNV_PHB_MODEL_PHB3)
pnv_pci_phb3_tce_invalidate_entire(phb, rm);
else
opal_pci_tce_kill(phb->opal_id, OPAL_PCI_TCE_KILL, 0, 0, 0, 0);
}
static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
long npages, unsigned long uaddr,
enum dma_data_direction direction,

View File

@ -229,7 +229,7 @@ extern void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
/* Nvlink functions */
extern void pnv_npu_try_dma_set_bypass(struct pci_dev *gpdev, bool bypass);
extern void pnv_pci_phb3_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
extern void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm);
extern struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe);
extern long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
struct iommu_table *tbl);

View File

@ -288,7 +288,6 @@ int dlpar_detach_node(struct device_node *dn)
if (rc)
return rc;
of_node_put(dn); /* Must decrement the refcount */
return 0;
}

View File

@ -124,6 +124,7 @@ static struct property *dlpar_clone_drconf_property(struct device_node *dn)
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = be64_to_cpu(lmbs[i].base_addr);
lmbs[i].drc_index = be32_to_cpu(lmbs[i].drc_index);
lmbs[i].aa_index = be32_to_cpu(lmbs[i].aa_index);
lmbs[i].flags = be32_to_cpu(lmbs[i].flags);
}
@ -147,6 +148,7 @@ static void dlpar_update_drconf_property(struct device_node *dn,
for (i = 0; i < num_lmbs; i++) {
lmbs[i].base_addr = cpu_to_be64(lmbs[i].base_addr);
lmbs[i].drc_index = cpu_to_be32(lmbs[i].drc_index);
lmbs[i].aa_index = cpu_to_be32(lmbs[i].aa_index);
lmbs[i].flags = cpu_to_be32(lmbs[i].flags);
}

View File

@ -75,7 +75,8 @@ static int u8_gpio_dir_out(struct gpio_chip *gc, unsigned int gpio, int val)
static void u8_gpio_save_regs(struct of_mm_gpio_chip *mm_gc)
{
struct u8_gpio_chip *u8_gc = gpiochip_get_data(&mm_gc->gc);
struct u8_gpio_chip *u8_gc =
container_of(mm_gc, struct u8_gpio_chip, mm_gc);
u8_gc->data = in_8(mm_gc->regs);
}

View File

@ -428,6 +428,20 @@ static void *nt_vmcoreinfo(void *ptr)
return nt_init_name(ptr, 0, vmcoreinfo, size, "VMCOREINFO");
}
/*
* Initialize final note (needed for /proc/vmcore code)
*/
static void *nt_final(void *ptr)
{
Elf64_Nhdr *note;
note = (Elf64_Nhdr *) ptr;
note->n_namesz = 0;
note->n_descsz = 0;
note->n_type = 0;
return PTR_ADD(ptr, sizeof(Elf64_Nhdr));
}
/*
* Initialize ELF header (new kernel)
*/
@ -515,6 +529,7 @@ static void *notes_init(Elf64_Phdr *phdr, void *ptr, u64 notes_offset)
if (sa->prefix != 0)
ptr = fill_cpu_elf_notes(ptr, cpu++, sa);
ptr = nt_vmcoreinfo(ptr);
ptr = nt_final(ptr);
memset(phdr, 0, sizeof(*phdr));
phdr->p_type = PT_NOTE;
phdr->p_offset = notes_offset;

View File

@ -233,12 +233,17 @@ ENTRY(sie64a)
lctlg %c1,%c1,__LC_USER_ASCE # load primary asce
.Lsie_done:
# some program checks are suppressing. C code (e.g. do_protection_exception)
# will rewind the PSW by the ILC, which is 4 bytes in case of SIE. Other
# instructions between sie64a and .Lsie_done should not cause program
# interrupts. So lets use a nop (47 00 00 00) as a landing pad.
# will rewind the PSW by the ILC, which is often 4 bytes in case of SIE. There
# are some corner cases (e.g. runtime instrumentation) where ILC is unpredictable.
# Other instructions between sie64a and .Lsie_done should not cause program
# interrupts. So lets use 3 nops as a landing pad for all possible rewinds.
# See also .Lcleanup_sie
.Lrewind_pad:
nop 0
.Lrewind_pad6:
nopr 7
.Lrewind_pad4:
nopr 7
.Lrewind_pad2:
nopr 7
.globl sie_exit
sie_exit:
lg %r14,__SF_EMPTY+8(%r15) # load guest register save area
@ -251,7 +256,9 @@ sie_exit:
stg %r14,__SF_EMPTY+16(%r15) # set exit reason code
j sie_exit
EX_TABLE(.Lrewind_pad,.Lsie_fault)
EX_TABLE(.Lrewind_pad6,.Lsie_fault)
EX_TABLE(.Lrewind_pad4,.Lsie_fault)
EX_TABLE(.Lrewind_pad2,.Lsie_fault)
EX_TABLE(sie_exit,.Lsie_fault)
EXPORT_SYMBOL(sie64a)
EXPORT_SYMBOL(sie_exit)
@ -314,6 +321,7 @@ ENTRY(system_call)
lg %r14,__LC_VDSO_PER_CPU
lmg %r0,%r10,__PT_R0(%r11)
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
.Lsysc_exit_timer:
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
lmg %r11,%r15,__PT_R11(%r11)
@ -601,6 +609,7 @@ ENTRY(io_int_handler)
lg %r14,__LC_VDSO_PER_CPU
lmg %r0,%r10,__PT_R0(%r11)
mvc __LC_RETURN_PSW(16),__PT_PSW(%r11)
.Lio_exit_timer:
stpt __LC_EXIT_TIMER
mvc __VDSO_ECTG_BASE(16,%r14),__LC_EXIT_TIMER
lmg %r11,%r15,__PT_R11(%r11)
@ -1124,15 +1133,23 @@ cleanup_critical:
br %r14
.Lcleanup_sysc_restore:
# check if stpt has been executed
clg %r9,BASED(.Lcleanup_sysc_restore_insn)
jh 0f
mvc __LC_EXIT_TIMER(8),__LC_ASYNC_ENTER_TIMER
cghi %r11,__LC_SAVE_AREA_ASYNC
je 0f
mvc __LC_EXIT_TIMER(8),__LC_MCCK_ENTER_TIMER
0: clg %r9,BASED(.Lcleanup_sysc_restore_insn+8)
je 1f
lg %r9,24(%r11) # get saved pointer to pt_regs
mvc __LC_RETURN_PSW(16),__PT_PSW(%r9)
mvc 0(64,%r11),__PT_R8(%r9)
lmg %r0,%r7,__PT_R0(%r9)
0: lmg %r8,%r9,__LC_RETURN_PSW
1: lmg %r8,%r9,__LC_RETURN_PSW
br %r14
.Lcleanup_sysc_restore_insn:
.quad .Lsysc_exit_timer
.quad .Lsysc_done - 4
.Lcleanup_io_tif:
@ -1140,15 +1157,20 @@ cleanup_critical:
br %r14
.Lcleanup_io_restore:
# check if stpt has been executed
clg %r9,BASED(.Lcleanup_io_restore_insn)
je 0f
jh 0f
mvc __LC_EXIT_TIMER(8),__LC_MCCK_ENTER_TIMER
0: clg %r9,BASED(.Lcleanup_io_restore_insn+8)
je 1f
lg %r9,24(%r11) # get saved r11 pointer to pt_regs
mvc __LC_RETURN_PSW(16),__PT_PSW(%r9)
mvc 0(64,%r11),__PT_R8(%r9)
lmg %r0,%r7,__PT_R0(%r9)
0: lmg %r8,%r9,__LC_RETURN_PSW
1: lmg %r8,%r9,__LC_RETURN_PSW
br %r14
.Lcleanup_io_restore_insn:
.quad .Lio_exit_timer
.quad .Lio_done - 4
.Lcleanup_idle:

View File

@ -192,9 +192,9 @@ config NR_CPUS
int "Maximum number of CPUs"
depends on SMP
range 2 32 if SPARC32
range 2 1024 if SPARC64
range 2 4096 if SPARC64
default 32 if SPARC32
default 64 if SPARC64
default 4096 if SPARC64
source kernel/Kconfig.hz

View File

@ -24,9 +24,11 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
static inline int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len)
{
if (len & ~HPAGE_MASK)
struct hstate *h = hstate_file(file);
if (len & ~huge_page_mask(h))
return -EINVAL;
if (addr & ~HPAGE_MASK)
if (addr & ~huge_page_mask(h))
return -EINVAL;
return 0;
}

View File

@ -52,7 +52,7 @@
#define CTX_NR_MASK TAG_CONTEXT_BITS
#define CTX_HW_MASK (CTX_NR_MASK | CTX_PGSZ_MASK)
#define CTX_FIRST_VERSION ((_AC(1,UL) << CTX_VERSION_SHIFT) + _AC(1,UL))
#define CTX_FIRST_VERSION BIT(CTX_VERSION_SHIFT)
#define CTX_VALID(__ctx) \
(!(((__ctx.sparc64_ctx_val) ^ tlb_context_cache) & CTX_VERSION_MASK))
#define CTX_HWBITS(__ctx) ((__ctx.sparc64_ctx_val) & CTX_HW_MASK)

View File

@ -19,13 +19,8 @@ extern spinlock_t ctx_alloc_lock;
extern unsigned long tlb_context_cache;
extern unsigned long mmu_context_bmap[];
DECLARE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm);
void get_new_mmu_context(struct mm_struct *mm);
#ifdef CONFIG_SMP
void smp_new_mmu_context_version(void);
#else
#define smp_new_mmu_context_version() do { } while (0)
#endif
int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
void destroy_context(struct mm_struct *mm);
@ -76,8 +71,9 @@ void __flush_tlb_mm(unsigned long, unsigned long);
static inline void switch_mm(struct mm_struct *old_mm, struct mm_struct *mm, struct task_struct *tsk)
{
unsigned long ctx_valid, flags;
int cpu;
int cpu = smp_processor_id();
per_cpu(per_cpu_secondary_mm, cpu) = mm;
if (unlikely(mm == &init_mm))
return;
@ -123,7 +119,6 @@ static inline void switch_mm(struct mm_struct *old_mm, struct mm_struct *mm, str
* for the first time, we must flush that context out of the
* local TLB.
*/
cpu = smp_processor_id();
if (!ctx_valid || !cpumask_test_cpu(cpu, mm_cpumask(mm))) {
cpumask_set_cpu(cpu, mm_cpumask(mm));
__flush_tlb_mm(CTX_HWBITS(mm->context),
@ -133,26 +128,7 @@ static inline void switch_mm(struct mm_struct *old_mm, struct mm_struct *mm, str
}
#define deactivate_mm(tsk,mm) do { } while (0)
/* Activate a new MM instance for the current task. */
static inline void activate_mm(struct mm_struct *active_mm, struct mm_struct *mm)
{
unsigned long flags;
int cpu;
spin_lock_irqsave(&mm->context.lock, flags);
if (!CTX_VALID(mm->context))
get_new_mmu_context(mm);
cpu = smp_processor_id();
if (!cpumask_test_cpu(cpu, mm_cpumask(mm)))
cpumask_set_cpu(cpu, mm_cpumask(mm));
load_secondary_context(mm);
__flush_tlb_mm(CTX_HWBITS(mm->context), SECONDARY_CONTEXT);
tsb_context_switch(mm);
spin_unlock_irqrestore(&mm->context.lock, flags);
}
#define activate_mm(active_mm, mm) switch_mm(active_mm, mm, NULL)
#endif /* !(__ASSEMBLY__) */
#endif /* !(__SPARC64_MMU_CONTEXT_H) */

View File

@ -91,9 +91,9 @@ extern unsigned long pfn_base;
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
*/
extern unsigned long empty_zero_page;
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
#define ZERO_PAGE(vaddr) (virt_to_page(&empty_zero_page))
#define ZERO_PAGE(vaddr) (virt_to_page(empty_zero_page))
/*
* In general all page table modifications should use the V8 atomic

View File

@ -20,7 +20,6 @@
#define PIL_SMP_CALL_FUNC 1
#define PIL_SMP_RECEIVE_SIGNAL 2
#define PIL_SMP_CAPTURE 3
#define PIL_SMP_CTX_NEW_VERSION 4
#define PIL_DEVICE_IRQ 5
#define PIL_SMP_CALL_FUNC_SNGL 6
#define PIL_DEFERRED_PCR_WORK 7

View File

@ -16,7 +16,7 @@ extern char reboot_command[];
*/
extern unsigned char boot_cpu_id;
extern unsigned long empty_zero_page;
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
extern int serial_console;
static inline int con_is_present(void)

View File

@ -327,6 +327,7 @@ struct vio_dev {
int compat_len;
u64 dev_no;
u64 id;
unsigned long channel_id;

View File

@ -130,18 +130,17 @@ unsigned long prepare_ftrace_return(unsigned long parent,
if (unlikely(atomic_read(&current->tracing_graph_pause)))
return parent + 8UL;
trace.func = self_addr;
trace.depth = current->curr_ret_stack + 1;
/* Only trace if the calling function expects to */
if (!ftrace_graph_entry(&trace))
return parent + 8UL;
if (ftrace_push_return_trace(parent, self_addr, &trace.depth,
frame_pointer, NULL) == -EBUSY)
return parent + 8UL;
trace.func = self_addr;
/* Only trace if the calling function expects to */
if (!ftrace_graph_entry(&trace)) {
current->curr_ret_stack--;
return parent + 8UL;
}
return return_hooker;
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

View File

@ -939,3 +939,9 @@ ENTRY(__retl_o1)
retl
mov %o1, %o0
ENDPROC(__retl_o1)
ENTRY(__retl_o1_asi)
wr %o5, 0x0, %asi
retl
mov %o1, %o0
ENDPROC(__retl_o1_asi)

View File

@ -1034,17 +1034,26 @@ static void __init init_cpu_send_mondo_info(struct trap_per_cpu *tb)
{
#ifdef CONFIG_SMP
unsigned long page;
void *mondo, *p;
BUILD_BUG_ON((NR_CPUS * sizeof(u16)) > (PAGE_SIZE - 64));
BUILD_BUG_ON((NR_CPUS * sizeof(u16)) > PAGE_SIZE);
/* Make sure mondo block is 64byte aligned */
p = kzalloc(127, GFP_KERNEL);
if (!p) {
prom_printf("SUN4V: Error, cannot allocate mondo block.\n");
prom_halt();
}
mondo = (void *)(((unsigned long)p + 63) & ~0x3f);
tb->cpu_mondo_block_pa = __pa(mondo);
page = get_zeroed_page(GFP_KERNEL);
if (!page) {
prom_printf("SUN4V: Error, cannot allocate cpu mondo page.\n");
prom_printf("SUN4V: Error, cannot allocate cpu list page.\n");
prom_halt();
}
tb->cpu_mondo_block_pa = __pa(page);
tb->cpu_list_pa = __pa(page + 64);
tb->cpu_list_pa = __pa(page);
#endif
}

View File

@ -37,7 +37,6 @@ void handle_stdfmna(struct pt_regs *regs, unsigned long sfar, unsigned long sfsr
/* smp_64.c */
void __irq_entry smp_call_function_client(int irq, struct pt_regs *regs);
void __irq_entry smp_call_function_single_client(int irq, struct pt_regs *regs);
void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs);
void __irq_entry smp_penguin_jailcell(int irq, struct pt_regs *regs);
void __irq_entry smp_receive_signal_client(int irq, struct pt_regs *regs);

View File

@ -964,37 +964,6 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
preempt_enable();
}
void __irq_entry smp_new_mmu_context_version_client(int irq, struct pt_regs *regs)
{
struct mm_struct *mm;
unsigned long flags;
clear_softint(1 << irq);
/* See if we need to allocate a new TLB context because
* the version of the one we are using is now out of date.
*/
mm = current->active_mm;
if (unlikely(!mm || (mm == &init_mm)))
return;
spin_lock_irqsave(&mm->context.lock, flags);
if (unlikely(!CTX_VALID(mm->context)))
get_new_mmu_context(mm);
spin_unlock_irqrestore(&mm->context.lock, flags);
load_secondary_context(mm);
__flush_tlb_mm(CTX_HWBITS(mm->context),
SECONDARY_CONTEXT);
}
void smp_new_mmu_context_version(void)
{
smp_cross_call(&xcall_new_mmu_context_version, 0, 0, 0);
}
#ifdef CONFIG_KGDB
void kgdb_roundup_cpus(unsigned long flags)
{

View File

@ -455,13 +455,16 @@ __tsb_context_switch:
.type copy_tsb,#function
copy_tsb: /* %o0=old_tsb_base, %o1=old_tsb_size
* %o2=new_tsb_base, %o3=new_tsb_size
* %o4=page_size_shift
*/
sethi %uhi(TSB_PASS_BITS), %g7
srlx %o3, 4, %o3
add %o0, %o1, %g1 /* end of old tsb */
add %o0, %o1, %o1 /* end of old tsb */
sllx %g7, 32, %g7
sub %o3, 1, %o3 /* %o3 == new tsb hash mask */
mov %o4, %g1 /* page_size_shift */
661: prefetcha [%o0] ASI_N, #one_read
.section .tsb_phys_patch, "ax"
.word 661b
@ -486,9 +489,9 @@ copy_tsb: /* %o0=old_tsb_base, %o1=old_tsb_size
/* This can definitely be computed faster... */
srlx %o0, 4, %o5 /* Build index */
and %o5, 511, %o5 /* Mask index */
sllx %o5, PAGE_SHIFT, %o5 /* Put into vaddr position */
sllx %o5, %g1, %o5 /* Put into vaddr position */
or %o4, %o5, %o4 /* Full VADDR. */
srlx %o4, PAGE_SHIFT, %o4 /* Shift down to create index */
srlx %o4, %g1, %o4 /* Shift down to create index */
and %o4, %o3, %o4 /* Mask with new_tsb_nents-1 */
sllx %o4, 4, %o4 /* Shift back up into tsb ent offset */
TSB_STORE(%o2 + %o4, %g2) /* Store TAG */
@ -496,7 +499,7 @@ copy_tsb: /* %o0=old_tsb_base, %o1=old_tsb_size
TSB_STORE(%o2 + %o4, %g3) /* Store TTE */
80: add %o0, 16, %o0
cmp %o0, %g1
cmp %o0, %o1
bne,pt %xcc, 90b
nop

View File

@ -50,7 +50,7 @@ tl0_resv03e: BTRAP(0x3e) BTRAP(0x3f) BTRAP(0x40)
tl0_irq1: TRAP_IRQ(smp_call_function_client, 1)
tl0_irq2: TRAP_IRQ(smp_receive_signal_client, 2)
tl0_irq3: TRAP_IRQ(smp_penguin_jailcell, 3)
tl0_irq4: TRAP_IRQ(smp_new_mmu_context_version_client, 4)
tl0_irq4: BTRAP(0x44)
#else
tl0_irq1: BTRAP(0x41)
tl0_irq2: BTRAP(0x42)

View File

@ -302,13 +302,16 @@ static struct vio_dev *vio_create_one(struct mdesc_handle *hp, u64 mp,
if (!id) {
dev_set_name(&vdev->dev, "%s", bus_id_name);
vdev->dev_no = ~(u64)0;
vdev->id = ~(u64)0;
} else if (!cfg_handle) {
dev_set_name(&vdev->dev, "%s-%llu", bus_id_name, *id);
vdev->dev_no = *id;
vdev->id = ~(u64)0;
} else {
dev_set_name(&vdev->dev, "%s-%llu-%llu", bus_id_name,
*cfg_handle, *id);
vdev->dev_no = *cfg_handle;
vdev->id = *id;
}
vdev->dev.parent = parent;
@ -351,27 +354,84 @@ static void vio_add(struct mdesc_handle *hp, u64 node)
(void) vio_create_one(hp, node, &root_vdev->dev);
}
struct vio_md_node_query {
const char *type;
u64 dev_no;
u64 id;
};
static int vio_md_node_match(struct device *dev, void *arg)
{
struct vio_md_node_query *query = (struct vio_md_node_query *) arg;
struct vio_dev *vdev = to_vio_dev(dev);
if (vdev->mp == (u64) arg)
return 1;
if (vdev->dev_no != query->dev_no)
return 0;
if (vdev->id != query->id)
return 0;
if (strcmp(vdev->type, query->type))
return 0;
return 0;
return 1;
}
static void vio_remove(struct mdesc_handle *hp, u64 node)
{
const char *type;
const u64 *id, *cfg_handle;
u64 a;
struct vio_md_node_query query;
struct device *dev;
dev = device_find_child(&root_vdev->dev, (void *) node,
type = mdesc_get_property(hp, node, "device-type", NULL);
if (!type) {
type = mdesc_get_property(hp, node, "name", NULL);
if (!type)
type = mdesc_node_name(hp, node);
}
query.type = type;
id = mdesc_get_property(hp, node, "id", NULL);
cfg_handle = NULL;
mdesc_for_each_arc(a, hp, node, MDESC_ARC_TYPE_BACK) {
u64 target;
target = mdesc_arc_target(hp, a);
cfg_handle = mdesc_get_property(hp, target,
"cfg-handle", NULL);
if (cfg_handle)
break;
}
if (!id) {
query.dev_no = ~(u64)0;
query.id = ~(u64)0;
} else if (!cfg_handle) {
query.dev_no = *id;
query.id = ~(u64)0;
} else {
query.dev_no = *cfg_handle;
query.id = *id;
}
dev = device_find_child(&root_vdev->dev, &query,
vio_md_node_match);
if (dev) {
printk(KERN_INFO "VIO: Removing device %s\n", dev_name(dev));
device_unregister(dev);
put_device(dev);
} else {
if (!id)
printk(KERN_ERR "VIO: Removed unknown %s node.\n",
type);
else if (!cfg_handle)
printk(KERN_ERR "VIO: Removed unknown %s node %llu.\n",
type, *id);
else
printk(KERN_ERR "VIO: Removed unknown %s node %llu-%llu.\n",
type, *cfg_handle, *id);
}
}

View File

@ -8,7 +8,7 @@
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_o1; \
.word 98b, __retl_o1_asi;\
.text; \
.align 4;

View File

@ -15,6 +15,7 @@ lib-$(CONFIG_SPARC32) += copy_user.o locks.o
lib-$(CONFIG_SPARC64) += atomic_64.o
lib-$(CONFIG_SPARC32) += lshrdi3.o ashldi3.o
lib-$(CONFIG_SPARC32) += muldi3.o bitext.o cmpdi2.o
lib-$(CONFIG_SPARC64) += multi3.o
lib-$(CONFIG_SPARC64) += copy_page.o clear_page.o bzero.o
lib-$(CONFIG_SPARC64) += csum_copy.o csum_copy_from_user.o csum_copy_to_user.o

View File

@ -8,7 +8,7 @@
98: x,y; \
.section __ex_table,"a";\
.align 4; \
.word 98b, __retl_o1; \
.word 98b, __retl_o1_asi;\
.text; \
.align 4;

35
arch/sparc/lib/multi3.S Normal file
View File

@ -0,0 +1,35 @@
#include <linux/linkage.h>
#include <asm/export.h>
.text
.align 4
ENTRY(__multi3) /* %o0 = u, %o1 = v */
mov %o1, %g1
srl %o3, 0, %g4
mulx %g4, %g1, %o1
srlx %g1, 0x20, %g3
mulx %g3, %g4, %g5
sllx %g5, 0x20, %o5
srl %g1, 0, %g4
sub %o1, %o5, %o5
srlx %o5, 0x20, %o5
addcc %g5, %o5, %g5
srlx %o3, 0x20, %o5
mulx %g4, %o5, %g4
mulx %g3, %o5, %o5
sethi %hi(0x80000000), %g3
addcc %g5, %g4, %g5
srlx %g5, 0x20, %g5
add %g3, %g3, %g3
movcc %xcc, %g0, %g3
addcc %o5, %g5, %o5
sllx %g4, 0x20, %g4
add %o1, %g4, %o1
add %o5, %g3, %g2
mulx %g1, %o2, %g1
add %g1, %g2, %g1
mulx %o0, %o3, %o0
retl
add %g1, %o0, %o0
ENDPROC(__multi3)
EXPORT_SYMBOL(__multi3)

View File

@ -290,7 +290,7 @@ void __init mem_init(void)
/* Saves us work later. */
memset((void *)&empty_zero_page, 0, PAGE_SIZE);
memset((void *)empty_zero_page, 0, PAGE_SIZE);
i = last_valid_pfn >> ((20 - PAGE_SHIFT) + 5);
i += 1;

View File

@ -358,7 +358,8 @@ static int __init setup_hugepagesz(char *string)
}
if ((hv_pgsz_mask & cpu_pgsz_mask) == 0U) {
pr_warn("hugepagesz=%llu not supported by MMU.\n",
hugetlb_bad_size();
pr_err("hugepagesz=%llu not supported by MMU.\n",
hugepage_size);
goto out;
}
@ -706,10 +707,58 @@ EXPORT_SYMBOL(__flush_dcache_range);
/* get_new_mmu_context() uses "cache + 1". */
DEFINE_SPINLOCK(ctx_alloc_lock);
unsigned long tlb_context_cache = CTX_FIRST_VERSION - 1;
unsigned long tlb_context_cache = CTX_FIRST_VERSION;
#define MAX_CTX_NR (1UL << CTX_NR_BITS)
#define CTX_BMAP_SLOTS BITS_TO_LONGS(MAX_CTX_NR)
DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
DEFINE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm) = {0};
static void mmu_context_wrap(void)
{
unsigned long old_ver = tlb_context_cache & CTX_VERSION_MASK;
unsigned long new_ver, new_ctx, old_ctx;
struct mm_struct *mm;
int cpu;
bitmap_zero(mmu_context_bmap, 1 << CTX_NR_BITS);
/* Reserve kernel context */
set_bit(0, mmu_context_bmap);
new_ver = (tlb_context_cache & CTX_VERSION_MASK) + CTX_FIRST_VERSION;
if (unlikely(new_ver == 0))
new_ver = CTX_FIRST_VERSION;
tlb_context_cache = new_ver;
/*
* Make sure that any new mm that are added into per_cpu_secondary_mm,
* are going to go through get_new_mmu_context() path.
*/
mb();
/*
* Updated versions to current on those CPUs that had valid secondary
* contexts
*/
for_each_online_cpu(cpu) {
/*
* If a new mm is stored after we took this mm from the array,
* it will go into get_new_mmu_context() path, because we
* already bumped the version in tlb_context_cache.
*/
mm = per_cpu(per_cpu_secondary_mm, cpu);
if (unlikely(!mm || mm == &init_mm))
continue;
old_ctx = mm->context.sparc64_ctx_val;
if (likely((old_ctx & CTX_VERSION_MASK) == old_ver)) {
new_ctx = (old_ctx & ~CTX_VERSION_MASK) | new_ver;
set_bit(new_ctx & CTX_NR_MASK, mmu_context_bmap);
mm->context.sparc64_ctx_val = new_ctx;
}
}
}
/* Caller does TLB context flushing on local CPU if necessary.
* The caller also ensures that CTX_VALID(mm->context) is false.
@ -725,48 +774,30 @@ void get_new_mmu_context(struct mm_struct *mm)
{
unsigned long ctx, new_ctx;
unsigned long orig_pgsz_bits;
int new_version;
spin_lock(&ctx_alloc_lock);
retry:
/* wrap might have happened, test again if our context became valid */
if (unlikely(CTX_VALID(mm->context)))
goto out;
orig_pgsz_bits = (mm->context.sparc64_ctx_val & CTX_PGSZ_MASK);
ctx = (tlb_context_cache + 1) & CTX_NR_MASK;
new_ctx = find_next_zero_bit(mmu_context_bmap, 1 << CTX_NR_BITS, ctx);
new_version = 0;
if (new_ctx >= (1 << CTX_NR_BITS)) {
new_ctx = find_next_zero_bit(mmu_context_bmap, ctx, 1);
if (new_ctx >= ctx) {
int i;
new_ctx = (tlb_context_cache & CTX_VERSION_MASK) +
CTX_FIRST_VERSION;
if (new_ctx == 1)
new_ctx = CTX_FIRST_VERSION;
/* Don't call memset, for 16 entries that's just
* plain silly...
*/
mmu_context_bmap[0] = 3;
mmu_context_bmap[1] = 0;
mmu_context_bmap[2] = 0;
mmu_context_bmap[3] = 0;
for (i = 4; i < CTX_BMAP_SLOTS; i += 4) {
mmu_context_bmap[i + 0] = 0;
mmu_context_bmap[i + 1] = 0;
mmu_context_bmap[i + 2] = 0;
mmu_context_bmap[i + 3] = 0;
}
new_version = 1;
goto out;
mmu_context_wrap();
goto retry;
}
}
if (mm->context.sparc64_ctx_val)
cpumask_clear(mm_cpumask(mm));
mmu_context_bmap[new_ctx>>6] |= (1UL << (new_ctx & 63));
new_ctx |= (tlb_context_cache & CTX_VERSION_MASK);
out:
tlb_context_cache = new_ctx;
mm->context.sparc64_ctx_val = new_ctx | orig_pgsz_bits;
out:
spin_unlock(&ctx_alloc_lock);
if (unlikely(new_version))
smp_new_mmu_context_version();
}
static int numa_enabled = 1;

View File

@ -496,7 +496,8 @@ retry_tsb_alloc:
extern void copy_tsb(unsigned long old_tsb_base,
unsigned long old_tsb_size,
unsigned long new_tsb_base,
unsigned long new_tsb_size);
unsigned long new_tsb_size,
unsigned long page_size_shift);
unsigned long old_tsb_base = (unsigned long) old_tsb;
unsigned long new_tsb_base = (unsigned long) new_tsb;
@ -504,7 +505,9 @@ retry_tsb_alloc:
old_tsb_base = __pa(old_tsb_base);
new_tsb_base = __pa(new_tsb_base);
}
copy_tsb(old_tsb_base, old_size, new_tsb_base, new_size);
copy_tsb(old_tsb_base, old_size, new_tsb_base, new_size,
tsb_index == MM_TSB_BASE ?
PAGE_SHIFT : REAL_HPAGE_SHIFT);
}
mm->context.tsb_block[tsb_index].tsb = new_tsb;

View File

@ -971,11 +971,6 @@ xcall_capture:
wr %g0, (1 << PIL_SMP_CAPTURE), %set_softint
retry
.globl xcall_new_mmu_context_version
xcall_new_mmu_context_version:
wr %g0, (1 << PIL_SMP_CTX_NEW_VERSION), %set_softint
retry
#ifdef CONFIG_KGDB
.globl xcall_kgdb_capture
xcall_kgdb_capture:

View File

@ -14,7 +14,7 @@
static char *initrd __initdata = NULL;
static int load_initrd(char *filename, void *buf, int size);
static int __init read_initrd(void)
int __init read_initrd(void)
{
void *area;
long long size;
@ -46,8 +46,6 @@ static int __init read_initrd(void)
return 0;
}
__uml_postsetup(read_initrd);
static int __init uml_initrd_setup(char *line, int *add)
{
initrd = line;

View File

@ -338,11 +338,17 @@ int __init linux_main(int argc, char **argv)
return start_uml();
}
int __init __weak read_initrd(void)
{
return 0;
}
void __init setup_arch(char **cmdline_p)
{
stack_protections((unsigned long) &init_thread_info);
setup_physmem(uml_physmem, uml_reserved, physmem_size, highmem);
mem_total_pages(physmem_size, iomem_size, highmem);
read_initrd();
paging_init();
strlcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);

View File

@ -16,7 +16,7 @@
#ifndef BOOT_BOOT_H
#define BOOT_BOOT_H
#define STACK_SIZE 512 /* Minimum number of bytes for stack */
#define STACK_SIZE 1024 /* Minimum number of bytes for stack */
#ifndef __ASSEMBLY__

View File

@ -94,7 +94,7 @@ vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
quiet_cmd_check_data_rel = DATAREL $@
define cmd_check_data_rel
for obj in $(filter %.o,$^); do \
readelf -S $$obj | grep -qF .rel.local && { \
${CROSS_COMPILE}readelf -S $$obj | grep -qF .rel.local && { \
echo "error: $$obj has data relocations!" >&2; \
exit 1; \
} || true; \

View File

@ -761,7 +761,7 @@ static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_CORE, hsw_rapl_init),
X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_GT3E, hsw_rapl_init),
X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_X, hsw_rapl_init),
X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_X, hsx_rapl_init),
X86_RAPL_MODEL_MATCH(INTEL_FAM6_BROADWELL_XEON_D, hsw_rapl_init),
X86_RAPL_MODEL_MATCH(INTEL_FAM6_XEON_PHI_KNL, knl_rapl_init),

View File

@ -264,6 +264,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
#endif
int mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);

Some files were not shown because too many files have changed in this diff Show More