[ Upstream commit 15e3d5a285 ]
3w controller don't dma map small single SGL entry commands but instead
bounce buffer them. Add a helper to identify these commands and don't
call scsi_dma_unmap for them.
Based on an earlier patch from James Bottomley.
Fixes: 118c85 ("3w-9xxx: fix command completion race")
Reported-by: Tóth Attila <atoth@atoth.sote.hu>
Tested-by: Tóth Attila <atoth@atoth.sote.hu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8ea4b34355 ]
Backports of 41fc014332 ("fib_rules: fix fib rule dumps across
multiple skbs") introduced a regression in "ip rule show" - it ends up
dumping the first rule over and over and never exiting, because 3.19
and earlier are missing commit 053c095a82 ("netlink: make
nlmsg_end() and genlmsg_end() void"), so fib_nl_fill_rule() ends up
returning skb->len (i.e. > 0) in the success case.
Fix this by checking the return code for < 0 instead of != 0.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 397d425dc2 ]
In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.
Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected. We certainly can not match
the current behavior as the current behavior is a security hole.
Therefore when encounting .. when following an unconnected path
return -ENOENT.
- Add a function path_connected to verify path->dentry is reachable
from path->mnt.mnt_root. AKA to validate that rename did not do
something nasty to the bind mount.
To avoid races path_connected must be called after following a path
component to it's next path component.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ee5d004fd0 ]
The 'event_work' worker used by dm-raid may still be running
when the array is stopped. This can result in an oops.
So flush the workqueue on which it is run after detaching
and before destroying the device.
Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org (2.6.38+ please delay 2 weeks after -final release)
Fixes: 9d09e663d5 ("dm: raid456 basic support")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a0a2a66024 ]
The commit 738ac1ebb9 ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram. This is because skb_set_peeked may create
a new skb and free the existing one. As it stands the caller will
continue to use the old freed skb.
This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).
Fixes: 738ac1ebb9 ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 04697858d8 ]
Tony Luck found on his setup, if memory block size 512M will cause crash
during booting.
BUG: unable to handle kernel paging request at ffffea0074000020
IP: get_nid_for_pfn+0x17/0x40
PGD 128ffcb067 PUD 128ffc9067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc8 #1
...
Call Trace:
? register_mem_sect_under_node+0x66/0xe0
register_one_node+0x17b/0x240
? pci_iommu_alloc+0x6e/0x6e
topology_init+0x3c/0x95
do_one_initcall+0xcd/0x1f0
The system has non continuous RAM address:
BIOS-e820: [mem 0x0000001300000000-0x0000001cffffffff] usable
BIOS-e820: [mem 0x0000001d70000000-0x0000001ec7ffefff] usable
BIOS-e820: [mem 0x0000001f00000000-0x0000002bffffffff] usable
BIOS-e820: [mem 0x0000002c18000000-0x0000002d6fffefff] usable
BIOS-e820: [mem 0x0000002e00000000-0x00000039ffffffff] usable
So there are start sections in memory block not present. For example:
memory block : [0x2c18000000, 0x2c20000000) 512M
first three sections are not present.
The current register_mem_sect_under_node() assume first section is
present, but memory block section number range [start_section_nr,
end_section_nr] would include not present section.
For arch that support vmemmap, we don't setup memmap for struct page
area within not present sections area.
So skip the pfn range that belong to absent section.
[akpm@linux-foundation.org: simplification]
[rientjes@google.com: more simplification]
Fixes: bdee237c03 ("x86: mm: Use 2GB memory block size on large memory x86-64 systems")
Fixes: 982792c782 ("x86, mm: probe memory block size for generic x86 64bit")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Tested-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7f5dcaf1fd ]
The unregister path of platform_device is broken. On registration, it
will register all resources with either a parent already set, or
type==IORESOURCE_{IO,MEM}. However, on unregister it will release
everything with type==IORESOURCE_{IO,MEM}, but ignore the others. There
are also cases where resources don't get registered in the first place,
like with devices created by of_platform_populate()*.
Fix the unregister path to be symmetrical with the register path by
checking the parent pointer instead of the type field to decide which
resources to unregister. This is safe because the upshot of the
registration path algorithm is that registered resources have a parent
pointer, and non-registered resources do not.
* It can be argued that of_platform_populate() should be registering
it's resources, and they argument has some merit. However, there are
quite a few platforms that end up broken if we try to do that due to
overlapping resources in the device tree. Until that is fixed, we need
to solve the immediate problem.
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b9e23f3219 ]
Legacy IPs like PWMSS, present under l4per2_7xx_clkdm, cannot support
smart-idle when its clock domain is in HW_AUTO on DRA7 SoCs. Hence,
program clock domain to SW_WKUP.
Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3a496b00b6 ]
If the internal call to of_address_to_resource() fails, we end up
looping forever in of_find_matching_node_by_address(). This can be
caused by a defective device tree, or calling with an incorrect
matches argument.
Fix by calling of_find_matching_node() unconditionally at the end of
the loop.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bab383de3b ]
parport_find_base() will implicitly do parport_get_port() which
increases the refcount. Then parport_register_device() will again
increment the refcount. But while unloading the module we are only
doing parport_unregister_device() decrementing the refcount only once.
We add an parport_put_port() to neutralize the effect of
parport_get_port().
Cc: <stable@vger.kernel.org> # 2.6.32+
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 77d6273e79 ]
call12 can't be safely used as the first call in the inline function,
because the compiler does not extend the stack frame of the bounding
function accordingly, which may result in corruption of local variables.
If a call needs to be done, do call8 first followed by call12.
For pure assembly code in _switch_to increase stack frame size of the
bounding function.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4229fb12a0 ]
Userspace return code may skip restoring THREADPTR register if there are
no registers that need to be zeroed. This leads to spurious failures in
libc NPTL tests.
Always restore THREADPTR on return to userspace.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f691251c0 ]
We got the bug that qemu complained with "KVM: unknown exit, hardware
reason 31" and KVM shown these info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1
This is because we got a mmio #PF and the handler see the mmio spte becomes
normal (points to the ram page)
However, this is valid after introducing fast mmio spte invalidation which
increases the generation-number instead of zapping mmio sptes, a example
is as follows:
1. QEMU drops mmio region by adding a new memslot
2. invalidate all mmio sptes
3.
VCPU 0 VCPU 1
access the invalid mmio spte
access the region originally was MMIO before
set the spte to the normal ram map
mmio #PF
check the spte and see it becomes normal ram mapping !!!
This patch fixes the bug just by dropping the check in mmio handler, it's
good for backport. Full check will be introduced in later patches
Reported-by: Pavel Shirshov <ru.pchel@gmail.com>
Tested-by: Pavel Shirshov <ru.pchel@gmail.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3af4e5a951 ]
It was reported that after 10-20 reboots, a usb keyboard plugged
into a docking station would not work unless it was replugged in.
Using usbmon, it turns out the interrupt URBs were streaming with
callback errors of -71 for some reason. The hid-core.c::hid_io_error was
supposed to retry and then reset, but the reset wasn't really happening.
The check for HID_NO_BANDWIDTH was inverted. Fix was simple.
Tested by reporter and locally by me by unplugging a keyboard halfway until I
could recreate a stream of errors but no disconnect.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71c6da846b ]
Currently context size (cra_ctxsize) doesn't specified for
ghash_async_alg. Which means it's zero. Thus crypto_create_tfm()
doesn't allocate needed space for ghash_async_ctx, so any
read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid.
Cc: stable@vger.kernel.org
Signed-off-by: Andrey Ryabinin <aryabinin@odin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b2fb5b1a0f ]
DWC3 uses bounce buffer to handle non max packet aligned OUT transfers and
the size of bounce buffer is 512 bytes. However if the host initiates OUT
transfers of size more than 512 bytes (and non max packet aligned), the
driver throws a WARN dump but still programs the TRB to receive more than
512 bytes. This will cause bounce buffer to overflow and corrupt the
adjacent memory locations which can be fatal.
Fix it by programming the TRB to receive a maximum of DWC3_EP0_BOUNCE_SIZE
(512) bytes.
Cc: <stable@vger.kernel.org> # 3.4+
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 951d3793bb ]
The driver used usb_get_serial_data(port->serial) which compiled but resulted
in a NULL pointer being returned (and subsequently used). I did not go deeper
into this but I guess this is a regression.
Signed-off-by: Philipp Hachtmann <hachti@hachti.de>
Fixes: a85796ee51 ("USB: symbolserial: move private-data allocation to
port_probe")
Cc: stable <stable@vger.kernel.org> # v3.10
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d1541dc977 ]
In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO".
But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class
and needs to be shifted to make space for the low-order interface byte.
Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code.
Fixes: 63c4408074 ("PCI: Add quirk for setting valid class for TI816X Endpoint")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Hemant Pedanekar <hemantp@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ad83dbd974 ]
The "adl_pci7x3x" driver replaced the "adl_pci7230" and "adl_pci7432"
drivers in commits 8f567c373c ("staging: comedi: new adl_pci7x3x
driver") and 657f77d173 ("staging: comedi: remove adl_pci7230 and
adl_pci7432 drivers"). Although the new driver code agrees with the
user manuals for the respective boards, digital outputs stopped working
on the PCI-7230. This has 16 digital output channels and the previous
adl_pci7230 driver shifted the 16 bit output state left by 16 bits
before writing to the hardware register. The new adl_pci7x3x driver
doesn't do that. Fix it in `adl_pci7x3x_do_insn_bits()` by checking
for the special case of the subdevice having only 16 channels and
duplicating the 16 bit output state into both halves of the 32-bit
register. That should work both for what the board actually does and
for what the user manual says it should do.
Fixes: 8f567c373c ("staging: comedi: new adl_pci7x3x driver")
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: <stable@vger.kernel.org> # 3.13+, needs backporting for 3.7 to 3.12
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7abad1063d ]
The different devices support by the adis16480 driver have slightly
different scales for the gyroscope and accelerometer channels.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c689a923c8 ]
Add inverse unit conversion macro to convert from standard IIO units to
units that might be used by some devices.
Those are useful in combination with scale factors that are specified as
IIO_VAL_FRACTIONAL. Typically the denominator for those specifications will
contain the maximum raw value the sensor will generate and the numerator
the value it maps to in a specific unit. Sometimes datasheets specify those
in different units than the standard IIO units (e.g. degree/s instead of
rad/s) and so we need to do a unit conversion.
From a mathematical point of view it does not make a difference whether we
apply the unit conversion to the numerator or the inverse unit conversion
to the denominator since (x / y) / z = x / (y * z). But as the denominator
is typically a larger value and we are rounding both the numerator and
denominator to integer values using the later method gives us a better
precision (E.g. the relative error is smaller if we round 8000.3 to 8000
rather than rounding 8.3 to 8).
This is where in inverse unit conversion macros will be used.
Marked for stable as used by some upcoming fixes.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a313bdc531 ]
Fix this error when compiling with CONFIG_SMP=n and
CONFIG_DYNAMIC_DEBUG=y:
drivers/s390/char/sclp_early.c: In function 'sclp_read_info_early':
drivers/s390/char/sclp_early.c:87:19: error: 'EBUSY' undeclared (first use in this function)
} while (rc == -EBUSY);
^
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bd3e1c7c6d ]
Due to some recent changes in
drm_helper_probe_single_connector_modes_merge_bits(), old custom modes
were not being pruned properly. In current kernels,
drm_mode_validate_basic() is called to sanity-check each mode in the
list. If the sanity-check passes, the mode's status gets set to to
MODE_OK. In older kernels this check was not done, so old custom modes
would still have a status of MODE_UNVERIFIED at this point, and would
therefore be pruned later in the function.
As a result of this new behavior, the list of modes for a device always
includes every custom mode ever configured for the device, with the
largest one listed first. Since desktop environments usually choose the
first preferred mode when a hotplug event is emitted, this had the
result of making it very difficult for the user to reduce the size of
the display.
The qxl driver did implement the mode_valid connector function, but it
was empty. In order to restore the old behavior where old custom modes
are pruned, we implement a proper mode_valid function for the qxl
driver. This function now checks each mode against the last configured
custom mode and the list of standard modes. If the mode doesn't match
any of these, its status is set to MODE_BAD so that it will be pruned as
expected.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 03a2d2a3ea ]
Commit description is copied from the original post of this bug:
http://comments.gmane.org/gmane.linux.kernel.mm/135349
Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next
larger cache size than the size index INDEX_NODE mapping. In kernels
3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size.
However, sometimes we can't get the right output we expected via
kmalloc_size(INDEX_NODE + 1), causing a BUG().
The mapping table in the latest kernel is like:
index = {0, 1, 2 , 3, 4, 5, 6, n}
size = {0, 96, 192, 8, 16, 32, 64, 2^n}
The mapping table before 3.10 is like this:
index = {0 , 1 , 2, 3, 4 , 5 , 6, n}
size = {32, 64, 96, 128, 192, 256, 512, 2^(n+3)}
The problem on my mips64 machine is as follows:
(1) When configured DEBUG_SLAB && DEBUG_PAGEALLOC && DEBUG_LOCK_ALLOC
&& DEBUG_SPINLOCK, the sizeof(struct kmem_cache_node) will be "150",
and the macro INDEX_NODE turns out to be "2": #define INDEX_NODE
kmalloc_index(sizeof(struct kmem_cache_node))
(2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8.
(3) Then "if(size >= kmalloc_size(INDEX_NODE + 1)" will lead to "size
= PAGE_SIZE".
(4) Then "if ((size >= (PAGE_SIZE >> 3))" test will be satisfied and
"flags |= CFLGS_OFF_SLAB" will be covered.
(5) if (flags & CFLGS_OFF_SLAB)" test will be satisfied and will go to
"cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", and the result
here may be NULL while kernel bootup.
(6) Finally,"BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" causes the
BUG info as the following shows (may be only mips64 has this problem):
This patch fixes the problem of kmalloc_size(INDEX_NODE + 1) and removes
the BUG by adding 'size >= 256' check to guarantee that all necessary
small sized slabs are initialized regardless sequence of slab size in
mapping table.
Fixes: e33660165c ("slab: Use common kmalloc_index/kmalloc_size...")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Liuhailong <liu.hailong6@zte.com.cn>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e81107d4c6 ]
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty. kernel stack for the stuck process looks like below.
#0 [ffff88303d107b58] __schedule at ffffffff815c4b20
#1 [ffff88303d107bd0] schedule at ffffffff815c513e
#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
#5 [ffff88303d107dd0] tty_read at ffffffff81368013
#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
#8 [ffff88303d107f00] sys_read at ffffffff811a4306
#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
There seems to be two problems causing this issue.
First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active(). However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
add_wait_queue(&tty->read_wait, &wait);
...
if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
ldata->read_head);
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
spin_lock_irqsave(&q->lock, flags);
/* from add_wait_queue() */
...
if (!input_available_p(tty, 0)) {
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
ldata->read_head);
if (waitqueue_active(&tty->read_wait))
__add_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock,flags);
/* from add_wait_queue() */
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.
This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation). Moreover, the resulting code is much simpler.
Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2bffa1503c ]
The cleaner policy doesn't make use of the per cache block hint space in
the metadata (unlike the other policies). When switching from the
cleaner policy to mq or smq a NULL pointer crash (in dm_tm_new_block)
was observed. The crash was caused by bugs in dm-cache-metadata.c
when trying to skip creation of the hint btree.
The minimal fix is to change hint size for the cleaner policy to 4 bytes
(only hint size supported).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4bacc9c923 ]
Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).
Using my union testsuite to set things up, before the patch I see:
[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
...
lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 13381 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 13381 Links: 1
...
After the patch:
[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
...
lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 40346 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 40346 Links: 1
...
Note the change in where /proc/$$/fd/5 points to in the ls command. It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).
The inode accessed, however, is the lower layer. The union layer is on device
25h/37d and the upper layer on 24h/36d.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 155e35d4da ]
Introduce some function for getting the inode (and also the dentry) in an
environment where layered/unioned filesystems are in operation.
The problem is that we have places where we need *both* the union dentry and
the lower source or workspace inode or dentry available, but we can only have
a handle on one of them. Therefore we need to derive the handle to the other
from that.
The idea is to introduce an extra field in struct dentry that allows the union
dentry to refer to and pin the lower dentry.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f25801ee46 ]
Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
as we've done the copy up for which we needed the freeze-write lock by that
point.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6423fc3416 ]
During driver probing the following code path is triggered.
igb_probe
->igb_sw_init
->igb_probe_vfs
->igb_pci_enable_sriov
->igb_sriov_reinit
Doing the SR-IOV re-init is not necessary during probing since we're
starting from scratch. Here we can call igb_enable_sriov() right away.
Running igb_sriov_reinit() during igb_probe() also seems to cause
occasional packet loss on some onboard 82576 NICs. Reproduced on
Dell and HP servers with onboard 82576 NICs.
Example:
Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
Subsystem: Dell Device [1028:0481]
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8474ba7419 ]
Make sure the compiler does not modify arguments of syscall functions.
This can happen if the compiler generates a tailcall to another
function. For example, without asmlinkage_protect sys_openat is compiled
into this function:
sys_openat:
clr.l %d0
move.w 18(%sp),%d0
move.l %d0,16(%sp)
jbra do_sys_open
Note how the fourth argument is modified in place, modifying the register
%d4 that gets restored from this stack slot when the function returns to
user-space. The caller may expect the register to be unmodified across
system calls.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 569ba74a7b ]
This is the arm64 portion of commit 45cac65b0f ("readahead: fault
retry breaks mmap file read random detection"), which was absent from
the initial port and has since gone unnoticed. The original commit says:
> .fault now can retry. The retry can break state machine of .fault. In
> filemap_fault, if page is miss, ra->mmap_miss is increased. In the second
> try, since the page is in page cache now, ra->mmap_miss is decreased. And
> these are done in one fault, so we can't detect random mmap file access.
>
> Add a new flag to indicate .fault is tried once. In the second try, skip
> ra->mmap_miss decreasing. The filemap_fault state machine is ok with it.
With this change, Mark reports that:
> Random read improves by 250%, sequential read improves by 40%, and
> random write by 400% to an eMMC device with dm crypto wrapped around it.
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Riley Andrews <riandrews@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cde93be45a ]
A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root. For lack of a better term
I call this an escaped path.
prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.
__d_path only wants to see paths are connected to the root it passes
in. So __d_path needs prepend_path to return an error.
d_absolute_path similarly wants to see paths that are connected to
some root. Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1. So escaped paths will be treated like paths on lazily
unmounted mounts.
getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.
d_path is the interesting hold out. d_path just wants to print
something, and does not care about the weird cases. Which raises
the question what should be printed?
Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths. As there are not really any meaninful path components when
considered from the perspective of a mount tree.
So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7c7feb2ebf ]
UBI: attaching mtd1 to ubi0
UBI: scanning is finished
UBI error: init_volumes: not enough PEBs, required 706, available 686
UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
UBI error: ubi_init: cannot attach mtd1
If available PEBs are not enough when initializing volumes, return -ENOSPC
directly. If available PEBs are not enough when initializing WL, return
-ENOSPC instead of -ENOMEM.
Cc: stable@vger.kernel.org
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 281fda2767 ]
Make sure that data_size is less than LEB size.
Otherwise a handcrafted UBI image is able to trigger
an out of bounds memory access in ubi_compare_lebs().
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e297c939b7 ]
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts. The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.
The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use. Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using). CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.
To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping(). Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.
The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d7 ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.
Fixes: 05af7bd2d7 ("[POWERPC] MPIC U3/U4 MSI backend")
Cc: stable@vger.kernel.org # v2.6.22+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c2e4b24ff8 ]
When a trace recorded on a 32-bit device is processed with a 64-bit
binary, the higher 32-bits of the address need to ignored.
The lack of this results in the output of the 64-bit pointer
value to the trace as the 32-bit address lookup fails in find_printk().
Before:
burn-1778 [003] 548.600305: bputs: 0xc0046db2s: 2cec5c058d98c
After:
burn-1778 [003] 548.600305: bputs: 0xc0046db2s: RT throttling activated
The problem occurs in PRINT_FIELD when the field is recognized as a
pointer to a string (of the type const char *)
Heterogeneous architectures cases below can arise and should be handled:
* Traces recorded using 32-bit addresses processed on a 64-bit machine
* Traces recorded using 64-bit addresses processed on a 32-bit machine
Reported-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1442928123-13824-1-git-send-email-kapileshwar.singh@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 53cf037bf8 ]
The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
need a correctly set skb network header.
Unfortunately we cannot rely on the device drivers to set it for us.
Therefore setting it in the beginning of the according ndo_start_xmit
handler.
Fixes: 1d8ab8d3c1 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3d ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ac4eebd484 ]
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: e17931d1a6 ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4635469f5c ]
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 3f4841ffb3 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 53960059d5 ]
If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
will cause DMA memory allocated for devices with a 32..63-bit
coherent_dma_mask to fall back to using __GFP_DMA, even though there may
only be 32-bits of physical address available anyway.
Correct that case to compare against a mask the size of phys_addr_t
instead of always using a 64-bit mask.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Fixes: a2e715a86c ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 2.6.36+
Patchwork: https://patchwork.linux-mips.org/patch/9610/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a2022001ce ]
Tolerance applies on both sides of the target voltage, i.e. both min and
max sides. But while checking if a voltage is supported by the regulator
or not, we haven't taken care of tolerance on the lower side. Fix that.
Cc: Lucas Stach <l.stach@pengutronix.de>
Fixes: 045ee45c4f ("cpufreq: cpufreq-dt: disable unsupported OPPs")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 72194739f5 ]
Add a device quirk for the Logitech PTZ Pro Camera and its sibling the
ConferenceCam CC3000e Camera.
This fixes the failed camera enumeration on some boot, particularly on
machines with fast CPU.
Tested by connecting a Logitech PTZ Pro Camera to a machine with a
Haswell Core i7-4600U CPU @ 2.10GHz, and doing thousands of reboot cycles
while recording the kernel logs and taking camera picture after each boot.
Before the patch, more than 7% of the boots show some enumeration transfer
failures and in a few of them, the kernel is giving up before actually
enumerating the webcam. After the patch, the enumeration has been correct
on every reboot.
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b0a688ddcc ]
since commit 33c300cb90 ("usb: musb: dsps:
don't fake of_node to musb core") we have been
preventing CPPI 4.1 from probing due to NULL
of_node. We can't revert said commit otherwise
a different regression would show up, so the fix
is to look for the parent device's (glue layer's)
of_node instead, since that's the thing which
is actually described in DTS.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ff30cbc8da ]
Bits 1:0 of the bmAttributes are used for the burst multiplier.
The rest of the bits used to be reserved (zero), but USB3.1 takes bit 7
into use.
Use the existing USB_SS_MULT() macro instead to make sure the mult value
and hence max packet calculations are correct for USB3.1 devices.
Note that burst multiplier in bmAttributes is zero based and that
the USB_SS_MULT() macro adds one.
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 56ffa1d154 ]
According to spec, there are functional and protocol stalls.
For functional stall, it is for bulk and interrupt endpoints,
below are cases for it:
- Host sends SET_FEATURE request for Set-Halt, the udc driver
needs to set stall, and return true unconditionally.
- The gadget driver may call usb_ep_set_halt to stall certain
endpoints, if there is a transfer in pending, the udc driver
should not set stall, and return -EAGAIN accordingly.
These two kinds of stall need to be cleared by host using CLEAR_FEATURE
request (Clear-Halt).
For protocol stall, it is for control endpoint, this stall will
be set if the control request has failed. This stall will be
cleared by next setup request (hardware will do it).
It fixed usbtest (drivers/usb/misc/usbtest.c) Test 13 "set/clear halt"
test failure, meanwhile, this change has been verified by
USB2 CV Compliance Test and MSC Tests.
Cc: <stable@vger.kernel.org> #3.10+
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 176fc2d577 ]
The in kernel snprintf() will conveniently return the actual length of
the printed string even if not given an output beffer at all so just do
that rather than relying on the user to pass in a suitable buffer,
ensuring that we don't need to worry if the buffer was truncated due to
the size of the buffer passed in.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b763ec17ac ]
If a read is attempted which is smaller than the line length then we may
underflow the subtraction we're doing with the unsigned size_t type so
move some of the calculation to be additions on the right hand side
instead in order to avoid this.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 28c1f1628e ]
The rockchip io-domain driver currently only depends on ARCH_ROCKCHIP
itself. This makes it possible to select the power-domain driver, but
not the POWER_AVS class and results in the iodomain-driver not getting
build in this case.
So add the additional dependency, which also results in the driver
config option now being placed nicely into the AVS submenu.
Fixes: 662a958638 ("PM / AVS: rockchip-io: add driver handling Rockchip io domains")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bc3e00f04c ]
When keeping the configuration set by the bootloader (by using
the marvell,nand-keep-config property), the pxa3xx_nand_detect_config()
function is called and set the chunk size to 512 as a default value if
NDCR_PAGE_SZ is not set.
In the other case, when not keeping the bootloader configuration, no
chunk size is set. Fix this by adding a default chunk size of 512.
Fixes: 70ed85232a ("mtd: nand: pxa3xx: Introduce multiple page I/O
support")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 98ce94c8df ]
Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
10.10.5) share fails in case the clocks differ more than +/-2h:
digest-service: digest-request: od failed with 2 proto=ntlmv2
digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
Fix this by (re-)using the given server timestamp for the
ntlmv2 authentication (as Windows 7 does).
A related problem was also reported earlier by Namjae Jaen (see below):
Windows machine has extended security feature which refuse to allow
authentication when there is time difference between server time and
client time when ntlmv2 negotiation is used. This problem is prevalent
in embedded enviornment where system time is set to default 1970.
Modern servers send the server timestamp in the TargetInfo Av_Pair
structure in the challenge message [see MS-NLMP 2.2.2.1]
In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
use the server provided timestamp if present OR current time if it is
not
Reported-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cf75eb15be ]
cd-gpios polarity should be changed to GPIO_ACTIVE_LOW and wp-gpios
should be changed to GPIO_ACTIVE_HIGH.
Otherwise, the SD may not work properly due to wrong polarity inversion
specified in DT after switch to common parsing function mmc_of_parse().
Signed-off-by: Dong Aisheng <aisheng.dong@freescale.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 94d7694685 ]
cd-gpios polarity should be changed to GPIO_ACTIVE_LOW and wp-gpios
should be changed to GPIO_ACTIVE_HIGH.
Otherwise, the SD may not work properly due to wrong polarity inversion
specified in DT after switch to common parsing function mmc_of_parse().
Signed-off-by: Dong Aisheng <aisheng.dong@freescale.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aca45c0e95 ]
cd-gpios polarity should be changed to GPIO_ACTIVE_LOW and wp-gpios
should be changed to GPIO_ACTIVE_HIGH.
Otherwise, the SD may not work properly due to wrong polarity inversion
specified in DT after switch to common parsing function mmc_of_parse().
Signed-off-by: Dong Aisheng <aisheng.dong@freescale.com>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 65d7d46050 ]
Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.
Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
Fixes: 17cf0ea455 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ef72706a05 ]
The tt_local_entry deletion performed in batadv_tt_local_remove() was neither
protecting against simultaneous deletes nor checking whether the element was
still part of the list before calling hlist_del_rcu().
Replacing the hlist_del_rcu() call with batadv_hash_remove() provides adequate
protection via hash spinlocks as well as an is-element-still-in-hash check to
avoid 'blind' hash removal.
Fixes: 068ee6e204 ("batman-adv: roaming handling mechanism redesign")
Reported-by: alfonsname@web.de
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2701fa0864 ]
Commit 11c32d7b62
"video: move Versatile CLCD helpers" missed the fact
that the Integrator/CP is also using the helper, and
as a result the platform got only stubs and no graphics.
Add this as a default selection to Kconfig so we have
graphics again.
Fixes: 11c32d7b62 (video: move Versatile CLCD helpers)
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 56184858d1 ]
Fix crash in 3.5+ if FTP is used after switching
sync_version to 0.
Fixes: 749c42b620 ("ipvs: reduce sync rate with time thresholds")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71563f3414 ]
It is possible that we bind against a local socket in early_demux when we
are actually going to want to forward it. In this case, the socket serves
no purpose and only serves to confuse things (particularly functions which
implicitly expect sk_fullsock to be true, like ip_local_out).
Additionally, skb_set_owner_w is totally broken for non full-socks.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Fixes: 41063e9dd1 ("ipv4: Early TCP socket demux.")
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 05f00505a8 ]
I overlooked the svc->sched_data usage from schedulers
when the services were converted to RCU in 3.10. Now
the rare ipvsadm -E command can change the scheduler
but due to the reverse order of ip_vs_bind_scheduler
and ip_vs_unbind_scheduler we provide new sched_data
to the old scheduler resulting in a crash.
To fix it without changing the scheduler methods we
have to use synchronize_rcu() only for the editing case.
It means all svc->scheduler readers should expect a
NULL value. To avoid breakage for the service listing
and ipvsadm -R we can use the "none" name to indicate
that scheduler is not assigned, a state when we drop
new connections.
Reported-by: Alexander Vasiliev <a.vasylev@404-group.com>
Fixes: ceec4c3816 ("ipvs: convert services to rcu")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4754957f04 ]
Michael Vallaly reports about wrong source address used
in rare cases for tunneled traffic. Looks like
__ip_vs_get_out_rt in 3.10+ is providing uninitialized
dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses
kmalloc. While we retry after seeing EINVAL from routing
for data that does not look like valid local address, it
still succeeded when this memory was previously used from
other dests and with different local addresses. As result,
we can use valid local address that is not suitable for
our real server.
Fix it by providing 0.0.0.0 every time our cache is refreshed.
By this way we will get preferred source address from routing.
Reported-by: Michael Vallaly <lvs@nolatency.com>
Fixes: 026ace060d ("ipvs: optimize dst usage for real server")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 54d27365ca ]
The optimized task selection logic optimistically selects a new task
to run without first doing a full put_prev_task(). This is so that we
can avoid a put/set on the common ancestors of the old and new task.
Similarly, we should only call check_cfs_rq_runtime() to throttle
eligible groups if they're part of the common ancestry, otherwise it
is possible to end up with no eligible task in the simple task
selection.
Imagine:
/root
/prev /next
/A /B
If our optimistic selection ends up throttling /next, we goto simple
and our put_prev_task() ends up throttling /prev, after which we're
going to bug out in set_next_entity() because there aren't any tasks
left.
Avoid this scenario by only throttling common ancestors.
Reported-by: Mohammed Naser <mnaser@vexxhost.com>
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Ben Segall <bsegall@google.com>
[ munged Changelog ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: pjt@google.com
Fixes: 678d5718d8 ("sched/fair: Optimize cgroup pick_next_task_fair()")
Link: http://lkml.kernel.org/r/xm26wq1oswoq.fsf@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b9a5322779 ]
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state. Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.
We already did this for the IPC semaphore code (see commit e8577d1f03:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cc8e4fc0c3 ]
Don't check if timer is running with a timer_pending() before
deleting it with del_timer_sync(), this defies the whole point of
the sync part and can cause a possible race.
Instead we just want to make sure the timer is initialized early enough
before we have a chance to delete it.
Cc: <stable@vger.kernel.org>
Reported-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dca7794539 ]
Some changes between xhci 0.96 and xhci 1.0 specifications forced us to
check the hci version in code, some of these checks were implemented as
hci_version == 1.0, which will not work with new xhci 1.1 controllers.
xhci 1.1 behaves similar to xhci 1.0 in these cases, so change these
checks to hci_version >= 1.0
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cbb4be652d ]
Fix potential null-pointer dereference at probe by making sure that the
required endpoints are present.
The whiteheat driver assumes there are at least five pairs of bulk
endpoints, of which the final pair is used for the "command port". An
attempt to bind to an interface with fewer bulk endpoints would
currently lead to an oops.
Fixes CVE-2015-5257.
Reported-by: Moein Ghasemzadeh <moein@istuary.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 74b3112e95 ]
Instead of only enabling the backlight (which seems to set it to max
brightness), just re-set the current backlight level, which also takes
care of enabling the backlight if necessary.
Port of radeon commit:
drm/radeon: Restore LCD backlight level on resume (>= R5xx)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit da168d81b4 ]
I've done some extensive history digging across libdrm, mesa and
xf86-video-{intel,nouveau,ati}. The only potential user of this with
kms drivers I could find was ttmtest, which once used drmGetLock
still. But that mistake was quickly fixed up. Even the intel xvmc
library (which otherwise was really good with using dri1 stuff in kms
mode) managed to never take the hw lock for dri2 (and hence kms).
Hence it should be save to unconditionally disallow this.
Cc: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cd67d226eb ]
The VBT MIPI Sequence Block version 3 has forward incompatible changes:
First, the block size in the header has been specified reserved, and the
actual size is a separate 32-bit value within the block. The current
find_section() function to will only look at the size in the block
header, and, depending on what's in that now reserved size field,
continue looking for other sections in the wrong place.
Fix this by taking the new block size field into account. This will
ensure that the lookups for other sections will work properly, as long
as the new 32-bit size does not go beyond the opregion VBT mailbox size.
Second, the contents of the block have been completely
changed. Gracefully refuse parsing the yet unknown data version.
Cc: Deepak M <m.deepak@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Deepak M <m.deepak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 646200a041 ]
The error paths in set_file_size for cifs and smb3 are incorrect.
In the unlikely event that a server did not support set file info
of the file size, the code incorrectly falls back to trying SMBWriteX
(note that only the original core SMB Write, used for example by DOS,
can set the file size this way - this actually does not work for the more
recent SMBWriteX). The idea was since the old DOS SMB Write could set
the file size if you write zero bytes at that offset then use that if
server rejects the normal set file info call.
Fortunately the SMBWriteX will never be sent on the wire (except when
file size is zero) since the length and offset fields were reversed
in the two places in this function that call SMBWriteX causing
the fall back path to return an error. It is also important to never call
an SMB request from an SMB2/sMB3 session (which theoretically would
be possible, and can cause a brief session drop, although the client
recovers) so this should be fixed. In practice this path does not happen
with modern servers but the error fall back to SMBWriteX is clearly wrong.
Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
Pointed out by PaX/grsecurity team
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: PaX Team <pageexec@freemail.hu>
CC: Emese Revfy <re.emese@gmail.com>
CC: Brad Spengler <spender@grsecurity.net>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 048883e0b9 ]
We really want sizeof(struct page *) instead. Otherwise we limit
maximum IO size to 64 pages rather than 512 pages on a 64bit system.
Fixes 2e11f829(nfs: cap request size to fit a kmalloced page array).
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: 2e11f8296d ("nfs: cap request size to fit a kmalloced page array")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 205ee117d4 ]
like nf_log_unset, nf_log_unregister must not reset the list of loggers.
Otherwise, a call to nf_log_unregister() will render loggers of other nf
protocols unusable:
iptables -A INPUT -j LOG
modprobe nf_log_arp ; rmmod nf_log_arp
iptables -A INPUT -j LOG
iptables: No chain/target/match by that name
Fixes: 30e0c6a6be ("netfilter: nf_log: prepare net namespace support for loggers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ba378ca9c0 ]
Fix lookup of existing match/target structures in the corresponding list
by skipping the family check if NFPROTO_UNSPEC is used.
This is resulting in the allocation and insertion of one match/target
structure for each use of them. So this not only bloats memory
consumption but also severely affects the time to reload the ruleset
from the iptables-compat utility.
After this patch, iptables-compat-restore and iptables-compat take
almost the same time to reload large rulesets.
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ad5001cc7c ]
The nf_log_unregister() function needs to call synchronize_rcu() to make sure
that the objects are not dereferenced anymore on module removal.
Fixes: 5962815a6a ("netfilter: nf_log: use an array of loggers instead of list")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 95dd8653de ]
We have to put back the references to the master conntrack and the expectation
that we just created, otherwise we'll leak them.
Fixes: 0ef71ee1a5 ("netfilter: ctnetlink: refactor ctnetlink_create_expect")
Reported-by: Tim Wiess <Tim.Wiess@watchguard.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4b31814d20 ]
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a9de9777d6 ]
The convention in nfnetlink is to use network byte order in every header field
as well as in the attribute payload. The initial version of the batching
infrastructure assumes that res_id comes in host byte order though.
The only client of the batching infrastructure is nf_tables, so let's add a
workaround to address this inconsistency. We currently have 11 nfnetlink
subsystems according to NFNL_SUBSYS_COUNT, so we can assume that the subsystem
2560, ie. htons(10), will not be allocated anytime soon, so it can be an alias
of nf_tables from the nfnetlink batching path when interpreting the res_id
field.
Based on original patch from Florian Westphal.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 042745ee53 ]
Commit 3a0f9aaee0 ("dm raid: round region_size to power of two")
intended to make sure that the default region size is a power of two.
However, the logic in that commit is incorrect and sets the variable
region_size to 0 or 1, depending on whether min_region_size is a power
of two.
Fix this logic, using roundup_pow_of_two(), so that region_size is
properly rounded up to the next power of two.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 3a0f9aaee0 ("dm raid: round region_size to power of two")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b0dc3c8bc1 ]
When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down. If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.
This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp). And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9d11b51ce7 ]
The Linux NFS server returns garbage in the data payload of inline
NFS/RDMA READ replies. These are READs of under 1000 bytes or so
where the client has not provided either a reply chunk or a write
list.
The NFS server delivers the data payload for an NFS READ reply to
the transport in an xdr_buf page list. If the NFS client did not
provide a reply chunk or a write list, send_reply() is supposed to
set up a separate sge for the page containing the READ data, and
another sge for XDR padding if needed, then post all of the sges via
a single SEND Work Request.
The problem is send_reply() does not advance through the xdr_buf
when setting up scatter/gather entries for SEND WR. It always calls
dma_map_xdr with xdr_off set to zero. When there's more than one
sge, dma_map_xdr() sets up the SEND sge's so they all point to the
xdr_buf's head.
The current Linux NFS/RDMA client always provides a reply chunk or
a write list when performing an NFS READ over RDMA. Therefore, it
does not exercise this particular case. The Linux server has never
had to use more than one extra sge for building RPC/RDMA replies
with a Linux client.
However, an NFS/RDMA client _is_ allowed to send small NFS READs
without setting up a write list or reply chunk. The NFS READ reply
fits entirely within the inline reply buffer in this case. This is
perhaps a more efficient way of performing NFS READs that the Linux
NFS/RDMA client may some day adopt.
Fixes: b432e6b3d9 ('svcrdma: Change DMA mapping logic to . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 808f80b467 ]
My previous fix in commit 005efedf2c ("Btrfs: fix read corruption of
compressed and shared extents") was effective only if the compressed
extents cover a file range with a length that is not a multiple of 16
pages. That's because the detection of when we reached a different range
of the file that shares the same compressed extent as the previously
processed range was done at extent_io.c:__do_contiguous_readpages(),
which covers subranges with a length up to 16 pages, because
extent_readpages() groups the pages in clusters no larger than 16 pages.
So fix this by tracking the start of the previously processed file
range's extent map at extent_readpages().
The following test case for fstests reproduces the issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
rm -f $seqres.full
test_clone_and_read_compressed_extent()
{
local mount_opts=$1
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
# Create our test file with a single extent of 64Kb that is going to
# be compressed no matter which compression algo is used (zlib/lzo).
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now clone the compressed extent into an adjacent file offset.
$CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
echo "File digest before unmount:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
# Remount the fs or clear the page cache to trigger the bug in
# btrfs. Because the extent has an uncompressed length that is a
# multiple of 16 pages, all the pages belonging to the second range
# of the file (64K to 128K), which points to the same extent as the
# first range (0K to 64K), had their contents full of zeroes instead
# of the byte 0xaa. This was a bug exclusively in the read path of
# compressed extents, the correct data was stored on disk, btrfs
# just failed to fill in the pages correctly.
_scratch_remount
echo "File digest after remount:"
# Must match the digest we got before.
md5sum $SCRATCH_MNT/foo | _filter_scratch
}
echo -e "\nTesting with zlib compression..."
test_clone_and_read_compressed_extent "-o compress=zlib"
_scratch_unmount
echo -e "\nTesting with lzo compression..."
test_clone_and_read_compressed_extent "-o compress=lzo"
status=0
exit
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 005efedf2c ]
If a file has a range pointing to a compressed extent, followed by
another range that points to the same compressed extent and a read
operation attempts to read both ranges (either completely or part of
them), the pages that correspond to the second range are incorrectly
filled with zeroes.
Consider the following example:
File layout
[0 - 8K] [8K - 24K]
| |
| |
points to extent X, points to extent X,
offset 4K, length of 8K offset 0, length 16K
[extent X, compressed length = 4K uncompressed length = 16K]
If a readpages() call spans the 2 ranges, a single bio to read the extent
is submitted - extent_io.c:submit_extent_page() would only create a new
bio to cover the second range pointing to the extent if the extent it
points to had a different logical address than the extent associated with
the first range. This has a consequence of the compressed read end io
handler (compression.c:end_compressed_bio_read()) finish once the extent
is decompressed into the pages covering the first range, leaving the
remaining pages (belonging to the second range) filled with zeroes (done
by compression.c:btrfs_clear_biovec_end()).
So fix this by submitting the current bio whenever we find a range
pointing to a compressed extent that was preceded by a range with a
different extent map. This is the simplest solution for this corner
case. Making the end io callback populate both ranges (or more, if we
have multiple pointing to the same extent) is a much more complex
solution since each bio is tightly coupled with a single extent map and
the extent maps associated to the ranges pointing to the shared extent
can have different offsets and lengths.
The following test case for fstests triggers the issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
rm -f $seqres.full
test_clone_and_read_compressed_extent()
{
local mount_opts=$1
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
# Create a test file with a single extent that is compressed (the
# data we write into it is highly compressible no matter which
# compression algorithm is used, zlib or lzo).
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K" \
-c "pwrite -S 0xbb 4K 8K" \
-c "pwrite -S 0xcc 12K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now clone our extent into an adjacent offset.
$CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Same as before but for this file we clone the extent into a lower
# file offset.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
-c "pwrite -S 0xbb 12K 8K" \
-c "pwrite -S 0xcc 20K 4K" \
$SCRATCH_MNT/bar | _filter_xfs_io
$CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/bar
echo "File digests before unmounting filesystem:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
# Evicting the inode or clearing the page cache before reading
# again the file would also trigger the bug - reads were returning
# all bytes in the range corresponding to the second reference to
# the extent with a value of 0, but the correct data was persisted
# (it was a bug exclusively in the read path). The issue happened
# only if the same readpages() call targeted pages belonging to the
# first and second ranges that point to the same compressed extent.
_scratch_remount
echo "File digests after mounting filesystem again:"
# Must match the same digests we got before.
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
}
echo -e "\nTesting with zlib compression..."
test_clone_and_read_compressed_extent "-o compress=zlib"
_scratch_unmount
echo -e "\nTesting with lzo compression..."
test_clone_and_read_compressed_extent "-o compress=lzo"
status=0
exit
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a30e577c96 ]
In btrfs_evict_inode, we properly truncate the page cache for evicted
inodes but then we call btrfs_wait_ordered_range for every inode as well.
It's the right thing to do for regular files but results in incorrect
behavior for device inodes for block devices.
filemap_fdatawrite_range gets called with inode->i_mapping which gets
resolved to the block device inode before getting passed to
wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi. What happens
next depends on whether there's an open file handle associated with the
inode. If there is, we write to the block device, which is unexpected
behavior. If there isn't, we through normally and inode->i_data is used.
We can also end up racing against open/close which can result in crashes
when i_mapping points to a block device inode that has been closed.
Since there can't be any page cache associated with special file inodes,
it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
the problem.
Cc: <stable@vger.kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3c8f7710c1 ]
The previous fix of pxa library support, which was introduced to fix the
library dependency, broke the previous SoC behavior, where a machine
code binding pxa2xx-ac97 with a coded relied on :
- sound/soc/pxa/pxa2xx-ac97.c
- sound/soc/codecs/XXX.c
For example, the mioa701_wm9713.c machine code is currently broken. The
"select ARM" statement wrongly selects the soc/arm/pxa2xx-ac97 for
compilation, as per an unfortunate fate SND_PXA2XX_AC97 is both declared
in sound/arm/Kconfig and sound/soc/pxa/Kconfig.
Fix this by ensuring that SND_PXA2XX_SOC correctly triggers the correct
pxa2xx-ac97 compilation.
Fixes: 846172dfe3 ("ASoC: fix SND_PXA2XX_LIB Kconfig warning")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8811191fdf ]
PCM receive and transmit DMA requestor lines were reverted, breaking the
PCM playback interface for PXA platforms using the sound/soc/ variant
instead of the sound/arm variant.
The commit below shows the inversion in the requestor lines.
Fixes: d65a14587a ("ASoC: pxa: use snd_dmaengine_dai_dma_data")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 225db5762d ]
When OSS emulation is loaded on ISA SB AWE32 chip, we get now kernel
warnings like:
WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80()
sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0'
It's because both emux synth and opl3 drivers try to register their
OSS device object with the same static index number 0. This hasn't
been a big problem until the recent rewrite of device management code
(that exposes sysfs at the same time), but it's been an obvious bug.
This patch works around it just by using a different index number of
emux synth object. There can be a more elegant way to fix, but it's
enough for now, as this code won't be touched so often, in anyway.
Reported-and-tested-by: Michael Shell <list1@michaelshell.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f84a8990e ]
SunDong reported the following on
https://bugzilla.kernel.org/show_bug.cgi?id=103841
I think I find a linux bug, I have the test cases is constructed. I
can stable recurring problems in fedora22(4.0.4) kernel version,
arch for x86_64. I construct transparent huge page, when the parent
and child process with MAP_SHARE, MAP_PRIVATE way to access the same
huge page area, it has the opportunity to lead to huge page copy on
write failure, and then it will munmap the child corresponding mmap
area, but then the child mmap area with VM_MAYSHARE attributes, child
process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
functions (vma - > vm_flags & VM_MAYSHARE).
There were a number of problems with the report (e.g. it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this
vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0
pgoff 0 file ffff88106bdb9800 private_data (null)
flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
------------
kernel BUG at mm/hugetlb.c:462!
SMP
Modules linked in: xt_pkttype xt_LOG xt_limit [..]
CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
set_vma_resv_flags+0x2d/0x30
The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.
When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.
The problem is that the same file is mapped shared and private. During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered. This
patch identifies such VMAs and skips them.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 012572d4fc ]
The order of the following three spinlocks should be:
dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
But dlm_dispatch_assert_master() is called while holding
dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
dlm_grab() which will take dlm_domain_lock.
Once another thread (for example, dlm_query_join_handler) has already
taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
happens.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 02bc933ebb ]
On Intel Baytrail, there is case when interrupt handler get called, no SPI
message is captured. The RX FIFO is indeed empty when RX timeout pending
interrupt (SSSR_TINT) happens.
Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
drivers are using IRQF_SHARED when calling the request_irq function. When
running two separate and independent SPI and HSUART application that
generate data traffic on both components, user will see messages like
below on the console:
pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
This commit will fix this by first checking Receiver Time-out Interrupt,
if it is disabled, ignore the request and return without servicing.
Signed-off-by: Tan, Jui Nee <jui.nee.tan@intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a394d63519 ]
Actually, spi_master_put() after spi_alloc_master() must _not_ be followed
by kfree(). The memory is already freed with the call to spi_master_put()
through spi_master_class, which registers a release function. Calling both
spi_master_put() and kfree() results in often nasty (and delayed) crashes
elsewhere in the kernel, often in the networking stack.
This reverts commit eb4af0f534.
Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269
or
http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html
Alexey Klimov: This revert becomes valid after
94c69f765f when spi-imx.c
has been fixed and there is no need to call kfree() so comment
for spi_alloc_master() should be fixed.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit adc0b7fbf6 ]
my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for
spilling/filling into a floating point register in our decompressor.
This will cause an AFP-register data exception as the decompressor
did not setup the additional floating point registers via cr0.
That causes a program check loop that looked like a hang with
one "Uncompressing Linux... " message (directly booted via kvm)
or a loop of "Uncompressing Linux... " messages (when booted via
zipl boot loader).
The offending code in my build was
48e400: e3 c0 af ff ff 71 lay %r12,-1(%r10)
-->48e406: b3 c1 00 1c ldgr %f1,%r12
48e40a: ec 6c 01 22 02 7f clij %r6,2,12,0x48e64e
but gcc could do spilling into an fpr at any function. We can
simply disable floating point support at that early stage.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8d4bd0ed04 ]
The uc_sigmask in the ucontext structure is an array of words to keep
the 64 signal bits (or 1024 if you ask glibc but the kernel sigset_t
only has 64 bits).
For 64 bit the sigset_t contains a single 8 byte word, but for 31 bit
there are two 4 byte words. The compat signal handler code uses a
simple copy of the 64 bit sigset_t to the 31 bit compat_sigset_t.
As s390 is a big-endian architecture this is incorrect, the two words
in the 31 bit sigset_t array need to be swapped.
Cc: <stable@vger.kernel.org>
Reported-by: Stefan Liebler <stli@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 95913d9791 ]
So the problem this patch is trying to address is as follows:
CPU0 CPU1
context_switch(A, B)
ttwu(A)
LOCK A->pi_lock
A->on_cpu == 0
finish_task_switch(A)
prev_state = A->state <-.
WMB |
A->on_cpu = 0; |
UNLOCK rq0->lock |
| context_switch(C, A)
`-- A->state = TASK_DEAD
prev_state == TASK_DEAD
put_task_struct(A)
context_switch(A, C)
finish_task_switch(A)
A->state == TASK_DEAD
put_task_struct(A)
The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.
We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.
The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # v3.1+
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0b34a166f2 ]
Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:
- Bound event channels.
- Registered vcpu_info.
- PIRQ/emuirq mappings.
- shared_info frame after XENMAPSPACE_shared_info operation.
- Active grant mappings.
Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.
Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.' which gives a clue to what the problem is and
eliminates false expectations.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ab76f7b4ab ]
Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables. Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.
Before:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte
0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
0xffffffff82200000-0xffffffffa0000000 478M pmd
After:
---[ High Kernel Mapping ]---
0xffffffff80000000-0xffffffff81000000 16M pmd
0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd
0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte
0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte
0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd
0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte
0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte
0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd
0xffffffff82200000-0xffffffffa0000000 478M pmd
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eddd3826a1 ]
Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444
The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:
- The lower bound of the stack is not the start of the stack
page. It's the start of the stack page plus sizeof (struct
thread_info)
- The upper bound must be:
top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
The 2 * sizeof(unsigned long) is required because the stack pointer
points at the frame pointer. The layout on the stack is: ... IP FP
... IP FP. So we need to make sure that both IP and FP are in the
bounds.
Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.
Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.
Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.
Add proper comments while at it.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e3c41e37b0 ]
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:
BUG: unable to handle kernel paging request at ffffc90613fc9000
IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.
This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.
Here's how I found the bug:
After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.
But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.
I printed those small memory regions, for example:
kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
Based on the code in walk_system_ram_range(), this memory region
will be filtered out:
pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
end_pfn - pfn = 0x77593 - 0x77593 = 0 <=== if (end_pfn > pfn) is FALSE
So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.
This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.
Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a5caa209ba ]
Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
that signals that the firmware PE/COFF loader supports splitting
code and data sections of PE/COFF images into separate EFI
memory map entries. This allows the kernel to map those regions
with strict memory protections, e.g. EFI_MEMORY_RO for code,
EFI_MEMORY_XP for data, etc.
Unfortunately, an unwritten requirement of this new feature is
that the regions need to be mapped with the same offsets
relative to each other as observed in the EFI memory map. If
this is not done crashes like this may occur,
BUG: unable to handle kernel paging request at fffffffefe6086dd
IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
Call Trace:
[<ffffffff8104c90e>] efi_call+0x7e/0x100
[<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
[<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
[<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
[<ffffffff81f37e1b>] start_kernel+0x38a/0x417
[<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
Here 0xfffffffefe6086dd refers to an address the firmware
expects to be mapped but which the OS never claimed was mapped.
The issue is that included in these regions are relative
addresses to other regions which were emitted by the firmware
toolchain before the "splitting" of sections occurred at
runtime.
Needless to say, we don't satisfy this unwritten requirement on
x86_64 and instead map the EFI memory map entries in reverse
order. The above crash is almost certainly triggerable with any
kernel newer than v3.13 because that's when we rewrote the EFI
runtime region mapping code, in commit d2f7cbe7b2 ("x86/efi:
Runtime services virtual mapping"). For kernel versions before
v3.13 things may work by pure luck depending on the
fragmentation of the kernel virtual address space at the time we
map the EFI regions.
Instead of mapping the EFI memory map entries in reverse order,
where entry N has a higher virtual address than entry N+1, map
them in the same order as they appear in the EFI memory map to
preserve this relative offset between regions.
This patch has been kept as small as possible with the intention
that it should be applied aggressively to stable and
distribution kernels. It is very much a bugfix rather than
support for a new feature, since when EFI_PROPERTIES_TABLE is
enabled we must map things as outlined above to even boot - we
have no way of asking the firmware not to split the code/data
regions.
In fact, this patch doesn't even make use of the more strict
memory protections available in UEFI v2.5. That will come later.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chun-Yi <jlee@suse.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d2922422c4 ]
The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).
The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug. This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.
Cc: stable@vger.kernel.org
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fc57a7c680 ]
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):
ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6>
2792: R_X86_64_PC32 pv_irq_ops+0x2c
That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse. This is bad news for a CLBR_NONE operation.
Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in? This can potentially cause breakage
that is very difficult to debug.
A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.
The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.
Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.
The Xen case may have other problems, so document them.
This is part of a fix for some random crashes that Sasha saw.
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5d7c631d92 ]
The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.
The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.
xAPIC:
"1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
x2APIC:
"To allow for efficient access to the APIC registers in x2APIC mode,
the serializing semantics of WRMSR are relaxed when writing to the
APIC registers. Thus, system software should not use 'WRMSR to APIC
registers in x2APIC mode' as a serializing instruction. Read and write
accesses to the APIC registers will occur in program order. A WRMSR to
an APIC register may complete before all preceding stores are globally
visible; software can prevent this by inserting a serializing
instruction, an SFENCE, or an MFENCE before the WRMSR."
The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.
Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.
[ tglx: Massaged the changelog ]
Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org #v3.7+
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6bea0f6d1c ]
In case we have less than maximum allowed channels (8) and autoconfiguration is
enabled the DWC_PARAMS read is wrong because it uses different arithmetic to
what is needed for channel priority setup.
Re-do the caclulations properly. This now works on AVR32 board well.
Fixes: fed2574b3c (dw_dmac: introduce software emulation of LLP transfers)
Cc: yitian.bu@tangramtek.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0af8221108 ]
This fixes a duplicated pin control causing this error:
imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_GPIO_1 already
requested by regulators:regulator@2; cannot claim for 2184000.usb
imx6q-pinctrl 20e0000.iomuxc: pin-137 (2184000.usb) status -22
imx6q-pinctrl 20e0000.iomuxc: could not request pin 137
(MX6Q_PAD_GPIO_1) from group usbotggrp on device 20e0000.iomuxc
imx_usb 2184000.usb: Error applying setting, reverse things
back
imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_EIM_D31 already
requested by regulators:regulator@1; cannot claim for 2184200.usb
imx6q-pinctrl 20e0000.iomuxc: pin-52 (2184200.usb) status -22
imx6q-pinctrl 20e0000.iomuxc: could not request pin 52 (MX6Q_PAD_EIM_D31)
from group usbh1grp on device 20e0000.iomuxc
imx_usb 2184200.usb: Error applying setting, reverse things
back
Signed-off-by: Felipe F. Tonello <eu@felipetonello.com>
Fixes: e2047e33f2 ("ARM: dts: add initial Rex Pro board support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3a2fa775bd ]
Let's fix pinmux address of gpio 170 used by tfp410 powerdown-gpio.
According to the OMAP35x Technical Reference Manual
CONTROL_PADCONF_I2C3_SDA[15:0] 0x480021C4 mode0: i2c3_sda
CONTROL_PADCONF_I2C3_SDA[31:16] 0x480021C4 mode4: gpio_170
the pinmux address of gpio 170 must be 0x480021C6.
The former wrong address broke i2c3 (used by hdmi ddc), resulting in
kernel message:
omap_i2c 48060000.i2c: controller timed out
Fixes: 8cecf52bef ("ARM: omap3-beagle.dts: add display information")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Carl Frederik Werner <frederik@cfbw.eu>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1dbdad7507 ]
The i2c5 pinctrl offsets are wrong. If the bootloader doesn't set the
pins up, communication with tca6424a doesn't work (controller timeouts)
and it is not possible to enable HDMI.
Fixes: 9be495c426 ("ARM: dts: omap5-evm: Add I2c pinctrl data")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fe2b592173 ]
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.
Fixes: 75722d3992 ("[PATCH] ppc64: Thermal control for SMU based machines")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a077224fd3 ]
While working on the 32-bit ARM port of UEFI, I noticed a strange
corruption in the kernel log. The following snprintf() statement
(in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
was producing the following output in the log:
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | | | | |UC]
|RUN| | | | | | | |UC]
As it turns out, this is caused by incorrect code being emitted for
the string() function in lib/vsprintf.c. The following code
if (!(spec.flags & LEFT)) {
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
}
for (i = 0; i < len; ++i) {
if (buf < end)
*buf = *s;
++buf; ++s;
}
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
when called with len == 0, triggers an issue in the GCC SRA optimization
pass (Scalar Replacement of Aggregates), which handles promotion of signed
struct members incorrectly. This is a known but as yet unresolved issue.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
case, it is causing the second while loop to be executed erroneously a
single time, causing the additional space characters to be printed.
So disable the optimization by passing -fno-ipa-sra.
Cc: <stable@vger.kernel.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9b55613f42 ]
When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
IT state when entering a signal handler. This can cause the first
few instructions to be conditionally executed depending on the parent
context.
In any case, the original test for >= ARMv7 is broken - ARMv6 can have
Thumb-2 support as well, and an ARMv6T2 specific build would omit this
code too.
Relax the test back to ARMv6 or greater. This results in us always
clearing the IT state bits in the PSR, even on CPUs where these bits
are reserved. However, they're reserved for the IT state, so this
should cause no harm.
Cc: <stable@vger.kernel.org>
Fixes: d71e1352e2 ("Clear the IT state when invoking a Thumb-2 signal handler")
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 00cc163381 ]
Commit 2ee507c472 ("sched: Add function single_task_running to let a task
check if it is the only task running on a cpu") referenced the current
runqueue with the smp_processor_id. When CONFIG_DEBUG_PREEMPT is enabled,
that is only allowed if preemption is disabled or the currrent task is
bound to the local cpu (e.g. kernel worker).
With commit f781951299 ("kvm: add halt_poll_ns module parameter") KVM
calls single_task_running. If CONFIG_DEBUG_PREEMPT is enabled that
generates a lot of kernel messages.
To avoid adding preemption in that cases, as it would limit the usefulness,
we change single_task_running to access directly the cpu local runqueue.
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Fixes: 2ee507c472
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0919e44451 ]
Commit f2147de334 ("watchdog: sunxi: support parameterized compatible
strings") introduced a regression in sunxi_wdt_start(), by which
the system reset function of the watchdog is not enabled upon
starting the watchdog. As a result, the system is not reset when the
watchdog expires. Fix it.
Fixes: f2147de334 ("watchdog: sunxi: support parameterized compatible strings")
Signed-off-by: Francesco Lavra <francescolavra.fl@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 601083cffb ]
print_aggr() fails to print per-core/per-socket statistics after commit
582ec0829b ("perf stat: Fix per-socket output bug for uncore events")
if events have differnt cpus. Because in print_aggr(), aggr_get_id needs
index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be
used to get aggregated id.
Here is an example:
Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has
cpumask 0,18)
$ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2
Without this patch, it failes to get CPU 18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 7526851 cycles
S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/
S1-C0 0 <not counted> cycles
S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/
With this patch, it can get both CPU0 and CPU18 result.
Performance counter stats for 'CPU(s) 0,18':
S0-C0 1 6327768 cycles
S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/
S1-C0 1 330228 cycles
S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/
Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 582ec0829b ("perf stat: Fix per-socket output bug for uncore events")
Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 31191a85fb ]
In some cases it's useful to characterize samples by file. This is
useful to get a higher level categorization, for example to map cost to
subsystems.
Add a srcfile sort key to perf report. It builds on top of the existing
srcline support.
Commiter notes:
E.g.:
# perf record -F 10000 usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (13 samples) ]
[root@zoo ~]# perf report -s srcfile --stdio
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 869878
#
# Overhead Source File
# ........ ...........
60.99% .
20.62% paravirt.h
14.23% rmap.c
4.04% signal.c
0.11% msr.h
#
The first line is collecting all the files for which srcfiles couldn't somehow
get resolved to:
# perf report -s srcfile,dso --stdio
# Total Lost Samples: 0
#
# Samples: 13 of event 'cycles'
# Event count (approx.): 869878
#
# Overhead Source File Shared Object
# ........ ........... ................
40.97% . ld-2.20.so
20.62% paravirt.h [kernel.vmlinux]
20.02% . libc-2.20.so
14.23% rmap.c [kernel.vmlinux]
4.04% signal.c [kernel.vmlinux]
0.11% msr.h [kernel.vmlinux]
#
XXX: Investigate why that is not resolving on Fedora 21, Andi says he hasn't
seen this on Fedora 22.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1438988064-21834-1-git-send-email-andi@firstfloor.org
[ Added column length update, from 0e65bdb3f90f ('perf hists: Update the column width for the "srcline" sort key') ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b5cabbcbd1 ]
A copy of /proc/kcore containing the kernel text can be made to the
buildid cache. e.g.
perf buildid-cache -v -k /proc/kcore
To workaround objdump limitations, a copy is also made when annotating
against /proc/kcore.
The copying process stops working from libelf about v1.62 onwards (the
problem was found with v1.63).
The cause is that a call to gelf_getphdr() in kcore__add_phdr() fails
because additional validation has been added to gelf_getphdr().
The use of gelf_getphdr() is a misguided attempt to get default
initialization of the Gelf_Phdr structure. That should not be
necessary because every member of the Gelf_Phdr structure is
subsequently assigned. So just remove the call to gelf_getphdr().
Similarly, a call to gelf_getehdr() in gelf_kcore__init() can be
removed also.
Committer notes:
Note to stable@kernel.org, from Adrian in the cover letter for this
patchkit:
The "Fix copying of /proc/kcore" problem goes back to v3.13 if you think
it is important enough for stable.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1443089122-19082-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a4c15cd957 ]
As documented in iscsit_sequence_cmd:
/*
* Existing callers for iscsit_sequence_cmd() will silently
* ignore commands with CMDSN_LOWER_THAN_EXP, so force this
* return for CMDSN_MAXCMDSN_OVERRUN as well..
*/
We need to silently finish a command when it's in ISTATE_REMOVE.
This fixes an teardown hang we were seeing where a mis-behaved
initiator (triggered by allocation error injections) sent us a
cmdsn which was lower than expected.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 537b604c8b ]
b9d5c6b7ef ("[SCSI] cleanup setting task state in
scsi_error_handler()") has introduced a race between scsi_error_handler
and scsi_host_dev_release resulting in the hang when the device goes
away because scsi_error_handler might miss a wake up:
CPU0 CPU1
scsi_error_handler scsi_host_dev_release
kthread_stop()
kthread_should_stop()
test_bit(KTHREAD_SHOULD_STOP)
set_bit(KTHREAD_SHOULD_STOP)
wake_up_process()
wait_for_completion()
set_current_state(TASK_INTERRUPTIBLE)
schedule()
The most straightforward solution seems to be to invert the ordering of
the set_current_state and kthread_should_stop.
The issue has been noticed during reboot test on a 3.0 based kernel but
the current code seems to be affected in the same way.
[jejb: additional comment added]
Cc: <stable@vger.kernel.org> # 3.6+
Reported-and-debugged-by: Mike Mayer <Mike.Meyer@teradata.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 76c28f1fcf ]
Revert commit 1997e6259, which causes double brackets on ipv6
inaddr_any addresses.
Since we have np_sockaddr, if we need a textual representation we can
use "%pISc".
Change iscsit_add_network_portal() and iscsit_add_np() signatures to remove
*ip_str parameter.
Fix and extend some comments earlier in the function.
Tested to work for :: and ::1 via iscsiadm, previously :: failed, see
https://bugzilla.redhat.com/show_bug.cgi?id=1249107 .
CC: stable@vger.kernel.org
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2619d7e9c9 ]
The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.
Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:
dc491596f6 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.
Per include/linux/kernel.h:
"abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).
This patch fixes the issue by using abs64() instead.
Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Tested-by: Nuno Goncalves <nunojpg@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8f4216c7d2 ]
Currently, if we had a zero length mmio eventfd assigned on
KVM_MMIO_BUS. It will never be found by kvm_io_bus_cmp() since it
always compares the kvm_io_range() with the length that guest
wrote. This will cause e.g for vhost, kick will be trapped by qemu
userspace instead of vhost. Fixing this by using zero length if an
iodevice is zero length.
Cc: stable@vger.kernel.org
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ca09f02f12 ]
A critical bug has been found in device memory stage1 translation for
VMs with more then 4GB of address space. Once vm_pgoff size is smaller
then pa (which is true for LPAE case, u32 and u64 respectively) some
more significant bits of pa may be lost as a shift operation is performed
on u32 and later cast onto u64.
Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12
expected pa(u64): 0x0000002010030000
produced pa(u64): 0x0000000010030000
The fix is to change the order of operations (casting first onto phys_addr_t
and then shifting).
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: fixed changelog and patch formatting]
Cc: stable@vger.kernel.org
Signed-off-by: Marek Majtyka <marek.majtyka@tieto.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8a1513b493 ]
Do not write initialize magic on systems that do not have
feature query 0xb. Fixes Bug #82451.
Redefine FEATURE_QUERY to align with 0xb and FEATURE2 with 0xd
for code clearity.
Add a new test function, hp_wmi_bios_2008_later() & simplify
hp_wmi_bios_2009_later(), which fixes a bug in cases where
an improper value is returned. Probably also fixes Bug #69131.
Add missing __init tag.
Signed-off-by: Kyle Evans <kvans32@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3aaf14da80 ]
zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.
While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
could return other error codes. Function documentation updated
accordingly.
Fixes: beca3ec71f ("zram: add multi stream functionality")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4cba5c2103 ]
Currently the PHY management type is selected by the MAC driver arbitrary.
The decision is based on the presence of the "fixed-link" node and on a
will of the driver's authors.
This caused a regression recently, when mvneta driver suddenly started
to use the in-band status for auto-negotiation on fixed links.
It appears the auto-negotiation may not work when expected by the MAC driver.
Sebastien Rannou explains:
<< Yes, I confirm that my HW does not generate an in-band status. AFAIK, it's
a PHY that aggregates 4xSGMIIs to 1xQSGMII ; the MAC side of the PHY (with
inband status) is connected to the switch through QSGMII, and in this context
we are on the media side of the PHY. >>
https://lkml.org/lkml/2015/7/10/206
This patch introduces the new string property 'managed' that allows
the user to set the management type explicitly.
The supported values are:
"auto" - default. Uses either MDIO or nothing, depending on the presence
of the fixed-link node
"in-band-status" - use in-band status
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
CC: Rob Herring <robh+dt@kernel.org>
CC: Pawel Moll <pawel.moll@arm.com>
CC: Mark Rutland <mark.rutland@arm.com>
CC: Ian Campbell <ijc+devicetree@hellion.org.uk>
CC: Kumar Gala <galak@codeaurora.org>
CC: Florian Fainelli <f.fainelli@gmail.com>
CC: Grant Likely <grant.likely@linaro.org>
CC: devicetree@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d2eac98f7d ]
The SF2 driver currently overrides speed settings for its port
configured using a fixed PHY, this is both unnecessary and incorrect,
because we keep feedback to the hardware parameters that we read from
the PHY device, which in the case of a fixed PHY cannot possibly change
speed.
This is a required change to allow the fixed PHY code to allow
registering a PHY with a link configured as DOWN by default and avoid
some sort of circular dependency where we require the link_update
callback to run to program the hardware, and we then utilize the fixed
PHY parameters to program the hardware with the same settings.
Fixes: 246d7f773c ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 675ee231d9 ]
RST packets sent on behalf of TCP connections with TS option (RFC 7323
TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
ecr 0], length 0
B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
1460,nop,nop,TS val 7264344 ecr 100], length 0
A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
7264344], length 0
B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
ecr 110], length 0
We need to call skb_mstamp_get() to get proper TS val,
derived from skb->skb_mstamp
Note that RFC 1323 was advocating to not send TS option in RST segment,
but RFC 7323 recommends the opposite :
Once TSopt has been successfully negotiated, that is both <SYN> and
<SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
segment for the duration of the connection, and SHOULD be sent in an
<RST> segment (see Section 5.2 for details)
Note this RFC recommends to send TS val = 0, but we believe it is
premature : We do not know if all TCP stacks are properly
handling the receive side :
When an <RST> segment is
received, it MUST NOT be subjected to the PAWS check by verifying an
acceptable value in SEG.TSval, and information from the Timestamps
option MUST NOT be used to update connection state information.
SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider
to decide to send TS val = 0, if it buys something.
Fixes: 7faee5c0d5 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 03679a1473 ]
The macro to write 64-bits quantities to the 32-bits register swapped
the value and offsets arguments, we want to preserve the ordering of the
arguments with respect to how writel() is implemented for instance:
value first, offset/base second.
Fixes: 246d7f773c ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4548a697e4 ]
tse_poll() calls __napi_complete() with irq enabled. This leads napi
poll_list corruption and may stop all napi drivers working.
Use napi_complete() instead of __napi_complete().
Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c642dc9e1a ]
At some point along this sequence of changes:
f6e63f9 ext4: fold ext4_nojournal_sops into ext4_sops
bb04457 ext4: support freezing ext2 (nojournal) file systems
9ca9238 ext4: Use separate super_operations structure for no_journal filesystems
ext4 started setting needs_recovery on filesystems without journals
when they are unfrozen. This makes no sense, and in fact confuses
blkid to the point where it doesn't recognize the filesystem at all.
(freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs,
see needs_recovery set on fs w/ no journal).
To fix this, don't manipulate the INCOMPAT_RECOVER feature on
filesystems without journals.
Reported-by: Stu Mark <smark@datto.com>
Reviewed-by: Jan Kara <jack@suse.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2925c2fdf1 ]
Currently the first thing we do in cxl_probe is to grab a reference
on the pci device. Later on, we call device_register on our adapter.
In our remove path, we call device_unregister, but we never call
pci_dev_put. We therefore leak the device every time we do a
reflash.
device_register/unregister is sufficient to hold the reference.
Therefore, drop the call to pci_dev_get.
Here's why this is safe.
The proposed cxl_probe(pdev) calls cxl_adapter_init:
a) init calls cxl_adapter_alloc, which creates a struct cxl,
conventionally called adapter. This struct contains a
device entry, adapter->dev.
b) init calls cxl_configure_adapter, where we set
adapter->dev.parent = &dev->dev (here dev is the pci dev)
So at this point, the cxl adapter's device's parent is the PCI
device that I want to be refcounted properly.
c) init calls cxl_register_adapter
*) cxl_register_adapter calls device_register(&adapter->dev)
So now we're in device_register, where dev is the adapter device, and
we want to know if the PCI device is safe after we return.
device_register(&adapter->dev) calls device_initialize() and then
device_add().
device_add() does a get_device(). device_add() also explicitly grabs
the device's parent, and calls get_device() on it:
parent = get_device(dev->parent);
So therefore, device_register() takes a lock on the parent PCI dev,
which is what pci_dev_get() was guarding. pci_dev_get() can therefore
be safely removed.
Fixes: f204e0b8ce ("cxl: Driver code for powernv PCIe based cards for userspace access")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 72ddef0506 ]
When initializing igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is
set if adapter->rss_queues exceeds half of max_rss_queues in
igb_init_queue_configuration().
On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of
queues exceeds half of max_combined in igb_set_channels() when changing
the number of queues by "ethtool -L".
In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size
of adapter->msix_entries[], an overflow can occur in
igb_set_interrupt_capability(), which in turn leads to an oops.
Fix this problem as follows:
- When changing the number of queues by "ethtool -L", set
IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver.
- When increasing the size of q_vector, reallocate it appropriately.
(With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.)
Another possible way to fix this problem is to cap the queues at its
initial number, which is the number of the initial online cpus. But this
is not the optimal way because we cannot increase queues when another
cpu becomes online.
Note that before commit cd14ef54d2 ("igb: Change to use statically
allocated array for MSIx entries"), this problem did not cause oops
but just made the number of queues become 1 because of entering msi_only
mode in igb_set_interrupt_capability().
Fixes: 907b783579 ("igb: Add ethtool support to configure number of channels")
CC: stable <stable@vger.kernel.org>
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 810bc075f7 ]
We have a tricky bug in the nested NMI code: if we see RSP
pointing to the NMI stack on NMI entry from kernel mode, we
assume that we are executing a nested NMI.
This isn't quite true. A malicious userspace program can point
RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
happen while RSP is still pointing at the NMI stack.
Fix it with a sneaky trick. Set DF in the region of code that
the RSP check is intended to detect. IRET will clear DF
atomically.
( Note: other than paravirt, there's little need for all this
complexity. We could check RIP instead of RSP. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a27507ca2d ]
Check the repeat_nmi .. end_repeat_nmi special case first. The
next patch will rework the RSP check and, as a side effect, the
RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
we'll need this ordering of the checks.
Note: this is more subtle than it appears. The check for
repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
instead of adjusting the "iret" frame to force a repeat. This
is necessary, because the code between repeat_nmi and
end_repeat_nmi sets "NMI executing" and then writes to the
"iret" frame itself. If a nested NMI comes in and modifies the
"iret" frame while repeat_nmi is also modifying it, we'll end up
with garbage. The old code got this right, as does the new
code, but the new code is a bit more explicit.
If we were to move the check right after the "NMI executing"
check, then we'd get it wrong and have random crashes.
( Because the "NMI executing" check would jump to the code that would
modify the "iret" frame without checking if the interrupted NMI was
currently modifying it. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ade4dc3e61 ]
The commit "e29aa33 bna: Enable Multi Buffer RX" moved packets counter
increment from the beginning of the NAPI processing loop after the check
for erroneous packets so they are never accounted. This counter is used
to inform firmware about number of processed completions (packets).
As these packets are never acked the firmware fires IRQs for them again
and again.
Fixes: e29aa33 ("bna: Enable Multi Buffer RX")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 10e2eb878f ]
Multicast dst are not cached. They carry DST_NOCACHE.
As mentioned in commit f886497212 ("ipv4: fix dst race in
sk_dst_get()"), these dst need special care before caching them
into a socket.
Caching them is allowed only if their refcnt was not 0, ie we
must use atomic_inc_not_zero()
Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned
in commit d0c294c53a ("tcp: prevent fetching dst twice in early demux
code")
Fixes: 421b3885bf ("udp: ipv4: Add udp early demux")
Tested-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz>
Reported-by: Alex Gartrell <agartrell@fb.com>
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 158cd4af8d ]
When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}. However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased. Ultimately, this
caused the deletion of the interface to fail with the following message:
unregister_netdevice: waiting for dummy0 to become free. Usage count = 1
This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.
Fixes: 902fefb82e ('packet: improve socket create/bind latency in some cases')
Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 41fc014332 ]
dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.
This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ae5f2fb1d5 ]
When support for megaflows was introduced, OVS needed to start
installing flows with a mask applied to them. Since masking is an
expensive operation, OVS also had an optimization that would only
take the parts of the flow keys that were covered by a non-zero
mask. The values stored in the remaining pieces should not matter
because they are masked out.
While this works fine for the purposes of matching (which must always
look at the mask), serialization to netlink can be problematic. Since
the flow and the mask are serialized separately, the uninitialized
portions of the flow can be encoded with whatever values happen to be
present.
In terms of functionality, this has little effect since these fields
will be masked out by definition. However, it leaks kernel memory to
userspace, which is a potential security vulnerability. It is also
possible that other code paths could look at the masked key and get
uninitialized data, although this does not currently appear to be an
issue in practice.
This removes the mask optimization for flows that are being installed.
This was always intended to be the case as the mask optimizations were
really targetting per-packet flow operations.
Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e2d61e0ae ]
Consider sctp module is unloaded and is being requested because an user
is creating a sctp socket.
During initialization, sctp will add the new protocol type and then
initialize pernet subsys:
status = sctp_v4_protosw_init();
if (status)
goto err_protosw_init;
status = sctp_v6_protosw_init();
if (status)
goto err_v6_protosw_init;
status = register_pernet_subsys(&sctp_net_ops);
The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets like if the module is
already fully loaded. If that happens, one of the possible effects is
that we will have readers for net->sctp.local_addr_list list earlier
than expected and sctp_net_init() does not take precautions while
dealing with that list, leading to a potential panic but not limited to
that, as sctp_sock_init() will copy a bunch of blank/partially
initialized values from net->sctp.
The race happens like this:
CPU 0 | CPU 1
socket() |
__sock_create | socket()
inet_create | __sock_create
list_for_each_entry_rcu( |
answer, &inetsw[sock->type], |
list) { | inet_create
/* no hits */ |
if (unlikely(err)) { |
... |
request_module() |
/* socket creation is blocked |
* the module is fully loaded |
*/ |
sctp_init |
sctp_v4_protosw_init |
inet_register_protosw |
list_add_rcu(&p->list, |
last_perm); |
| list_for_each_entry_rcu(
| answer, &inetsw[sock->type],
sctp_v6_protosw_init | list) {
| /* hit, so assumes protocol
| * is already loaded
| */
| /* socket creation continues
| * before netns is initialized
| */
register_pernet_subsys |
Simply inverting the initialization order between
register_pernet_subsys() and sctp_v4_protosw_init() is not possible
because register_pernet_subsys() will create a control sctp socket, so
the protocol must be already visible by then. Deferring the socket
creation to a work-queue is not good specially because we loose the
ability to handle its errors.
So, as suggested by Vlad, the fix is to split netns initialization in
two moments: defaults and control socket, so that the defaults are
already loaded by when we register the protocol, while control socket
initialization is kept at the same moment it is today.
Fixes: 4db67e8086 ("sctp: Make the address lists per network namespace")
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1853c94964 ]
Ken-ichirou reported that running netlink in mmap mode for receive in
combination with nlmon will throw a NULL pointer dereference in
__kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable
to handle kernel paging request". The problem is the skb_clone() in
__netlink_deliver_tap_skb() for skbs that are mmaped.
I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink
skb has it pointed to netlink_skb_destructor(), set in the handler
netlink_ring_setup_skb(). There, skb->head is being set to NULL, so
that in such cases, __kfree_skb() doesn't perform a skb_release_data()
via skb_release_all(), where skb->head is possibly being freed through
kfree(head) into slab allocator, although netlink mmap skb->head points
to the mmap buffer. Similarly, the same has to be done also for large
netlink skbs where the data area is vmalloced. Therefore, as discussed,
make a copy for these rather rare cases for now. This fixes the issue
on my and Ken-ichirou's test-cases.
Reference: http://thread.gmane.org/gmane.linux.network/371129
Fixes: bcbde0d449 ("net: netlink: virtual tap device management")
Reported-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 25b4a44c19 ]
In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.
This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.
Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e41b0bedba ]
We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.
Fixes: 3336288a9f ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f50791ac1a ]
It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop(), but its value should be read before it is cleared
when dev->flags is set to 0.
The problem was spotted and the fix was provided by
Oliver Neukum <oneukum@suse.de>.
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d4257295ba ]
When a tunnel is deleted, the cached dst entry should be released.
This problem may prevent the removal of a netns (seen with a x-netns IPv6
gre tunnel):
unregister_netdevice: waiting for lo to become free. Usage count = 3
CC: Dmitry Kozlov <xeb@mail.ru>
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4f7d2cdfdd ]
Jason Gunthorpe reported that since commit c02db8c629 ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].
Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.
Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.
Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).
Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c629 ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 01a3d79681 ]
Add configuration setting for drivers to allow/block an RSS Redirection
Table and a Hash Key querying for discrete VFs.
On some devices VF share the mentioned above information with PF and
querying it may adduce a theoretical security risk. We want to let a
system administrator to decide if he/she wants to take this risk or not.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7cb74be6fd ]
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]
The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887https://bugzilla.kernel.org/show_bug.cgi?id=42342https://bugzilla.kernel.org/show_bug.cgi?id=63841https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit a5e3985fa0
Author: Roman Zippel <zippel@linux-m68k.org>
Date: Tue Sep 6 15:18:47 2005 -0700
[PATCH] hfs: remove debug code
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e99b139f1 ]
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
Fixes: fa417f7b52 ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2b135db3e8 ]
The pkey mapping for RoCE must remain the default mapping:
VFs:
virtual index 0 = mapped to real index 0 (0xFFFF)
All others indices: mapped to a real pkey index containing an
invalid pkey.
PF:
virtual index i = real index i.
Don't allow users to change these mappings using files found in
sysfs.
Fixes: c1e7e46612 ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 35d4a0b63d ]
Fixes: 2a72f21226 ("IB/uverbs: Remove dev_table")
Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.
In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.
The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html
cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.
In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b632ffa7ce ]
We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure. Reject all those not explicitly handled so that we
can't pass invalid information to drivers.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d6f1c17e16 ]
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.
The underlying kernel code cannot allocate that many contiguous pages.
There is no reason the underlying memory needs to be physically
contiguous.
This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
o this matches the module parameter description
Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 299b0685e3 ]
'reshape_position' tracks where in the reshape we have reached.
'reshape_safe' tracks where in the reshape we have safely recorded
in the metadata.
These are compared to determine when to update the metadata.
So it is important that reshape_safe is initialised properly.
Currently it isn't. When starting a reshape from the beginning
it usually has the correct value by luck. But when reducing the
number of devices in a RAID10, it has the wrong value and this leads
to the metadata not being updated correctly.
This can lead to corruption if the reshape is not allowed to complete.
This patch is suitable for any -stable kernel which supports RAID10
reshape, which is 3.5 and later.
Fixes: 3ea7daa5d7 ("md/raid10: add reshape support")
Cc: stable@vger.kernel.org (v3.5+ please wait for -final to be out for 2 weeks)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71f8a4b81d ]
The following panic is captured in ker3.14, but the issue still exists
in latest kernel.
---------------------------------------------------------------------
[ 20.738217] c0 3136 (Compiler) Unable to handle kernel NULL pointer dereference
at virtual address 00000578
......
[ 20.738499] c0 3136 (Compiler) PC is at _raw_spin_lock_irqsave+0x24/0x60
[ 20.738527] c0 3136 (Compiler) LR is at _raw_spin_lock_irqsave+0x20/0x60
[ 20.740134] c0 3136 (Compiler) Call trace:
[ 20.740165] c0 3136 (Compiler) [<ffffffc0008ee900>] _raw_spin_lock_irqsave+0x24/0x60
[ 20.740200] c0 3136 (Compiler) [<ffffffc0000dd024>] __wake_up+0x1c/0x54
[ 20.740230] c0 3136 (Compiler) [<ffffffc000639414>] mmc_wait_data_done+0x28/0x34
[ 20.740262] c0 3136 (Compiler) [<ffffffc0006391a0>] mmc_request_done+0xa4/0x220
[ 20.740314] c0 3136 (Compiler) [<ffffffc000656894>] sdhci_tasklet_finish+0xac/0x264
[ 20.740352] c0 3136 (Compiler) [<ffffffc0000a2b58>] tasklet_action+0xa0/0x158
[ 20.740382] c0 3136 (Compiler) [<ffffffc0000a2078>] __do_softirq+0x10c/0x2e4
[ 20.740411] c0 3136 (Compiler) [<ffffffc0000a24bc>] irq_exit+0x8c/0xc0
[ 20.740439] c0 3136 (Compiler) [<ffffffc00008489c>] handle_IRQ+0x48/0xac
[ 20.740469] c0 3136 (Compiler) [<ffffffc000081428>] gic_handle_irq+0x38/0x7c
----------------------------------------------------------------------
Because in SMP, "mrq" has race condition between below two paths:
path1: CPU0: <tasklet context>
static void mmc_wait_data_done(struct mmc_request *mrq)
{
mrq->host->context_info.is_done_rcv = true;
//
// If CPU0 has just finished "is_done_rcv = true" in path1, and at
// this moment, IRQ or ICache line missing happens in CPU0.
// What happens in CPU1 (path2)?
//
// If the mmcqd thread in CPU1(path2) hasn't entered to sleep mode:
// path2 would have chance to break from wait_event_interruptible
// in mmc_wait_for_data_req_done and continue to run for next
// mmc_request (mmc_blk_rw_rq_prep).
//
// Within mmc_blk_rq_prep, mrq is cleared to 0.
// If below line still gets host from "mrq" as the result of
// compiler, the panic happens as we traced.
wake_up_interruptible(&mrq->host->context_info.wait);
}
path2: CPU1: <The mmcqd thread runs mmc_queue_thread>
static int mmc_wait_for_data_req_done(...
{
...
while (1) {
wait_event_interruptible(context_info->wait,
(context_info->is_done_rcv ||
context_info->is_new_req));
static void mmc_blk_rw_rq_prep(...
{
...
memset(brq, 0, sizeof(struct mmc_blk_request));
This issue happens very coincidentally; however adding mdelay(1) in
mmc_wait_data_done as below could duplicate it easily.
static void mmc_wait_data_done(struct mmc_request *mrq)
{
mrq->host->context_info.is_done_rcv = true;
+ mdelay(1);
wake_up_interruptible(&mrq->host->context_info.wait);
}
At runtime, IRQ or ICache line missing may just happen at the same place
of the mdelay(1).
This patch gets the mmc_context_info at the beginning of function, it can
avoid this race condition.
Signed-off-by: Jialing Fu <jlfu@marvell.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Fixes: 2220eedfd7 ("mmc: fix async request mechanism ....")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fbb1816942 ]
It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory. Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps. Fix
that issue by never writing coredumps into existing files.
Requirements for the attack:
- The attack only applies if the victim's process has a nonzero
RLIMIT_CORE and is dumpable.
- The attacker can trick the victim into coredumping into an
attacker-writable directory D, either because the core_pattern is
relative and the victim's cwd is attacker-writable or because an
absolute core_pattern pointing to a world-writable directory is used.
- The attacker has one of these:
A: on a system with protected_hardlinks=0:
execute access to a folder containing a victim-owned,
attacker-readable file on the same partition as D, and the
victim-owned file will be deleted before the main part of the attack
takes place. (In practice, there are lots of files that fulfill
this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
This does not apply to most Linux systems because most distros set
protected_hardlinks=1.
B: on a system with protected_hardlinks=1:
execute access to a folder containing a victim-owned,
attacker-readable and attacker-writable file on the same partition
as D, and the victim-owned file will be deleted before the main part
of the attack takes place.
(This seems to be uncommon.)
C: on any system, independent of protected_hardlinks:
write access to a non-sticky folder containing a victim-owned,
attacker-readable file on the same partition as D
(This seems to be uncommon.)
The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core. The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.
If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).
A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.
On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps. (This could theoretically lead to a
privilege escalation.)
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c54839a722 ]
reclaim_clean_pages_from_list() assumes that shrink_page_list() returns
number of pages removed from the candidate list. But shrink_page_list()
puts back mlocked pages without passing it to caller and without
counting as nr_reclaimed. This increases nr_isolated.
To fix this, this patch changes shrink_page_list() to pass unevictable
pages back to caller. Caller will take care those pages.
Minchan said:
It fixes two issues.
1. With unevictable page, cma_alloc will be successful.
Exactly speaking, cma_alloc of current kernel will fail due to
unevictable pages.
2. fix leaking of NR_ISOLATED counter of vmstat
With it, too_many_isolated works. Otherwise, it could make hang until
the process get SIGKILL.
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b1b4e435e4 ]
When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.
So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).
This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.
This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware. But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.
For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1b59ddfcf1 ]
The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed. This fixes the 64-bit LWS CAS
operation on 64-bit kernels.
I can now enable 64-bit atomic support in GCC.
Cc: <stable@vger.kernel.org>
Signed-off-by: John David Anglin <dave.anglin>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e9ae58aeee ]
We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit efcbc04e16 ]
It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.
[pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.
When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).
If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.
So only treat O_EXCL specially if O_CREAT was also given.
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1f9b8c8fbc ]
While we are committing a transaction, it's possible the previous one is
still finishing its commit and therefore we wait for it to finish first.
However we were not checking if that previous transaction ended up getting
aborted after we waited for it to commit, so we ended up committing the
current transaction which can lead to fs corruption because the new
superblock can point to trees that have had one or more nodes/leafs that
were never durably persisted.
The following sequence diagram exemplifies how this is possible:
CPU 0 CPU 1
transaction N starts
(...)
btrfs_commit_transaction(N)
cur_trans->state = TRANS_STATE_COMMIT_START;
(...)
cur_trans->state = TRANS_STATE_COMMIT_DOING;
(...)
cur_trans->state = TRANS_STATE_UNBLOCKED;
root->fs_info->running_transaction = NULL;
btrfs_start_transaction()
--> starts transaction N + 1
btrfs_write_and_wait_transaction(trans, root);
--> starts writing all new or COWed ebs created
at transaction N
creates some new ebs, COWs some
existing ebs but doesn't COW or
deletes eb X
btrfs_commit_transaction(N + 1)
(...)
cur_trans->state = TRANS_STATE_COMMIT_START;
(...)
wait_for_commit(root, prev_trans);
--> prev_trans == transaction N
btrfs_write_and_wait_transaction() continues
writing ebs
--> fails writing eb X, we abort transaction N
and set bit BTRFS_FS_STATE_ERROR on
fs_info->fs_state, so no new transactions
can start after setting that bit
cleanup_transaction()
btrfs_cleanup_one_transaction()
wakes up task at CPU 1
continues, doesn't abort because
cur_trans->aborted (transaction N + 1)
is zero, and no checks for bit
BTRFS_FS_STATE_ERROR in fs_info->fs_state
are made
btrfs_write_and_wait_transaction(trans, root);
--> succeeds, no errors during writeback
write_ctree_super(trans, root, 0);
--> succeeds
--> we have now a superblock that points us
to some root that uses eb X, which was
never written to disk
In this scenario future attempts to read eb X from disk results in an
error message like "parent transid verify failed on X wanted Y found Z".
So fix this by aborting the current transaction if after waiting for the
previous transaction we verify that it was aborted.
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9d39f05490 ]
Commit 813f5c0ac5 ("media: Change media device link_notify behaviour")
modified the media controller link setup notification API and updated the
OMAP3 ISP driver accordingly. As a side effect it introduced a bug by
turning power on after setting the link instead of before. This results in
sub-devices not being powered down in some cases when they should be. Fix
it.
Fixes: 813f5c0ac5 [media] media: Change media device link_notify behaviour
Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
Cc: stable@vger.kernel.org # since v3.10
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a66b0c41ad ]
The input_dev is already gone when the rc device is being unregistered
so checking for its presence only means that no remove uevent will be
generated.
Cc: stable@kernel.org
Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09bfda10e6 ]
With the radeon driver loaded the HP Compaq dc5750
Small Form Factor machine fails to resume from suspend.
Adding a quirk similar to other devices avoids
the problem and the system resumes properly.
Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4c17a6d56b ]
This might lead to local privilege escalation (code execution as
kernel) for systems where the following conditions are met:
- CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled
- a cifs filesystem is mounted where:
- the mount option "vers" was used and set to a value >=2.0
- the attacker has write access to at least one file on the filesystem
To attack this, an attacker would have to guess the target_tcon
pointer (but guessing wrong doesn't cause a crash, it just returns an
error code) and win a narrow race.
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 36b35d5d80 ]
If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.
Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.
Fixes: 6d492ecc64 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c2cb59444 ]
The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:
BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
Call Trace:
[c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
[c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
[c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
[c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
[c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
[c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
[c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
[c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
[c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
[c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
[c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
[c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
[c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.
The EPOW sensor is defined to be "fast" in sPAPR - mpe.
Fixes: 587f83e8dd ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 74b5037baa ]
The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.
However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.
This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.
In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.
This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.
However in commit 7aa0727f33 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.
That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.
This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.
Fixes: 7aa0727f33 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eb38f3a4f6 ]
We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices. close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV. logind was overreacting to it and decided to kill
itself when an unexpected error code was received. What a tragedy.
The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().
Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.
Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c4cbba9fa0 ]
When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.
The fix is to unconditionally turn off the virtual timer on guest
exit.
Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit df057cc7b4 ]
Cortex-A53 processors <= r0p4 are affected by erratum #843419 which can
lead to a memory access using an incorrect address in certain sequences
headed by an ADRP instruction.
There is a linker fix to generate veneers for ADRP instructions, but
this doesn't work for kernel modules which are built as unlinked ELF
objects.
This patch adds a new config option for the erratum which, when enabled,
builds kernel modules with the mcmodel=large flag. This uses absolute
addressing for all kernel symbols, thereby removing the use of ADRP as
a PC-relative form of addressing. The ADRP relocs are removed from the
module loader so that we fail to load any potentially affected modules.
Cc: <stable@vger.kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d10bcd4733 ]
When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.
This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.
Cc: <stable@vger.kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bdec97a855 ]
When saving/restoring the VFP registers from a compat (AArch32)
signal frame, we rely on the compat registers forming a prefix of the
native register file and therefore make use of copy_{to,from}_user to
transfer between the native fpsimd_state and the compat_vfp_sigframe.
Unfortunately, this doesn't work so well in a big-endian environment.
Our fpsimd save/restore code operates directly on 128-bit quantities
(Q registers) whereas the compat_vfp_sigframe represents the registers
as an array of 64-bit (D) registers. The architecture packs the compat D
registers into the Q registers, with the least significant bytes holding
the lower register. Consequently, we need to swap the 64-bit halves when
converting between these two representations on a big-endian machine.
This patch replaces the __copy_{to,from}_user invocations in our
compat VFP signal handling code with explicit __put_user loops that
operate on 64-bit values and swap them accordingly.
Cc: <stable@vger.kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3633ebebab ]
We already set a station to be associated when peering completes, both
in user space and in the kernel. Thus we should always have an
associated sta before sending data frames to that station.
Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized. This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.
Cc: stable@vger.kernel.org
Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d3d11fe08c ]
The temperature registers appear to report values in degrees Celsius
while the hwmon API mandates values to be exposed in millidegrees
Celsius. Do the conversion so that the values reported by "sensors"
are correct.
Fixes: aed93e0bf4 ("tg3: Add hwmon support for temperature")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: stable@vger.kernel.org [v3.6+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 12c641ab82 ]
In the logic in the initial commit of unshare made creating a new
thread group for a process, contingent upon creating a new memory
address space for that process. That is wrong. Two separate
processes in different thread groups can share a memory address space
and clone allows creation of such proceses.
This is significant because it was observed that mm_users > 1 does not
mean that a process is multi-threaded, as reading /proc/PID/maps
temporarily increments mm_users, which allows other processes to
(accidentally) interfere with unshare() calls.
Correct the check in check_unshare_flags() to test for
!thread_group_empty() for CLONE_THREAD, CLONE_SIGHAND, and CLONE_VM.
For sighand->count > 1 for CLONE_SIGHAND and CLONE_VM.
For !current_is_single_threaded instead of mm_users > 1 for CLONE_VM.
By using the correct checks in unshare this removes the possibility of
an accidental denial of service attack.
Additionally using the correct checks in unshare ensures that only an
explicit unshare(CLONE_VM) can possibly trigger the slow path of
current_is_single_threaded(). As an explict unshare(CLONE_VM) is
pointless it is not expected there are many applications that make
that call.
Cc: stable@vger.kernel.org
Fixes: b2e0d98705 userns: Implement unshare of the user namespace
Reported-by: Ricky Zhou <rickyz@chromium.org>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 596f5aad2a ]
There may be lots of pending requests so that the buffer of PAGE_SIZE
can't hold them at all.
One typical example is scsi-mq, the queue depth(.can_queue) of
scsi_host and blk-mq is quite big but scsi_device's queue_depth
is a bit small(.cmd_per_lun), then it is quite easy to have lots
of pending requests in hw queue.
This patch fixes the following warning and the related memory
destruction.
[ 359.025101] fill_read_buffer: blk_mq_hw_sysfs_show+0x0/0x7d returned bad count^M
[ 359.055595] irq event stamp: 15537^M
[ 359.055606] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M
[ 359.055614] Dumping ftrace buffer:^M
[ 359.055660] (ftrace buffer empty)^M
[ 359.055672] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M
[ 359.055678] CPU: 4 PID: 21631 Comm: stress-ng-sysfs Not tainted 4.2.0-rc5-next-20150805 #434^M
[ 359.055679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M
[ 359.055682] task: ffff8802161cc000 ti: ffff88021b4a8000 task.ti: ffff88021b4a8000^M
[ 359.055693] RIP: 0010:[<ffffffff811541c5>] [<ffffffff811541c5>] __kmalloc+0xe8/0x152^M
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2c17d27c36 ]
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
CPU 1 CPU 2
skb->dev: no reference
process_backlog:__skb_dequeue
process_backlog:local_irq_enable
on_each_cpu for
flush_backlog => IPI(hardirq): flush_backlog
- packet not found in backlog
CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections
netdev_run_todo,
rcu_barrier: no
ongoing callbacks
__netif_receive_skb_core:rcu_read_lock
- too late
free dev
process packet for freed dev
Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 468b732b6f ]
"len" is a signed integer. We check that len is not negative, so it
goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.
I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
Fixes: a8c879a7ee ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c1bf34951 ]
The port-change event processing in procedure mlx4_eq_int() uses "slave"
as the vf_oper array index. Since the value of "slave" is the PF function
index, the result is that the PF link state is used for deciding to
propagate the event for all the VFs. The VF link state should be used,
so the VF function index should be used here.
Fixes: 948e306d7d ('net/mlx4: Add VF link state support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0848f6428b ]
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f42 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a951bc1e6b ]
The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but the same MAC
address still may happened by this steps for this policy:
1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
bond0 has the same mac address with eth0, it is MAC1.
2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
eth1 is backup, eth1 has MAC2.
3) ifconfig eth0 down
eth1 became active slave, bond will swap MAC for eth0 and eth1,
so eth1 has MAC1, and eth0 has MAC2.
4) ifconfig eth1 down
there is no active slave, and eth1 still has MAC1, eth2 has MAC2.
5) ifconfig eth0 up
the eth0 became active slave again, the bond set eth0 to MAC1.
Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
MAC address, it will break this policy for ACTIVE_BACKUP mode.
This patch will fix this problem by finding the old active slave and
swap them MAC address before change active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 06f6d1094a ]
When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[ 908.963051] eql: event: 9
[ 908.963052] eql: IFF_SLAVE
[ 908.963054] eql: event: 2
[ 908.963056] eql: IFF_SLAVE
[ 908.963058] eql: event: 6
[ 908.963059] eql: IFF_SLAVE
[ 908.963110] bond0: Releasing active interface eql
[ 908.976168] bond0: Destroying bond bond0
[ 908.976266] bond0 (unregistering): Released all slaves
[ 908.984097] ------------[ cut here ]------------
[ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[ 908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]
[ 908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G W O
4.2.0-rc2+ #8
[ 908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 908.984172] 0000000000000000 ffffffff81732d41 ffffffff81525b34
ffff8800358dfda8
[ 908.984175] ffffffff8106c521 ffff88003595af78 ffff88003595af40
ffff88003e3a4280
[ 908.984178] ffffffffa058d040 0000000000000000 ffffffff8106c59a
ffffffff8172ebd0
[ 908.984181] Call Trace:
[ 908.984188] [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[ 908.984193] [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[ 908.984196] [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[ 908.984199] [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[ 908.984205] [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[ 908.984208] [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[ 908.984217] [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[ 908.984225] [<ffffffff8142f52d>] ?
unregister_pernet_operations+0x8d/0xd0
[ 908.984228] [<ffffffff8142f58d>] ?
unregister_pernet_subsys+0x1d/0x30
[ 908.984232] [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
[ 908.984236] [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[ 908.984241] [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[ 908.984244] [<ffffffff8152b732>] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[ 908.984247] ---[ end trace 7c006ed4abbef24b ]---
Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f939981492 ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.
Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: a64d49c3dd ("bonding: Manage /proc/net/bonding/ entries from
the netdev events")
Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 03645a11a5 ]
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.
This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fd98e9419d ]
Commit 79901317ce ("n_tty: Don't flush buffer when closing ldisc"),
first merged in kernel release 3.10, caused the following regression
in the Gigaset M101 driver:
Before that commit, when closing the N_TTY line discipline in
preparation to switching to N_GIGASET_M101, receive_room would be
reset to a non-zero value by the call to n_tty_flush_buffer() in
n_tty's close method. With the removal of that call, receive_room
might be left at zero, blocking data reception on the serial line.
The present patch fixes that regression by setting receive_room
to an appropriate value in the ldisc open method.
Fixes: 79901317ce ("n_tty: Don't flush buffer when closing ldisc")
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5ebc784625 ]
Since the mdb add/del code was introduced there have been 2 br_mdb_notify
calls when doing br_mdb_add() resulting in 2 notifications on each add.
Example:
Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
Before patch:
root@debian:~# bridge monitor all
[MDB]dev br0 port eth1 grp 239.0.0.1 permanent
[MDB]dev br0 port eth1 grp 239.0.0.1 permanent
After patch:
root@debian:~# bridge monitor all
[MDB]dev br0 port eth1 grp 239.0.0.1 permanent
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: cfd5675435 ("bridge: add support of adding and deleting mdb entries")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 89c22d8c3b ]
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.
This is in fact bogus for the MSG_PEEK case as this is done without
any locking. So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.
This patch fixes this by only storing the result if the skb is not
shared. This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e9e4dd3267 ]
commit 381c759d99 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.
Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.
Thanks to Eric W. Biederman for the test script and his
valuable feedback!
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f1158b74e5 ]
Since commit b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
there's a check in br_ip_equal() for a matching vlan id, but the mdb
functions were not modified to use (or at least zero it) so when an
entry was added it would have a garbage vlan id (from the local br_ip
variable in __br_mdb_add/del) and this would prevent it from being
matched and also deleted. So zero out the whole local ip var to protect
ourselves from future changes and also to fix the current bug, since
there's no vlan id support in the mdb uapi - use always vlan id 0.
Example before patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
RTNETLINK answers: Invalid argument
After patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fdd75ea8df ]
Calling connect() with an AF_TIPC socket would trigger a series
of error messages from SELinux along the lines of:
SELinux: Invalid class 0
type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> }
for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
permissive=0
This was due to a failure to initialize the security state of the new
connection sock by the tipc code, leaving it with junk in the security
class field and an unlabeled secid. Add a call to security_sk_clone()
to inherit the security state from the parent socket.
Reported-by: Tim Shearer <tim.shearer@overturenetworks.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fc24f2b209 ]
Frag needed should be sent only if the inner header asked
to not fragment. Currently fragmentation is broken if the
tunnel has df set, but df was not asked in the original
packet. The tunnel's df needs to be still checked to update
internally the pmtu cache.
Commit 23a3647bc4 broke it, and this commit fixes
the ipv4 df check back to the way it was.
Fixes: 23a3647bc4 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d339727c2b ]
User space can crash kernel with
ip link add ifb10 numtxqueues 100000 type ifb
We must replace a BUG_ON() by proper test and return -EINVAL for
crazy values.
Fixes: 60877a32bc ("net: allow large number of tx queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4c938d22c8 ]
Before commit daad151263 ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of MLD packet goes through ip6_mr_input, causing
MRT6MSG_NOCACHE message to be generated to user space.
Make MLD packet only processed locally.
Fixes: daad151263 ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().")
Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1abf25a25b ]
Using -1 as platform device id means that the platform driver core will not
assign any id to the device (the device name will not have id at all). This
results problems on systems that have multiple PCHs (Platform Controller
HUBs) because all of them also include their own copy of LPC device.
All the subsequent device creations will fail because there already exists
platform device with the same name.
Fix this by passing PLATFORM_DEVID_AUTO as platform device id. This makes
the platform device core to allocate new ids automatically.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0a04b01669 ]
I want to use lockless_dereference() from seqlock.h, which would mean
including rcupdate.h from it, however rcupdate.h already includes
seqlock.h.
Avoid this by moving lockless_dereference() into compiler.h. This is
somewhat tricky since it uses smp_read_barrier_depends() which isn't
available there, but its a CPP macro so we can get away with it.
The alternative would be moving it into asm/barrier.h, but that would
be updating each arch (I can do if people feel that is more
appropriate).
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9b6e6a8334 ]
Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.
The NMI nesting fixup relies on a precise stack layout and
atomic IRET. Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.
This will make some nested NMIs visible to C code, but the C
code is okay with that.
As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7250dc3fee ]
I received a report from an user of following mouse which needs this quirk:
usb 1-1.6: USB disconnect, device number 58
usb 1-1.6: new low speed USB device number 59 using ehci_hcd
usb 1-1.6: New USB device found, idVendor=04f2, idProduct=1053
usb 1-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.6: Product: USB Optical Mouse
usb 1-1.6: Manufacturer: PixArt
usb 1-1.6: configuration #1 chosen from 1 choice
input: PixArt USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6:1.0/input/input5887
generic-usb 0003:04F2:1053.16FE: input,hidraw2: USB HID v1.11 Mouse [PixArt USB Optical Mouse] on usb-0000:00:1a.0-1.6/input0
The quirk was tested by the reporter and it fixed the frequent disconnections etc.
[jkosina@suse.cz: reorder the position in hid-ids.h]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 49718f0fb8 ]
The routines in scsi_rpm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).
However, this assumption turns out to be wrong for things like the ses
driver. Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting. If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.
This patch fixes the problem by calling the block layer's runtime-PM
routines only if the device's driver really does have a runtime-PM
callback routine. Since ses doesn't define any such callbacks, the
crash won't occur.
This fixes Bugzilla #101371.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Stanisław Pitucha <viraptor@gmail.com>
Reported-by: Ilan Cohen <ilanco@gmail.com>
Tested-by: Ilan Cohen <ilanco@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 87ffd2b9bb ]
Since commit feb44f1f7a (x86/xen:
Provide a "Xen PV" APIC driver to support >255 VCPUs) Xen guests need
a full APIC driver and thus should depend on X86_LOCAL_APIC.
This fixes an i386 build failure with !SMP && !CONFIG_X86_UP_APIC by
disabling Xen support in this configuration.
Users needing Xen support in a non-SMP i386 kernel will need to enable
CONFIG_X86_UP_APIC.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 126c69a0bd ]
When injecting a fault into a misbehaving 32bit guest, it seems
rather idiotic to also inject a 64bit fault that is only going
to corrupt the guest state. This leads to a situation where we
perform an illegal exception return at EL2 causing the host
to crash instead of killing the guest.
Just fix the stupid bug that has been there from day 1.
Cc: <stable@vger.kernel.org>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7a7184b01a ]
The Crucial M500 is known to have issues with queued TRIM commands, the
factory recertified SSDs use a different model number naming convention
which causes them to get ignored by the blacklist.
The new naming convention boils down to: s/Crucial_/FC/
Signed-off-by: Guillermo A. Amaral <g@maral.me>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 24ee3cf89b ]
The comment says it's using trialcs->mems_allowed as a temp variable but
it didn't match the code. Change the code to match the comment.
This fixes an issue when writing in cpuset.mems when a sub-directory
exists: we need to write several times for the information to persist:
| root@alban:/sys/fs/cgroup/cpuset# mkdir footest9
| root@alban:/sys/fs/cgroup/cpuset# cd footest9
| root@alban:/sys/fs/cgroup/cpuset/footest9# mkdir aa
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
|
| root@alban:/sys/fs/cgroup/cpuset/footest9# echo 0 > aa/cpuset.mems
| root@alban:/sys/fs/cgroup/cpuset/footest9# cat aa/cpuset.mems
| 0
| root@alban:/sys/fs/cgroup/cpuset/footest9#
This should help to fix the following issue in Docker:
https://github.com/opencontainers/runc/issues/133
In some conditions, a Docker container needs to be started twice in
order to work.
Signed-off-by: Alban Crequy <alban@endocode.com>
Tested-by: Iago López Galeiras <iago@endocode.com>
Cc: <stable@vger.kernel.org> # 3.17+
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4f258a4634 ]
Commit bcdb247c6b ("sd: Limit transfer length") clamped the maximum
size of an I/O request to the MAXIMUM TRANSFER LENGTH field in the BLOCK
LIMITS VPD. This had the unfortunate effect of also limiting the maximum
size of non-filesystem requests sent to the device through sg/bsg.
Avoid using blk_queue_max_hw_sectors() and set the max_sectors queue
limit directly.
Also update the comment in blk_limits_max_hw_sectors() to clarify that
max_hw_sectors defines the limit for the I/O controller only.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Brian King <brking@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 660d0831d1 ]
In case of hw iscsi offload, an host can have N-number of active
connections. There can be IO's running on some connections which
make host->host_busy always TRUE. Now if logout from a connection
is tried then the code gets into an infinite loop as host->host_busy
is always TRUE.
iscsi_conn_teardown(....)
{
.........
/*
* Block until all in-progress commands for this connection
* time out or fail.
*/
for (;;) {
spin_lock_irqsave(session->host->host_lock, flags);
if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
spin_unlock_irqrestore(session->host->host_lock, flags);
break;
}
spin_unlock_irqrestore(session->host->host_lock, flags);
msleep_interruptible(500);
iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
"host_busy %d host_failed %d\n",
atomic_read(&session->host->host_busy),
session->host->host_failed);
................
...............
}
}
This is not an issue with software-iscsi/iser as each cxn is a separate
host.
Fix:
Acquiring eh_mutex in iscsi_conn_teardown() before setting
session->state = ISCSI_STATE_TERMINATE.
Signed-off-by: John Soni Jose <sony.john@avagotech.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Chris Leech <cleech@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8ef9724bf9 ]
When inserting a new register into a block, the present bit map size is
increased using krealloc. krealloc does not clear the additionally
allocated memory, leaving it filled with random values. Result is that
some registers are considered cached even though this is not the case.
Fix the problem by clearing the additionally allocated memory. Also, if
the bitmap size does not increase, do not reallocate the bitmap at all
to reduce overhead.
Fixes: 3f4ff561bc ("regmap: rbtree: Make cache_present bitmap per node")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 72e43164fd ]
The PM runtime core by default assumes a chip is suspended when runtime
PM is enabled. Currently the arizona driver enables runtime PM when the
chip is fully active and then disables the DCVDD regulator at the end of
arizona_dev_init. This however has several problems, firstly the if we
reach the end of arizona_dev_init, we did not properly follow all the
proceedures for shutting down the chip, and most notably we never marked
the chip as cache only so any writes occurring between then and the next
PM runtime resume will be lost. Secondly, if we are already resumed when
we reach the end of dev_init, then at best we get unbalanced regulator
enable/disables at work we lose DCVDD whilst we need it.
Additionally, since the commit 4f0216409f7c ("mfd: arizona: Add better
support for system suspend"), the PM runtime operations may
disable/enable the IRQ, so the IRQs must now be enabled before we call
any PM operations.
This patch adds a call to pm_runtime_set_active to inform the PM core
that the device is starting up active and moves the PM enabling to
around the IRQ initialisation to avoid any PM callbacks happening until
the IRQs are initialised.
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f6979adeaa ]
Due to patch "libfc: Do not invoke the response handler after
fc_exch_done()" (commit ID 7030fd62) the lport_recv() call
in fc_exch_recv_req() is passed a dangling pointer. Avoid this
by moving the fc_frame_free() call from fc_invoke_resp() to its
callers. This patch fixes the following crash:
general protection fault: 0000 [#3] PREEMPT SMP
RIP: fc_lport_recv_req+0x72/0x280 [libfc]
Call Trace:
fc_exch_recv+0x642/0xde0 [libfc]
fcoe_percpu_receive_thread+0x46a/0x5ed [fcoe]
kthread+0x10a/0x120
ret_from_fork+0x42/0x70
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3e04e2fe6d ]
This addresses two issues that cause problems with viewperf maya-03 in
situation with memory pressure.
The first issue causes attempts to unreserve buffers if batched
reservation fails due to, for example, a signal pending. While previously
the ttm_eu api was resistant against this type of error, it is no longer
and the lockdep code will complain about attempting to unreserve buffers
that are not reserved. The issue is resolved by avoid calling
ttm_eu_backoff_reservation in the buffer reserve error path.
The second issue is that the binding_mutex may be held when user-space
fence objects are created and hence during memory reclaims. This may cause
recursive attempts to grab the binding mutex. The issue is resolved by not
holding the binding mutex across fence creation and submission.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fc5fee86bd ]
It turns out that a PV domU also requires the "Xen PV" APIC
driver. Otherwise, the flat driver is used and we get stuck in busy
loops that never exit, such as in this stack trace:
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56 while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
#0 __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
#1 __default_send_IPI_shortcut (shortcut=<optimized out>,
dest=<optimized out>, vector=<optimized out>) at
./arch/x86/include/asm/ipi.h:75
#2 apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
#3 0xffffffff81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
#4 0xffffffff8114990c in irq_work_queue (work=0xffff88000fc0e400) at
kernel/irq_work.c:100
#5 0xffffffff8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
#6 0xffffffff8110ca60 in vprintk_emit (facility=0, level=<optimized
out>, dict=0x0 <irq_stack_union>, dictlen=<optimized out>,
fmt=<optimized out>, args=<optimized out>)
at kernel/printk/printk.c:1778
#7 0xffffffff816010c8 in printk (fmt=<optimized out>) at
kernel/printk/printk.c:1868
#8 0xffffffffc00013ea in ?? ()
#9 0x0000000000000000 in ?? ()
Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 53bc7dc004 ]
The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge
work haven't finished.
There is a work_pending() before this BUG_ON, but it doesn't account if the work
is still currently running.
CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7b0767502b ]
We should consider info->feature_persistent when adding indirect page to list
info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered.
When we are using persistent grants the indirect_pages list
should always be empty because blkfront has pre-allocated enough
persistent pages to fill all requests on the ring.
CC: stable@vger.kernel.org
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 036138080a ]
Hugetlbfs pages will get a refcount in get_any_page() or
madvise_hwpoison() if soft offlining through madvise. The refcount which
is held by the soft offline path should be released if we fail to isolate
hugetlbfs pages.
Fix it by reducing the refcount for both isolation success and failure.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bcc5422230 ]
We are not safe from calling isolate_huge_page() on a hugepage
concurrently, which can make the victim hugepage in invalid state and
results in BUG_ON().
The root problem of this is that we don't have any information on struct
page (so easily accessible) about hugepages' activeness. Note that
hugepages' activeness means just being linked to
hstate->hugepage_activelist, which is not the same as normal pages'
activeness represented by PageActive flag.
Normal pages are isolated by isolate_lru_page() which prechecks PageLRU
before isolation, so let's do similarly for hugetlb with a new
paeg_huge_active().
set/clear_page_huge_active() should be called within hugetlb_lock. But
hugetlb_cow() and hugetlb_no_page() don't do this, being justified because
in these functions set_page_huge_active() is called right after the
hugepage is allocated and no other thread tries to isolate it.
[akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/, make it return bool]
[fengguang.wu@intel.com: set_page_huge_active() can be static]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4f32be677b ]
After trying to drain pages from pagevec/pageset, we try to get reference
count of the page again, however, the reference count of the page is not
reduced if the page is still not on LRU list.
Fix it by adding the put_page() to drop the page reference which is from
__get_any_page().
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3ed1f8a99d ]
sem_lock() did not properly pair memory barriers:
!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.
As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.
With regards to -stable:
The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5054daa285 ]
Commit 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
introduced CR4 shadows.
These shadows are initialized in early boot code. The commit missed
initialization for 64-bit PV(H) guests that this patch adds.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d55c670cbc ]
The vti6_rcv_cb and vti_rcv_cb calls were leaving the skb->mark modified
after completing the function. This resulted in the original skb->mark
value being lost. Since we only need skb->mark to be set for
xfrm_policy_check we can pull the assignment into the rcv_cb calls and then
just restore the original mark after xfrm_policy_check has been completed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 049f8e2e28 ]
This change makes it so that if a tunnel is defined we just use the mark
from the tunnel instead of the mark from the skb header. By doing this we
can avoid the need to set skb->mark inside of the tunnel receive functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cd5279c194 ]
Instead of modifying skb->mark we can simply modify the flowi_mark that is
generated as a result of the xfrm_decode_session. By doing this we don't
need to actually touch the skb->mark and it can be preserved as it passes
out through the tunnel.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9051bd393c ]
A new Micron drive was just announced, once again recycling the first
part of the model string. Add an underscore to the M510/M550 pattern to
avoid picking up the new DC drive.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71d126fd28 ]
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f303074160 ]
Create a sysfs "trim" attribute for each ata_device that displays
whether DSM TRIM is "unsupported", "unqueued", "forced_unqueued"
(blacklisted) or "queued".
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 243918be63 ]
Queued TRIM got disabled on Micron M500DC drives thanks to the
"Micron_M500*" pattern we had in place to accommodate the previous
generation of this drive family. Tweak the blacklist entry slightly so
we only disable queued TRIM for the non-DC variants of M500 drives.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9a9324d396 ]
The queued TRIM problems appear to be generic to Samsung's firmware and
not tied to a particular model. A recent update to the 840 EVO firmware
introduced the same issue as we saw on 850 Pro.
Blacklist queued TRIM on all 800-series drives while we work this issue
with Samsung.
Reported-by: Günter Waller <g.wal@web.de>
Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e61f7d1c3c ]
As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return
Zero After Trim) flags in the ATA Command Set are unreliable in the
sense that they only define what happens if the device successfully
executed the DSM TRIM command. TRIM is only advisory, however, and the
device is free to silently ignore all or parts of the request.
In practice this renders the DRAT and RZAT flags completely useless and
because the results are unpredictable we decided to disable discard in
MD for 3.18 to avoid the risk of data corruption.
Hardware vendors in the real world obviously need better guarantees than
what the standards bodies provide. Unfortuntely those guarantees are
encoded in product requirements documents rather than somewhere we can
key off of them programatically. So we are compelled to disabling
discard_zeroes_data for all devices unless we explicitly have data to
support whitelisting them.
This patch whitelists SSDs from a few of the main vendors. None of the
whitelists are based on written guarantees. They are purely based on
empirical evidence collected from internal and external users that have
tested or qualified these drives in RAID deployments.
The whitelist is only meant as a starting point and is by no means
comprehensive:
- All intel SSD models except for 510
- Micron M5?0/M600
- Samsung SSDs
- Seagate SSDs
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7e01b5acd8 ]
Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
to override the gfp flags of the allocation for the kexec control
page. The loop in kimage_alloc_normal_control_pages allocates pages
with GFP_KERNEL until a page is found that happens to have an
address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
the loop will keep allocating memory until the oom killer steps in.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 432ac1a2c0 ]
Skylake and Haswell have the same behavior on display audio. So this patch
applys Haswell fix-ups to Skylake.
Signed-off-by: Libin Yang <libin.yang@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1131374654 ]
On R-Mobile APE6, since it has 3 thermal zones, ENR register
has enable bits in bit 19-16, bit 11-8 and bit 3-0.
However, on R-Car gen2, since it has 1 thermal zone, ENR register has
enable bits in bit 3-0. (In other words, the write value should always
be 0 for bit 31-4 of ENR register.)
So, this patch fixes the ENR register value using I/O resource sets.
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d6c763afab ]
Since commit 8a0a9bd4db ('random: make get_random_int() more
random'), get_random_int() returns a random value for each call,
so comment and hack introduced in mmap_rnd() as part of commit
1d18c47c73 ('arm64: MMU fault handling and page table management')
are incorrects.
Commit 1d18c47c73 seems to use the same hack introduced by
commit a5adc91a4b ('powerpc: Ensure random space between stack
and mmaps'), latter copied in commit 5a0efea09f ('sparc64: Sharpen
address space randomization calculations.').
But both architectures were cleaned up as part of commit
fa8cbaaf5a ('powerpc+sparc64/mm: Remove hack in mmap randomize
layout') as hack is no more needed since commit 8a0a9bd4db.
So the present patch removes the comment and the hack around
get_random_int() on AArch64's mmap_rnd().
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a8c8316b11 ]
The Microchip Pick16F1454 is exported as a HID device and is used by for
example the Yepkit YKUSH three-port switchable USB hub. However, it is not an
actual HID-device. On the Yepkit, it is used to power up/down the ports on the
hub. The HID driver should ignore this device.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit adc2325923 ]
The Si4713 development board contains a Si4713 FM transmitter chip
and is handled by the radio-usb-si4713 driver.
The board reports itself as (10c4:8244) Cygnal Integrated Products, Inc.
and misidentifies itself as a HID device in its USB interface descriptor.
This patch ignores this device as an HID device and hence loads the custom driver.
Signed-off-by: Dinesh Ram <dinesh.ram@cern.ch>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9b028649b9 ]
The linux kernel has supported the TiVo Slide remote control for some time, but
does not recognize the USB ID of the newer Slide Pro. This patch adds the
missing data structures so the newer remote will be recognized by the driver,
thereby allowing the TiVo, LiveTV, and Thumbs Up/Down buttons to be
mapped with a hwdb file.
Signed-off-by: Forest Wilkinson <web11.forest@tibit.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d2cce58a7 ]
Fix a couple of pci id table mistakes:
Subdevice ID 0x3323 missing from product[] table
(another name for HP Smart Storage 1210m)
Bogus 0x1925 subdevice id removed from hpsa_pci_device_id[] (no such thing.)
Signed-off-by: Don Brace <don.brace@pmcs.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7e7e8fe698 ]
The pcc-cpufreq driver is not automatically loaded on systems where
the platform's power management setting requires this driver.
Instead, on those systems no CPU frequency driver is registered and
active.
Make the autoloading matching criteria for loading the pcc-cpufreq
driver the same as done in acpi-cpufreq by commit c655affbd5
("ACPI / cpufreq: Add ACPI processor device IDs to acpi-cpufreq").
x86 CPU frequency drivers are now typically autoloaded by specifying
MODULE_DEVICE_TABLE entries and x86cpu model specific matching.
But pcc-cpufreq was omitted when acpi-cpufreq and other drivers were
changed to use this approach.
Both acpi-cpufreq and pcc-cpufreq depend on a distinct and mutually
exclusive set of ACPI methods which are not directly tied to specific
processor model numbers. Both of these drivers have init routines
which look for their required ACPI methods. As a result, only the
appropriate driver registers as the cpu frequency driver and the other
one ends up being unloaded.
Tested on various systems where acpi-cpufreq, intel_pstate, and
pcc-cpufreq are the expected cpu frequency drivers.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: Joseph Szczypek <joseph.szczypek@hp.com>
Reported-by: Trinh Dao <trinh.dao@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b4c2526134 ]
commit 93fb9127cb upstream.
This patch fixes an issue that sometimes this controller is not able
to complete the Control write status stage.
This driver should enable DCPCTR.CCPL and PID_BUF to complete the status
stage. However, if this driver detects the ctrl_stage interruption first
before the control write data is received, this driver will clear the
PID_BUF wrongly in the usbhsf_pio_try_pop(). To avoid this issue, this
patch doesn't clear the PID_BUF in the usbhsf_pio_try_pop().
(Since also the privious code doesn't disable the PID_BUF after a control
transfer was finished, this patch doesn't have any side efforts.)
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e582ff309 ]
This patch fixes an issue for control write. When usbhsf_prepare_pop()
is called after this driver called a gadget setup function, this controller
doesn't receive the control write data. So, this patch adds a code to clear
the fifo for control write in usbhsf_prepare_pop().
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 727b9784b6 ]
Orphans in the fs tree are cleaned up via open_ctree and subvolume
orphans are cleaned via btrfs_lookup_dentry -- except when a default
subvolume is in use. The name for the default subvolume uses a manual
lookup that doesn't trigger orphan cleanup and needs to trigger it
manually as well. This doesn't apply to the remount case since the
subvolumes are cleaned up by walking the root radix tree.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 26e726afe0 ]
fiemap_fill_next_extent returns 0 on success, -errno on error, 1 if this was
the last extent that will fit in user array. If 1 is returned, the return
value may eventually returned to user space, which should not happen, according
to manpage of ioctl.
Signed-off-by: Chengyu Song <csong84@gatech.edu>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
commit b97e92574c upstream
Use separate bitmaps for each nodes in the cluster
bitmap_read_sb() validates the bitmap superblock that it reads in.
If it finds an inconsistency like a bad magic number or out-of-range
version number, it prints an error and returns, but it incorrectly
returns zero, so the array is still assembled with the (invalid) bitmap.
This means it could try to use a bitmap with a new version number which
it therefore does not understand.
This bug was introduced in 3.5 and fix as part of a larger patch in 4.1.
So the patch is suitable for any -stable kernel in that range.
Fixes: 27581e5ae0 ("md/bitmap: centralise allocation of bitmap file pages.")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: GuoQing Jiang <gqjiang@suse.com>
(cherry picked from commit ed9691677d)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 451a2886b6 ]
unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there. If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.
X-Coverup: TINC (and there's no lumber cartel either)
Cc: stable@vger.kernel.org # way, way back
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e54198657b ]
This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where a iscsit_start_kthreads() failure triggers
a NULL pointer dereference OOPs:
commit 88dcd2dab5
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Thu Feb 26 22:19:15 2015 -0800
iscsi-target: Convert iscsi_thread_set usage to kthread.h
To address this bug, move iscsit_start_kthreads() immediately
preceeding the transmit of last login response, before signaling
a successful transition into full-feature-phase within existing
iscsi_target_do_tx_login_io() logic.
This ensures that no target-side resource allocation failures can
occur after the final login response has been successfully sent.
Also, it adds a iscsi_conn->rx_login_comp to allow the RX thread
to sleep to prevent other socket related failures until the final
iscsi_post_login_handler() call is able to complete.
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 747cadeb10 ]
commit 4351c294b8 upstream.
The current "mask" policy option matches files opened as MAY_READ,
MAY_WRITE, MAY_APPEND or MAY_EXEC. This patch extends the "mask"
option to match files opened containing one of these modes. For
example, "mask=^MAY_READ" would match files opened read-write.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 139069eff7 ]
The new "euid" policy condition measures files with the specified
effective uid (euid). In addition, for CAP_SETUID files it measures
files with the specified uid or suid.
Changelog:
- fixed checkpatch.pl warnings
- fixed avc denied {setuid} messages - based on Roberto's feedback
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Dr. Greg Wettstein <gw@idfusion.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 007d038bdf ]
This patch fixes a regression introduced with the following commit
in v4.0-rc1 code, where an explicit iser-target logout would result
in ->tx_thread_active being incorrectly cleared by the logout post
handler, and subsequent TX kthread leak:
commit 88dcd2dab5
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date: Thu Feb 26 22:19:15 2015 -0800
iscsi-target: Convert iscsi_thread_set usage to kthread.h
To address this bug, change iscsit_logout_post_handler_closesession()
and iscsit_logout_post_handler_samecid() to only cmpxchg() on
->tx_thread_active for traditional iscsi/tcp connections.
This is required because iscsi/tcp connections are invoking logout
post handler logic directly from TX kthread context, while iser
connections are invoking logout post handler logic from a seperate
workqueue context.
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 417c20a9bd ]
This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.
The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.
To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.
As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.
Reported-by: Sunilkumar Nadumuttlu <sjn@datera.io>
Cc: Sunilkumar Nadumuttlu <sjn@datera.io>
Cc: <stable@vger.kernel.org> # v3.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5c02a42065 ]
Since NULL is used as valid clock object on optional clocks we have to handle
this case in avr32 implementation as well.
Fixes: e1824dfe0d (net: macb: Adjust tx_clk when link speed changes)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4fabb59449 ]
Fixes: 3e0249f9c0 ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7cc03e4896 ]
The efi_info structure stores low 32 bits of memory map
in efi_memmap and high 32 bits in efi_memmap_hi.
While constructing pointer in the setup_e820(), need
to take into account all 64 bit of the pointer.
It is because on 64bit machine the function
efi_get_memory_map() may return full 64bit pointer and before
the patch that pointer was truncated.
The issue is triggered on Parallles virtual machine and
fixed with this patch.
Signed-off-by: Dmitry Skorodumov <sdmitry@parallels.com>
Cc: Denis V. Lunev <den@openvz.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aca3a0489a ]
Port link change with port in resume state should not be
reported to usbcore, as this is an internal state to be
handled by xhci driver. Reporting PLC to usbcore may
cause usbcore clearing PLC first and port change event irq
won't be generated.
Cc: <stable@vger.kernel.org>
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fac4271d11 ]
When the link is just waken, it's in Resume state, and driver sets PLS to
U0. This refers to Phase 1. Phase 2 refers to when the link has completed
the transition from Resume state to U0.
With the fix of xhci: report U3 when link is in resume state, it also
exposes an issue that usb3 roothub and controller can suspend right
after phase 1, and this causes a hard hang in controller.
To fix the issue, we need to prevent usb3 bus suspend if any port is
resuming in phase 1.
[merge separate USB2 and USB3 port resume checking to one -Mathias]
Cc: <stable@vger.kernel.org>
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 326124a027 ]
When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.
This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.
The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Campbell <bacam@z273.org.uk>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5fb2c782f4 ]
This device automatically switches itself to another mode (0x1405)
unless the specific access pattern of Windows is followed in its
initial mode. That makes a dirty unmount of the internal storage
devices inevitable if they are mounted. So the card reader of
such a device should be ignored, lest an unclean removal become
inevitable.
This replaces an earlier patch that ignored all LUNs of this device.
That patch was overly broad.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
CC: stable@vger.kernel.org
Reviewed-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5f6c2d2b7d ]
When a blkcg configuration is targeted to a partition rather than a
whole device, blkg_conf_prep fails with -EINVAL; unfortunately, it
forgets to put the gendisk ref in that case. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 968491709e ]
This patch fixes a problem in the usbtouchscreen driver for DMC TSC-30
touch screen. Due to a missing delay between the RESET and SET_RATE
commands, the touch screen may become unresponsive during system startup or
driver loading.
According to the DMC documentation, a delay is needed after the RESET
command to allow the chip to complete its internal initialization. As this
delay is not guaranteed, we had a system where the touch screen
occasionally did not send any touch data. There was no other indication of
the problem.
The patch fixes the problem by adding a 150ms delay between the RESET and
SET_RATE commands.
Cc: stable@vger.kernel.org
Suggested-by: Jakob Mustafa <jakob.mustafa@bytecmed.com>
Signed-off-by: Bernhard Bender <bernhard.bender@bytecmed.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 34cab6f420 ]
When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
read error to the caller.
However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device. However a spare which is rebuilding
would be non-faulty but so not the only working device.
So change the test from "!Faulty" to "In_sync". If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.
This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 76073054c9 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e91125ff3 ]
Support for 8BIT bus with was added some time ago to sdhci-esdhc but
then missed to remove the 8BIT from the reserved bit mask which made
8BIT non functional.
Fixes: 66b50a0099 ("mmc: esdhc: Add support for 8-bit bus width and..")
Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4479004e64 ]
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
Cc: stable@kernel.org
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0689a86ae8 ]
The Steinberg MI2 and MI4 interfaces are compatible with the USB class
audio spec, but the MIDI part of the devices is reported as a vendor
specific interface.
This patch adds entries to quirks-table.h to recognize the MIDI
endpoints. Audio functionality was already working and is unaffected by
this change.
Signed-off-by: Dominic Sacré <dominic.sacre@gmx.de>
Signed-off-by: Albert Huitsing <albert@huitsing.nl>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 75a06189fc ]
The resend mechanism happily calls the interrupt handler of interrupts
which are marked IRQ_NESTED_THREAD from softirq context. This can
result in crashes because the interrupt handler is not the proper way
to invoke the device handlers. They must be invoked via
handle_nested_irq.
Prevent the resend even if the interrupt has no valid parent irq
set. Its better to have a lost interrupt than a crashing machine.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f9c87a6f46 ]
If the kernel is compiled with gcc 5.1 and the XZ compression option
the decompress_kernel function calls _sclp_print_early in 64-bit mode
while the content of the upper register half of %r6 is non-zero.
This causes a specification exception on the servc instruction in
_sclp_servc.
The _sclp_print_early function saves and restores the upper registers
halves but it fails to clear them for the 31-bit code of the mini sclp
driver.
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 75a6f82a0d ]
Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course). However, there's one case where that
does *not* happen. Namely, if you open it by fhandle with cold dcache,
then unlink() and close().
In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal. In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()). The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in. As the result, we have the final iput() delayed
indefinitely. It's trivial to reproduce -
void flush_dcache(void)
{
system("mount -o remount,rw /");
}
static char buf[20 * 1024 * 1024];
main()
{
int fd;
union {
struct file_handle f;
char buf[MAX_HANDLE_SZ];
} x;
int m;
x.f.handle_bytes = sizeof(x);
chdir("/root");
mkdir("foo", 0700);
fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
close(fd);
name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
flush_dcache();
fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
unlink("foo/bar");
write(fd, buf, sizeof(buf));
system("df ."); /* 20Mb eaten */
close(fd);
system("df ."); /* should've freed those 20Mb */
flush_dcache();
system("df ."); /* should be the same as #2 */
}
will spit out something like
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 303843 1131 100% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 322023 283282 21692 93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).
Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cdbbeb69d4 ]
commit b064a8fa77 upstream.
Commit 73f7d1ca32 "ACPI / init: Run acpi_early_init() before
timekeeping_init()" moved the ACPI subsystem initialization,
including the ACPI mode enabling, to an earlier point in the
initialization sequence, to allow the timekeeping subsystem
use ACPI early. Unfortunately, that resulted in boot regressions
on some systems and the early ACPI initialization was moved toward
its original position in the kernel initialization code by commit
c4e1acbb35 "ACPI / init: Invoke early ACPI initialization later".
However, that turns out to be insufficient, as boot is still broken
on the Tyan S8812 mainboard.
To fix that issue, split the ACPI early initialization code into
two pieces so the majority of it still located in acpi_early_init()
and the part switching over the platform into the ACPI mode goes into
a new function, acpi_subsystem_init(), executed at the original early
ACPI initialization spot.
That fixes the Tyan S8812 boot problem, but still allows ACPI
tables to be loaded earlier which is useful to the EFI code in
efi_enter_virtual_mode().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
Fixes: 73f7d1ca32 "ACPI / init: Run acpi_early_init() before timekeeping_init()"
Reported-and-tested-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1ccdd6c6e9 ]
commit 8fcd461db7 upstream.
Currently, preprocess_stateid_op calls nfs4_check_olstateid which
verifies that the open stateid corresponds to the current filehandle in the
call by calling nfs4_check_fh.
If the stateid is a NFS4_DELEG_STID however, then no such check is done.
This could cause incorrect enforcement of permissions, because the
nfsd_permission() call in nfs4_check_file uses current the current
filehandle, but any subsequent IO operation will use the file descriptor
in the stateid.
Move the call to nfs4_check_fh into nfs4_check_file instead so that it
can be done for all stateid types.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
[bfields: moved fh check to avoid NULL deref in special stateid case]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ce40cd3fc7 ]
Malicious (or egregiously buggy) userspace can trigger it, but it
should never happen in normal operation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3c00cb5e68 ]
This function can leak kernel stack data when the user siginfo_t has a
positive si_code value. The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.
copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.
This fixes the following information leaks:
x86: 8 bytes leaked when sending a signal from a 32-bit process to
itself. This leak grows to 16 bytes if the process uses x32.
(si_code = __SI_CHLD)
x86: 100 bytes leaked when sending a signal from a 32-bit process to
a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
64-bit process. (si_code = any)
parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process. These bugs are also fixed for consistency.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c08a75d950 ]
commit 26135022f8 upstream.
This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.
Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3ead7c52bd ]
This function may copy the si_addr_lsb field to user mode when it hasn't
been initialized, which can leak kernel stack data to user mode.
Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals. This is solved by
checking the value of si_signo in addition to si_code.
Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ecf5fc6e96 ]
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:
PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
#0 __schedule at ffffffff815ab152
#1 schedule at ffffffff815ab76e
#2 schedule_timeout at ffffffff815ae5e5
#3 io_schedule_timeout at ffffffff815aad6a
#4 bit_wait_io at ffffffff815abfc6
#5 __wait_on_bit at ffffffff815abda5
#6 wait_on_page_bit at ffffffff8111fd4f
#7 shrink_page_list at ffffffff81135445
#8 shrink_inactive_list at ffffffff81135845
#9 shrink_lruvec at ffffffff81135ead
#10 shrink_zone at ffffffff811360c3
#11 shrink_zones at ffffffff81136eff
#12 do_try_to_free_pages at ffffffff8113712f
#13 try_to_free_mem_cgroup_pages at ffffffff811372be
#14 try_charge at ffffffff81189423
#15 mem_cgroup_try_charge at ffffffff8118c6f5
#16 __add_to_page_cache_locked at ffffffff8112137d
#17 add_to_page_cache_lru at ffffffff81121618
#18 pagecache_get_page at ffffffff8112170b
#19 grow_dev_page at ffffffff811c8297
#20 __getblk_slow at ffffffff811c91d6
#21 __getblk_gfp at ffffffff811c92c1
#22 ext4_ext_grow_indepth at ffffffff8124565c
#23 ext4_ext_create_new_leaf at ffffffff81246ca8
#24 ext4_ext_insert_extent at ffffffff81246f09
#25 ext4_ext_map_blocks at ffffffff8124a848
#26 ext4_map_blocks at ffffffff8121a5b7
#27 mpage_map_one_extent at ffffffff8121b1fa
#28 mpage_map_and_submit_extent at ffffffff8121f07b
#29 ext4_writepages at ffffffff8121f6d5
#30 do_writepages at ffffffff8112c490
#31 __filemap_fdatawrite_range at ffffffff81120199
#32 filemap_flush at ffffffff8112041c
#33 ext4_alloc_da_blocks at ffffffff81219da1
#34 ext4_rename at ffffffff81229b91
#35 ext4_rename2 at ffffffff81229e32
#36 vfs_rename at ffffffff811a08a5
#37 SYSC_renameat2 at ffffffff811a3ffc
#38 sys_renameat2 at ffffffff811a408e
#39 sys_rename at ffffffff8119e51e
#40 system_call_fastpath at ffffffff815afa89
Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.
The heuristic was introduced by commit e62e384e9d ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified. The code has been changed by c3b94f44fc ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code. But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.
ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio. Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.
Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback. The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.
As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem. Moreover he notes:
: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.
Cc: stable@vger.kernel.org # 3.9+
[tytso@mit.edu: corrected the control flow]
Fixes: c3b94f44fc ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5f867db634 ]
Commit 66507c7bc8 ("mtd: nand: Add support to use nand_base
poi databuf as bounce buffer") added a flag NAND_USE_BOUNCE_BUFFER
using the same bit value as the existing NAND_BUSWIDTH_AUTO.
Cc: Kamal Dasu <kdasu.kdev@gmail.com>
Fixes: 66507c7bc8 ("mtd: nand: Add support to use nand_base
poi databuf as bounce buffer")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 653cdc13a3 ]
Tests with a Sierra Wireless MC7355 have shown that 1199:9041 devices
also require the option_send_setup() code to be used on the USB
interface for the AT port to make unsolicited response codes work
correctly. Move these devices from the qcserial driver to the option
driver like it has been done for the 1199:68c0 devices in commit
d80c0d1418 ("USB: qcserial/option: make
AT URCs work for Sierra Wireless MC73xx").
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c41b776767 ]
The p_interval should be less if the 'bInterval' at the descriptor
is larger, eg, if 'bInterval' is 5 for HS, the p_interval should be
8000 / 16 = 500.
It fixes the patch 9bb87f1689 ("usb: gadget: f_uac2: send
reasonably sized packets")
Cc: <stable@vger.kernel.org> # v3.18+
Fixes: 9bb87f1689 ("usb: gadget: f_uac2: send reasonably sized packets")
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 423f04d63c ]
raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.
This should probably be part of
Commit: 34cab6f420 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.
Fixes: 76073054c9 ("md/raid1: clean up read_balance.")
Cc: stable@vger.kernel.org (v3.0+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c9ddbac9c8 ]
09a2c73ddf ("PCI: Remove unused PCI_MSIX_FLAGS_BIRMASK definition")
removed PCI_MSIX_FLAGS_BIRMASK from an exported header because it was
unused in the kernel. But that breaks user programs that were using it
(QEMU in particular).
Restore the PCI_MSIX_FLAGS_BIRMASK definition.
[bhelgaas: changelog]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.13+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 209f7512d0 ]
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:
ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
processed, and then processes the dentry lockres. During the dentry
put, it calls iput and then deletes rw, inode and open lockres from
blocked list in ocfs2_mark_lockres_freeing. And this causes the
variable `processed' to not reflect the number of blocked lockres to be
processed, which triggers the BUG.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit de54b9ac25 ]
A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages, in commit
d6629859b3 ("ipc/mqueue: improve performance of send/recv").
That patch introducing the improved kernel handling of message queues
(using btrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue. Before, this field
reflected the size of the user-data in the queue. Since, it also takes
kernel data structures into account. For example, if 13 bytes of user
data are in the queue, on my machine the file reports a size of 61
bytes.
There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):
The pseudofiles in the mqueue filesystem (usually mounted at
/dev/mqueue) expose fields with metadata describing a message
queue. One of these fields, QSIZE, as originally implemented,
showed the total number of bytes of user data in all messages in
the message queue, and this feature was documented from the
beginning in the mq_overview(7) page. In 3.5, some other (useful)
work happened to break the user-space API in a couple of places,
including the value exposed via QSIZE, which now includes a measure
of kernel overhead bytes for the queue, a figure that renders QSIZE
useless for its original purpose, since there's no way to deduce
the number of overhead bytes consumed by the implementation.
(The other user-space breakage was subsequently fixed.)
This patch removes the accounting of kernel data structures in the
queue. Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above). Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.
It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation). Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.
Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: John Duffy <jb_duffy@btinternet.com>
Cc: Arto Bendiken <arto@bendiken.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 18f5ed365d ]
Fireworks uses TSB43CB43(IceLynx-Micro) as its IEC 61883-1/6 interface.
This chip includes ARM7 core, and loads and runs program. The firmware
is stored in on-board memory and loaded every powering-on from it.
Echo Audio ships several versions of firmwares for each model. These
firmwares have each quirk and the quirk changes a sequence of packets.
As long as I investigated, AudioFire2/AudioFire4/AudioFirePre8 have a
quirk to transfer a first packet with 0x02 in its dbc field. This causes
ALSA Fireworks driver to detect discontinuity. In this case, firmware
version 5.7.0, 5.7.3 and 5.8.0 are used.
Payload CIP CIP
quadlets header1 header2
02 00050002 90ffffff <-
42 0005000a 90013000
42 00050012 90014400
42 0005001a 90015800
02 0005001a 90ffffff
42 00050022 90019000
42 0005002a 9001a400
42 00050032 9001b800
02 00050032 90ffffff
42 0005003a 9001d000
42 00050042 9001e400
42 0005004a 9001f800
02 0005004a 90ffffff
(AudioFire2 with firmware version 5.7.)
$ dmesg
snd-fireworks fw1.0: Detect discontinuity of CIP: 00 02
These models, AudioFire8 (since Jul 2009 ) and Gibson Robot Interface
Pack series uses the same ARM binary as their firmware. Thus, this
quirk may be observed among them.
This commit adds a new member for AMDTP structure. This member represents
the value of dbc field in a first AMDTP packet. Drivers can set it with
a preferred value according to model's quirk.
Tested-by: Johannes Oertei <johannes.oertel@uni-due.de>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 44008f0896 ]
Smatch complains that we have nested checks for "spdif_present". It
turns out the current behavior isn't correct, we should remove the first
check and keep the second.
Fixes: 1077a02481 ('ALSA: hda - Use generic parser for Cirrus codec driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e053f96b1a ]
Since commit 3d42a379b6
("can: flexcan: add 2nd clock to support imx53 and newer")
the can driver requires a dt nodes to have a second clock.
Add them to imx35 to fix probing the flex can driver on the
respective platforms.
Signed-off-by: Denis Carikli <denis@eukrea.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2761713d35 ]
For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.
rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run. Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurally
marked done request:
<obj_request-1/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
rbd_obj_request_complete()
rbd_img_obj_copyup_callback()
rbd_img_obj_callback()
<obj_request-2/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
for_each_obj_request(obj_request->img_request) {
rbd_img_obj_end_request(obj_request-1/2)
rbd_img_obj_end_request(obj_request-2/2) <--
}
Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xfferred is 0. We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.
To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request). So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().
Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().
Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 3.18
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 17fb874dee ]
The kthread_run() function can return two different error values
but the hwrng core only checks for -ENOMEM. If the other error
value -EINTR is returned it is assigned to hwrng_fill and later
used on a kthread_stop() call which naturally crashes.
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 30b03d05e0 ]
While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d90d06680f ]
commit e50b1e06b7 upstream.
The DAPM lock must be held when accessing the DAPM graph status through
sysfs or debugfs, otherwise concurrent changes to the graph can result in
undefined behaviour.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c1bfa985de ]
All of the keystone devices have a separate register to hold post
divider value for main pll clock. Currently the fixed-postdiv
value used for k2hk/l/e SoCs works by sheer luck as u-boot happens to
use a value of 2 for this. Now that we have fixed this in the pll
clock driver change the dt bindings for the same.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 02fdfd708f ]
Main PLL controller has post divider bits in a separate register in
pll controller. Use the value from this register instead of fixed
divider when available.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 44922150d8 ]
If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:
ETRAP
ETRAP
VIS_ENTRY(fprs=0x4)
VIS_EXIT
RTRAP (kernel FPU restore with fpu_saved=0x4)
RTRAP
We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.
Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.
This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.
But this is not how trap returns from kernel to kernel operate.
The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.
Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.
Longer term we need to do something smarter to reinstate the partial
save optimizations. Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state. Instead,
the VISEntry et al. calls should be doing that work.
This bug is about two decades old.
Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 33afeac21b ]
commit b6878d9e03 upstream.
In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".
5769 file = kmalloc(sizeof(*file), GFP_NOIO);
5770 if (!file)
5771 return -ENOMEM;
This structure is copied to user space at the end of the function.
5786 if (err == 0 &&
5787 copy_to_user(arg, file, sizeof(*file)))
5788 err = -EFAULT
But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.
5775 /* bitmap disabled, zero the first byte and copy out */
5776 if (!mddev->bitmap_info.file)
5777 file->pathname[0] = '\0';
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 56301df6bc ]
A construct like:
if (pm_runtime_suspended(twl->dev))
pm_runtime_get_sync(twl->dev);
is against the spirit of the runtime_pm interface as it
makes the internal refcounting useless.
In this case it is also racy, particularly as 'put_autosuspend'
is used to drop a reference.
When that happens a timer is started and the device is
runtime-suspended after the timeout.
If the above code runs in this window, the device will not be
found to be suspended so no pm_runtime reference is taken.
When the timer expires the device will be suspended, which is
against the intention of the code.
So be more direct is taking and dropping references.
If twl->linkstat is VBUS_VALID or ID_GROUND, then hold a
pm_runtime reference, otherwise don't.
Define "cable_present()" to test for this condition.
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c93e64e912 ]
This patch fixes a bug in the error pathway of
usb_add_gadget_udc_release() in udc-core.c. If the udc registration
fails, the gadget registration is not fully undone; there's a
put_device(&gadget->dev) call but no device_del().
CC: <stable@vger.kernel.org>
Acked-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7447223323 ]
Add support for the Sierra Wireless AR8550 device with
USB descriptor 0x1199, 0x68AB.
It is common with MC879x modules 1199:683c/683d which
also are composite devices with 7 interfaces (0..6)
and also MDM62xx based as the AR8550.
The major difference are only the interface attributes
02/02/01 on interfaces 3 and 4 on the AR8550. They are
vendor specific ff/ff/ff on MC879x modules.
lsusb reports:
Bus 001 Device 004: ID 1199:68ab Sierra Wireless, Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1199 Sierra Wireless, Inc.
idProduct 0x68ab
bcdDevice 0.06
iManufacturer 3 Sierra Wireless, Incorporated
iProduct 2 AR8550
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 198
bNumInterfaces 7
bConfigurationValue 1
iConfiguration 1 Sierra Configuration
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x85 EP 5 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 4
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x87 EP 7 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 5
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x88 EP 8 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x89 EP 9 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 6
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8a EP 10 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8b EP 11 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x07 EP 7 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0001
Self Powered
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Lars Melin <larsm17@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7895086afd ]
We need to check that a TRB is part of the current segment
before calculating its DMA address.
Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.
Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.
This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and
causes errors like:
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0
The trb-end address is one outside the end-seg address.
Cc: <stable@vger.kernel.org>
Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3f1c058131 ]
Fixes another signed / unsigned array indexing bug in the ipr driver.
Currently, when hrrq_index wraps, it becomes a negative number. We
do the modulo, but still have a negative number, so we end up indexing
backwards in the array. Given where the hrrq array is located in memory,
we probably won't actually reference memory we don't own, but nonetheless
ipr is still looking at data within struct ipr_ioa_cfg and interpreting it as
struct ipr_hrr_queue data, so bad things could certainly happen.
Each ipr adapter has anywhere from 1 to 16 HRRQs. By default, we use 2 on new
adapters. Let's take an example:
Assume ioa_cfg->hrrq_index=0x7fffffffe and ioa_cfg->hrrq_num=4:
The atomic_add_return will then return -1. We mod this with 3 and get -2, add
one and get -1 for an array index.
On adapters which support more than a single HRRQ, we dedicate HRRQ to adapter
initialization and error interrupts so that we can optimize the other queues
for fast path I/O. So all normal I/O uses HRRQ 1-15. So we want to spread the
I/O requests across those HRRQs.
With the default module parameter settings, this bug won't hit, only when
someone sets the ipr.number_of_msix parameter to a value larger than 3 is when
bad things start to happen.
Cc: <stable@vger.kernel.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 42639ba554 ]
Apparently been in there since forever and fairly easy to hit when
hotplugging really fast. I can do that since my mst hub has a manual
button to flick the hpd line for reprobing. The resulting WARNING spam
isn't pretty.
Cc: Dave Airlie <airlied@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8f2f3eb59d ]
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.
Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list. This method
is safe even when entries from the list can disappear once we drop the
lock.
Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 46011e6ea3 ]
On MIPS the GLOBAL bit of the PTE must have the same value in any
aligned pair of PTEs. These pairs of PTEs are referred to as
"buddies". In a SMP system is is possible for two CPUs to be calling
set_pte() on adjacent PTEs at the same time. There is a race between
setting the PTE and a different CPU setting the GLOBAL bit in its
buddy PTE.
This race can be observed when multiple CPUs are executing
vmap()/vfree() at the same time.
Make setting the buddy PTE's GLOBAL bit an atomic operation to close
the race condition.
The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not*
handled.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: <stable@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10835/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1e77863a51 ]
The show_stack() function deals exclusively with kernel contexts, but if
it gets called in user context with EVA enabled, show_stacktrace() will
attempt to access the stack using EVA accesses, which will either read
other user mapped data, or more likely cause an exception which will be
handled by __get_user().
This is easily reproduced using SysRq t to show all task states, which
results in the following stack dump output:
Stack : (Bad stack address)
Fix by setting the current user access mode to kernel around the call to
show_stacktrace(). This causes __get_user() to use normal loads to read
the kernel stack.
Now we get the correct output, like this:
Stack : 00000000 80168960 00000000 004a0000 00000000 00000000 8060016c 1f3abd0c
1f172cd8 8056f09c 7ff1e450 8014fc3c 00000001 806dd0b0 0000001d 00000002
1f17c6a0 1f17c804 1f17c6a0 8066f6e0 00000000 0000000a 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 0110e800 1f3abd6c 1f17c6a0
...
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Patchwork: https://patchwork.linux-mips.org/patch/10778/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 55c723e181 ]
If a machine check exception is raised in kernel mode, user context,
with EVA enabled, then the do_mcheck handler will attempt to read the
code around the EPC using EVA load instructions, i.e. as if the reads
were from user mode. This will either read random user data if the
process has anything mapped at the same address, or it will cause an
exception which is handled by __get_user, resulting in this output:
Code: (Bad address in epc)
Fix by setting the current user access mode to kernel if the saved
register context indicates the exception was taken in kernel mode. This
causes __get_user to use normal loads to read the kernel code.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.15+
Patchwork: https://patchwork.linux-mips.org/patch/10777/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 106eccb4d2 ]
On Malta, since commit a87ea88d8f ("MIPS: Malta: initialise the RTC at
boot"), the RTC is reinitialised and forced into binary coded decimal
(BCD) mode during init, even if the bootloader has already initialised
it, and may even have already put it into binary mode (as YAMON does).
This corrupts the current time, can result in the RTC seconds being an
invalid BCD (e.g. 0x1a..0x1f) for up to 6 seconds, as well as confusing
YAMON for a while after reset, enough for it to report timeouts when
attempting to load from TFTP (it actually uses the RTC in that code).
Therefore only initialise the RTC to the extent that is necessary so
that Linux avoids interfering with the bootloader setup, while also
allowing it to estimate the CPU frequency without hanging, without a
bootloader necessarily having done anything with the RTC (for example
when the kernel is loaded via EJTAG).
The divider control is configured for a 32KHZ reference clock if
necessary, and the SET bit of the RTC_CONTROL register is cleared if
necessary without changing any other bits (this bit will be set when
coming out of reset if the battery has been disconnected).
Fixes: a87ea88d8f ("MIPS: Malta: initialise the RTC at boot")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.14+
Patchwork: https://patchwork.linux-mips.org/patch/10739/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2a34c0872a ]
hctx->tags has to be set as NULL in case that it is to be unmapped
no matter if set->tags[hctx->queue_num] is NULL or not in blk_mq_map_swqueue()
because shared tags can be freed already from another request queue.
The same situation has to be considered during handling CPU online too.
Unmapped hw queue can be remapped after CPU topo is changed, so we need
to allocate tags for the hw queue in blk_mq_map_swqueue(). Then tags
allocation for hw queue can be removed in hctx cpu online notifier, and it
is reasonable to do that after mapping is updated.
Cc: <stable@vger.kernel.org>
Reported-by: Dongsu Park <dongsu.park@profitbricks.com>
Tested-by: Dongsu Park <dongsu.park@profitbricks.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 35e9a9f939 ]
This works around a issue with qnap iscsi targets not handling large IOs
very well.
The target returns:
VPD INQUIRY: Block limits page (SBC)
Maximum compare and write length: 1 blocks
Optimal transfer length granularity: 1 blocks
Maximum transfer length: 4294967295 blocks
Optimal transfer length: 4294967295 blocks
Maximum prefetch, xdread, xdwrite transfer length: 0 blocks
Maximum unmap LBA count: 8388607
Maximum unmap block descriptor count: 1
Optimal unmap granularity: 16383
Unmap granularity alignment valid: 0
Unmap granularity alignment: 0
Maximum write same length: 0xffffffff blocks
Maximum atomic transfer length: 0
Atomic alignment: 0
Atomic transfer length granularity: 0
and it is *sometimes* able to handle at least one IO of size up to 8 MB. We
have seen in traces where it will sometimes work, but other times it
looks like it fails and it looks like it returns failures if we send
multiple large IOs sometimes. Also it looks like it can return 2 different
errors. It will sometimes send iscsi reject errors indicating out of
resources or it will send invalid cdb illegal requests check conditions.
And then when it sends iscsi rejects it does not seem to handle retries
when there are command sequence holes, so I could not just add code to
try and gracefully handle that error code.
The problem is that we do not have a good contact for the company,
so we are not able to determine under what conditions it returns
which error and why it sometimes works.
So, this patch just adds a new black list flag to set targets like this to
the old max safe sectors of 1024. The max_hw_sectors changes added in 3.19
caused this regression, so I also ccing stable.
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f957724b9 ]
The firmware class uevent function accessed the "fw_priv->buf" buffer
without the proper locking and testing for NULL. This is an old bug
(looks like it goes back to 2012 and commit 1244691c73: "firmware
loader: introduce firmware_buf"), but for some reason it's triggering
only now in 4.2-rc1.
Shuah Khan is trying to bisect what it is that causes this to trigger
more easily, but in the meantime let's just fix the bug since others are
hitting it too (at least Ingo reports having seen it as well).
Reported-and-tested-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fd28f5d439 ]
The current pmd_huge() and pud_huge() functions simply check if the table
bit is not set and reports the entries as huge in that case. This is
counter-intuitive as a clear pmd/pud cannot also be a huge pmd/pud, and
it is inconsistent with at least arm and x86.
To prevent others from making the same mistake as me in looking at code
that calls these functions and to fix an issue with KVM on arm64 that
causes memory corruption due to incorrect page reference counting
resulting from this mistake, let's change the behavior.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Fixes: 084bd29810 ("ARM64: mm: HugeTLB support.")
Cc: <stable@vger.kernel.org> # 3.11+
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a84b69cb6e ]
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d683cc49da ]
When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.
Fixes: ee5dc7732b ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 530c11d432 ]
The omap watchdog has the annoying behaviour that writes to most
registers don't have any effect when the watchdog is already running.
Quoting the AM335x reference manual:
To modify the timer counter value (the WDT_WCRR register),
prescaler ratio (the WDT_WCLR[4:2] PTV bit field), delay
configuration value (the WDT_WDLY[31:0] DLY_VALUE bit field), or
the load value (the WDT_WLDR[31:0] TIMER_LOAD bit field), the
watchdog timer must be disabled by using the start/stop sequence
(the WDT_WSPR register).
Currently the timer is stopped in the .probe callback but still there
are possibilities that yield to a situation where omap_wdt_start is
entered with the timer running (e.g. when /dev/watchdog is closed
without stopping and then reopened). In such a case programming the
timeout silently fails!
To circumvent this stop the timer before reprogramming.
Assuming one of the first things the watchdog user does is setting the
timeout explicitly nothing too bad should happen because this explicit
setting works fine.
Fixes: 7768a13c25 ("[PATCH] OMAP: Add Watchdog driver support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c8fff7bc5b ]
Node 0 might be offline as well as any other numa node,
in this case kernel cannot handle memory allocation and crashes.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 0c3f061c19 ("of: implement of_node_to_nid as a weak function")
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3f2cee73b6 ]
The usbfs API has a peculiar hole: Users are not allowed to reap their
URBs after the device has been disconnected. There doesn't seem to be
any good reason for this; it is an ad-hoc inconsistency.
The patch allows users to issue the USBDEVFS_REAPURB and
USBDEVFS_REAPURBNDELAY ioctls (together with their 32-bit counterparts
on 64-bit systems) even after the device is gone. If no URBs are
pending for a disconnected device then the ioctls will return -ENODEV
rather than -EAGAIN, because obviously no new URBs will ever be able
to complete.
The patch also adds a new capability flag for
USBDEVFS_GET_CAPABILITIES to indicate that the reap-after-disconnect
feature is supported.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Chris Dickens <christopher.a.dickens@gmail.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b8830a4e71 ]
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ab499db80f ]
There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.
This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.
I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.
Fixes: 8d1f7ecd2a ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d194e5d666 ]
The final version of commit 637241a900 ("kmsg: honor dmesg_restrict
sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
processed incorrectly:
- open of /dev/kmsg checks syslog access permissions by using
check_syslog_permissions() where security_syslog() is not called if
dmesg_restrict is set.
- syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
can be executed twice (inside check_syslog_permissions() and then
directly in do_syslog())
With this patch security_syslog() is called once only in all
syslog-related operations regardless of dmesg_restrict value.
Fixes: 637241a900 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2528a8b8f4 ]
bitmap_parselist("", &mask, nmaskbits) will erroneously set bit zero in
the mask. The same bug is visible in cpumask_parselist() since it is
layered on top of the bitmask code, e.g. if you boot with "isolcpus=",
you will actually end up with cpu zero isolated.
The bug was introduced in commit 4b060420a5 ("bitmap, irq: add
smp_affinity_list interface to /proc/irq") when bitmap_parselist() was
generalized to support userspace as well as kernelspace.
Fixes: 4b060420a5 ("bitmap, irq: add smp_affinity_list interface to /proc/irq")
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 29535f7b79 ]
The current handler of MMC_BLK_CMD_ERR in mmc_blk_issue_rw_rq function
may cause new coming request permanent missing when the ongoing
request (previoulsy started) complete end.
The problem scenario is as follows:
(1) Request A is ongoing;
(2) Request B arrived, and finally mmc_blk_issue_rw_rq() is called;
(3) Request A encounters the MMC_BLK_CMD_ERR error;
(4) In the error handling of MMC_BLK_CMD_ERR, suppose mmc_blk_cmd_err()
end request A completed and return zero. Continue the error handling,
suppose mmc_blk_reset() reset device success;
(5) Continue the execution, while loop completed because variable ret
is zero now;
(6) Finally, mmc_blk_issue_rw_rq() return without processing request B.
The process related to the missing request may wait that IO request
complete forever, possibly crashing the application or hanging the system.
Fix this issue by starting new request when reset success.
Signed-off-by: Ding Wang <justin.wang@spreadtrum.com>
Fixes: 67716327ee ("mmc: block: add eMMC hardware reset support")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4a579da258 ]
Before we reach to connection established we may get an
error event. In this case the core won't teardown this
connection (never established it), so we take care of freeing
it ourselves.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 88dcd2dab5 ]
This patch converts iscsi-target code to use modern kthread.h API
callers for creating RX/TX threads for each new iscsi_conn descriptor,
and releasing associated RX/TX threads during connection shutdown.
This is done using iscsit_start_kthreads() -> kthread_run() to start
new kthreads from within iscsi_post_login_handler(), and invoking
kthread_stop() from existing iscsit_close_connection() code.
Also, convert iscsit_logout_post_handler_closesession() code to use
cmpxchg when determing when iscsit_cause_connection_reinstatement()
needs to sleep waiting for completion.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 497b4050e0 ]
We were allocating memory with memdup_user() but we were never releasing
that memory. This affected pretty much every call to the ioctl, whether
it deduplicated extents or not.
This issue was reported on IRC by Julian Taylor and on the mailing list
by Marcel Ritter, credit goes to them for finding the issue.
Reported-by: Julian Taylor <jtaylor.debian@googlemail.com>
Reported-by: Marcel Ritter <ritter.marcel@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c3f4a1685b ]
The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:
/*
* (...)
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
So better be safe and use kmem_cache_free().
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4e02361232 ]
Warning like this:
drivers/md/md.c: In function "update_array_info":
drivers/md/md.c:6394:26: warning: logical not is only applied
to the left hand side of comparison [-Wlogical-not-parentheses]
!mddev->persistent != info->not_persistent||
Fix it as Neil Brown said:
mddev->persistent != !info->not_persistent ||
Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6224beb12e ]
Fengguang Wu's tests triggered a bug in the branch tracer's start up
test when CONFIG_DEBUG_PREEMPT set. This was because that config
adds some debug logic in the per cpu field, which calls back into
the branch tracer.
The branch tracer has its own recursive checks, but uses a per cpu
variable to implement it. If retrieving the per cpu variable calls
back into the branch tracer, you can see how things will break.
Instead of using a per cpu variable, use the trace_recursion field
of the current task struct. Simply set a bit when entering the
branch tracing and clear it when leaving. If the bit is set on
entry, just don't do the tracing.
There's also the case with lockdep, as the local_irq_save() called
before the recursion can also trigger code that can call back into
the function. Changing that to a raw_local_irq_save() will protect
that as well.
This prevents the recursion and the inevitable crash that follows.
Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b4875bbe7e ]
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71d126fd28 ]
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5101a1850b ]
To prevent offline stripping of existing file xattrs and relabeling of
them at runtime, EVM allows only newly created files to be labeled. As
pseudo filesystems are not persistent, stripping of xattrs is not a
concern.
Some LSMs defer file labeling on pseudo filesystems. This patch
permits the labeling of existing files on pseudo files systems.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
commit ca4da5dd1f upstream.
__key_link_end is not freeing the associated array edit structure
and this leads to a 512 byte memory leak each time an identical
existing key is added with add_key().
The reason the add_key() system call returns okay is that
key_create_or_update() calls __key_link_begin() before checking to see
whether it can update a key directly rather than adding/replacing - which
it turns out it can. Thus __key_link() is not called through
__key_instantiate_and_link() and __key_link_end() must cancel the edit.
CVE-2015-1333
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c9cd9b18da)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 01447e9f04 ]
legacy setcrtc ioctl does take a 32 bit value which might indeed
overflow
the checks of crtc_req->x > INT_MAX and crtc_req->y > INT_MAX aren't
needed any more with this
v2: -polish the annotation according to Daniel's comment
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c7518794a ]
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3496810663 ]
virt_dev->num_cached_rings counts on freed ring and is not updated
correctly. In xhci_free_or_cache_endpoint_ring() function, the free ring
is added into cache and then num_rings_cache is incremented as below:
virt_dev->ring_cache[rings_cached] =
virt_dev->eps[ep_index].ring;
virt_dev->num_rings_cached++;
here, free ring pointer is added to a current index and then
index is incremented.
So current index always points to empty location in the ring cache.
For getting available free ring, current index should be decremented
first and then corresponding ring buffer value should be taken from ring
cache.
But In function xhci_endpoint_init(), the num_rings_cached index is
accessed before decrement.
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
virt_dev->num_rings_cached--;
This is bug in manipulating the index of ring cache.
And it should be as below:
virt_dev->num_rings_cached--;
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
Cc: <stable@vger.kernel.org>
Signed-off-by: Aman Deep <aman.deep@samsung.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit be9d39881f ]
Currently, we're calling musb_start() twice for DRD ports
in some situations. This has been observed to cause enumeration
issues after suspend/resume cycles with AM335x.
In order to fix the problem, we just have to fix the check
on musb_has_gadget() so that it only returns true if
current mode is Host and ignore the fact that we have or
not a gadget driver loaded.
Fixes: ae44df2e21 (usb: musb: call musb_start() only once in OTG mode)
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.11+
Tested-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 83ed07c5db ]
Static checkers complain that the current condition is never true. It
seems pretty likely that it's a typo and "URB" was intended instead of
"USB".
Fixes: 3d97ff63f8 ('usbdevfs: Use scatter-gather lists for large bulk transfers')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aebda61871 ]
This fixes an issue introduced in commit b23c843992 (usb: dwc3:
gadget: fix DEPSTARTCFG for non-EP0 EPs) that made sure we would
only use DEPSTARTCFG once per SetConfig.
The trick is that we should use one DEPSTARTCFG per SetConfig *OR*
SetInterface. SetInterface was completely missed from the original
patch.
This problem became aparent after commit 76e838c9f7 (usb: dwc3:
gadget: return error if command sent to DEPCMD register fails)
added checking of the return status of device endpoint commands.
'Set Endpoint Transfer Resource' command was caught failing
occasionally. This is because the Transfer Resource
Index was not getting reset during a SET_INTERFACE request.
Finally, to fix the issue, was we have to do is make sure that
our start_config_issued flag gets reset whenever we receive a
SetInterface request.
To verify the problem (and its fix), all we have to do is run
test 9 from testusb with 'testusb -t 9 -s 2048 -a -c 5000'.
Tested-by: Huang Rui <ray.huang@amd.com>
Tested-by: Subbaraya Sundeep Bhatta <subbaraya.sundeep.bhatta@xilinx.com>
Fixes: b23c843992 (usb: dwc3: gadget: fix DEPSTARTCFG for non-EP0 EPs)
Cc: <stable@vger.kernel.org> # v3.2+
Signed-off-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d531be2ca2 ]
I have a ST4000DM000 disk. If Linux is booted while the disk is spun down,
the command that sets transfer mode causes the disk to spin up. The
spin-up takes longer than the default 5s timeout, so the command fails and
timeout is reported.
Fix this by increasing the timeout to 15s, which is enough for the disk to
spin up.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 08c85d2a59 ]
Enabling AA on HP 250GB SATA disk VB0250EAVER causes errors:
[ 3.788362] ata3.00: failed to enable AA (error_mask=0x1)
[ 3.789243] ata3.00: failed to enable AA (error_mask=0x1)
Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk.
tj: Collected FPDMA_AA entries and updated comment.
Signed-off-by: Aleksei Mamlin <mamlinav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 474ff0ae23 ]
My static checker complains that:
sound/soc/fsl/imx-wm8962.c:196 imx_wm8962_probe() warn:
we tested 'ret' before and it was 'false'
The intent was that we use "ret" to check imx_audmux_v2_configure_port().
Fixes: 8de2ae2a7f ('ASoC: fsl: add imx-wm8962 machine driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Otherwise, Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 815983e9af ]
The DT-Property "atmel,adc-startup-time" is stored in an u8 for a microsecond
value. When trying to increase the value of STARTUP in Register AT91_ADC_MR
some higher values can't be reached.
Change the type in function parameter and private structure field from u8 to
u32.
Signed-off-by: Jan Leupold <leupold@rsi-elektrotechnik.de>
[nicolas.ferre@atmel.com: change commit message, increase u16 to u32 for startup time]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f451957daf ]
only SAMP_FREQ is writable
Will lead to SAMP_FREQ being written by any attempt to write
to the other exported attributes and hence a rather unexpected
result!
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09f4dcc944 ]
The value sent on the SPI bus is shifted by an erroneous number of bits.
The shift value was already computed in the iio_chan_spec structure and
hence subtracting this argument to 16 yields an erroneous data position
in the SPI stream.
Signed-off-by: JM Friedt <jmfriedt@femto-st.fr>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7444a072c3 ]
ext4_free_blocks is looping around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open coded loop and replace it with __GFP_NOFAIL. Without the
flag the allocator has no way to find out never-fail requirement and
cannot help in any way.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8974fec7d7 ]
Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file. This caused the migration
to be done incorrectly, and then if there is a subsequent following
delayed allocation write to the "hole", this would reclaim the same data
blocks again and results in fs corruption.
# assmuing 4k block size ext4, with delalloc enabled
# skip the first block and write to the second block
xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
# converting to indirect-mapped file, which would move the data blocks
# to the beginning of the file, but extent status cache still marks
# that region as a hole
chattr -e /mnt/ext4/testfile
# delayed allocation writes to the "hole", reclaim the same data block
# again, results in i_blocks corruption
xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
umount /mnt/ext4
e2fsck -nf /dev/sda6
...
Inode 53, i_blocks is 16, should be 8. Fix? no
...
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d6f123a929 ]
Currently the check in ext4_ind_migrate() is not enough before doing the
real conversion:
a) delayed allocated extents could bypass the check on eh->eh_entries
and eh->eh_depth
This can be demonstrated by this script
xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
where testfile has two extents but still be converted to non-extent
based file format.
b) only extent length is checked but not the offset, which would result
in data lose (delalloc) or fs corruption (nodelalloc), because
non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
blocks
This can be demostrated by
xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
chattr -e /mnt/ext4/testfile
sync
If delalloc is enabled, dmesg prints
EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
EXT4-fs (dm-4): This should not happen!! Data will be lost
If delalloc is disabled, e2fsck -nf shows corruption
Inode 53, i_size is 5497558142976, should be 4096. Fix? no
Fix the two issues by
a) forcing all delayed allocation blocks to be allocated before checking
eh->eh_depth and eh->eh_entries
b) limiting the last logical block of the extent is within direct map
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9705acd63b ]
On delalloc enabled file system on invalidatepage operation
in ext4_da_page_release_reservation() we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.
However currently there is a bug where on the systems with page size >
block size we will always remove extents from the start of the page
regardless where the actual delayed buffers are positioned in the page.
This leads to the errors like this:
EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks
This however can cause data loss on writeback time if the file system is
in ENOSPC condition because we're releasing reservation for someones
else delayed buffer.
Fix this by only removing extents that corresponds to the part of the
page we want to invalidate.
This problem is reproducible by the following fio receipt (however I was
only able to reproduce it with fio-2.1 or older.
[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ea6055c46e ]
Since commit 39a6ac11df ("spi/pl022: Devicetree support w/o platform data")
the 'num-cs' parameter cannot be passed through platform data when probing
with devicetree. Instead, it's a required devicetree property.
Fix the binding documentation so the property is properly specified.
Fixes: 39a6ac11df ("spi/pl022: Devicetree support w/o platform data")
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit db0e2aa47f0a45fc56592c25ad370f8836cbb128 ]
Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.
A fully preemptible kernel may deschedule such as task in any upcall
called from a hypercall continuation.
However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space. These long running
tasks may also trigger the kernel's soft lockup detection.
Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted. Use these in the privcmd
driver.
When returning from an upcall, call xen_maybe_preempt_hcall() which
adds a schedule point if if the current task was within a preemptible
hypercall.
Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The commit 984ff7a3e0 is an upstream backport. In fact, it depends on
commit 395eea6ccf ("rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()")
which has not been backported in 3.18.y.
Before commit 395eea6ccf, rollback_registered_many() uses rtmsg_ifinfo().
The call to this function is done with dev->reg_state set to
NETREG_UNREGISTERING, thus testing this reg_state in rtmsg_ifinfo() is
wrong.
This patch partially reverts commit 984ff7a3e0.
Fixes: 984ff7a3e0 ("rtnl/bond: don't send rtnl msg for unregistered iface")
Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This reverts commit e73268af3e.
Seems like an incorrect merge on my end, will pick the upstream
version again next.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This reverts commit ed7f7f145e.
Reverting from stable tree as fix was found to be buggy. New fix pending.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 514d748f69 ]
Commit e4502c63f5 (ufs: deal with nfsd/iget races) made ufs
create inodes with I_NEW flag set. However ufs_mkdir() never cleared
this flag. Thus if someone ever tried to lookup the directory by inode
number, he would deadlock waiting for I_NEW to be cleared. Luckily this
mostly happens only if the filesystem is exported over NFS since
otherwise we have the inode attached to dentry and don't look it up by
inode number. In rare cases dentry can get freed without inode being
freed and then we'd hit the deadlock even without NFS export.
Fix the problem by clearing I_NEW before instantiating new directory
inode.
Fixes: e4502c63f5
Reported-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 12ecbb4b1d ]
Commit e4502c63f5 (ufs: deal with nfsd/iget races) introduced
unlock_new_inode() call into ufs_add_nondir(). However that function
gets called also from ufs_link() which hands it already initialized
inode and thus unlock_new_inode() complains. The problem is harmless but
annoying.
Fix the problem by opencoding necessary stuff in ufs_link()
Fixes: e4502c63f5
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c89d4319ae ]
commit ceeb0e5d39 upstream.
Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 431dae778a ]
Eric noticed problems with vhost-scsi and virtio-ccw: vhost-scsi
complained about overwriting values in the config space, which
was triggered by a broken implementation of virtio-ccw's config
get/set routines. It was probably sheer luck that we did not hit
this before.
When writing a value to the config space, the WRITE_CONF ccw will
always write from the beginning of the config space up to and
including the value to be set. If the config space up to the value
has not yet been retrieved from the device, however, we'll end up
overwriting values. Keep track of the known config space and update
if needed to avoid this.
Moreover, READ_CONF will only read the number of bytes it has been
instructed to retrieve, so we must not copy more than that to the
buffer, or we might overwrite trailing values.
Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Tested-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9fc2b4b436 ]
Before calling into the filesystem, vfs_setxattr calls
security_inode_setxattr, which ends up calling selinux_inode_setxattr in
our case. That returns -EOPNOTSUPP whenever SBLABEL_MNT is not set.
SBLABEL_MNT was supposed to be set by sb_finish_set_opts, which sets it
only if selinux_is_sblabel_mnt returns true.
The selinux_is_sblabel_mnt logic was broken by eadcabc697 "SELinux: do
all flags twiddling in one place", which didn't take into the account
the SECURITY_FS_USE_NATIVE behavior that had been introduced for nfs
with eb9ae68650 "SELinux: Add new labeling type native labels".
This caused setxattr's of security labels over NFSv4.2 to fail.
Cc: stable@kernel.org # 3.13
Cc: Eric Paris <eparis@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Reported-by: Richard Chan <rc556677@outlook.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the stable dependency]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f14e9ad17f ]
ffs_closed can race with configfs_rmdir which will call config_item_release, so
add an extra check to avoid calling the unregister_gadget_item with an null
gadget item.
Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 04c17341b4 ]
When building the kernel with 32-bit binutils built with support
only for the i386 target, we get the following warning:
arch/x86/kernel/head_32.S:66: Warning: shift count out of range (32 is not between 0 and 31)
The problem is that in that case, binutils' internal type
representation is 32-bit wide and the shift range overflows.
In order to fix this, manipulate the shift expression which
creates the 4GiB constant to not overflow the shift count.
Suggested-by: Michael Matz <matz@suse.de>
Reported-and-tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 93e3bce628 ]
The warning message in prepend_path is unclear and outdated. It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead. Unfortunately the warning reads like a general warning,
making it unclear what to do with it.
Remove the warning. The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.
Cc: stable@vger.kernel.org
Reported-by: Ivan Delalande <colona@arista.com>
Reported-by: Omar Sandoval <osandov@osandov.com>
Fixes: f48cfddc67 ("vfs: In d_path don't call d_dname on a mount point")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 42720138b0 ]
Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
so I'm including this one for stable@ as well. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 85e84ba310 ]
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.
On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.
If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.
The same condition exists when trapping to enable VFP for the guest.
The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.
The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.
Reported-by: Vikram Sethi <vikrams@codeaurora.org>
Cc: stable@kernel.org # v3.9+
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0b3fff54bc ]
Make sure that we are skipping over large PTEs while walking
the page-table tree.
Cc: stable@kernel.org
Fixes: 5c34c403b7 ("iommu/amd: Fix memory leak in free_pagetable")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 286fb1cc32 ]
Some of the macros defined in kvm_arm.h are useful in assembly files, but are
not compatible with the assembler. Change any C language integer constant
definitions using appended U, UL, or ULL to the UL() preprocessor macro. Also,
add a preprocessor include of the asm/memory.h file which defines the UL()
macro.
Fixes build errors like these when using kvm_arm.h in assembly
source files:
Error: unexpected characters following instruction at operand 3 -- `and x0,x1,#((1U<<25)-1)'
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f104765b4f ]
If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 02590fd855 ]
commit 5f5bc6b1e2 upstream.
Replacing a xattr consists of doing a lookup for its existing value, delete
the current value from the respective leaf, release the search path and then
finally insert the new value. This leaves a time window where readers (getxattr,
listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
so this has security implications.
This change also fixes 2 other existing issues which were:
*) Deleting the old xattr value without verifying first if the new xattr will
fit in the existing leaf item (in case multiple xattrs are packed in the
same item due to name hash collision);
*) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
exist but we have have an existing item that packs muliple xattrs with
the same name hash as the input xattr. In this case we should return ENOSPC.
A test case for xfstests follows soon.
Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
implementation.
Reported-by: Alexandre Oliva <oliva@gnu.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d6b6cb1d3e ]
If there's an existing base chain, we have to allow to change the
default policy without indicating the hook information.
However, if the chain doesn't exists, we have to enforce the presence of
the hook attribute.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 749177ccc7 ]
ip6tables extensions check for this flag to restrict match/target to a
given protocol. Without this flag set, SYNPROXY6 returns an error.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 78146572b9 ]
nfnl_cthelper_parse_tuple() is called from nfnl_cthelper_new(),
nfnl_cthelper_get() and nfnl_cthelper_del(). In each case they pass
a pointer to an nf_conntrack_tuple data structure local variable:
struct nf_conntrack_tuple tuple;
...
ret = nfnl_cthelper_parse_tuple(&tuple, tb[NFCTH_TUPLE]);
The problem is that this local variable is not initialized, and
nfnl_cthelper_parse_tuple() only initializes two fields: src.l3num and
dst.protonum. This leaves all other fields with undefined values
based on whatever is on the stack:
tuple->src.l3num = ntohs(nla_get_be16(tb[NFCTH_TUPLE_L3PROTONUM]));
tuple->dst.protonum = nla_get_u8(tb[NFCTH_TUPLE_L4PROTONUM]);
The symptom observed was that when the rpc and tns helpers were added
then traffic to port 1536 was being sent to user-space.
Signed-off-by: Ian Wilson <iwilson@brocade.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2cf30dc180 ]
When the following filter is used it causes a warning to trigger:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
Modules linked in: bnep lockd grace bluetooth ...
CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
Call Trace:
[<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
[<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
[<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
[<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81159065>] replace_preds+0x3c5/0x990
[<ffffffff811596b2>] create_filter+0x82/0xb0
[<ffffffff81159944>] apply_event_filter+0xd4/0x180
[<ffffffff81152bbf>] event_filter_write+0x8f/0x120
[<ffffffff811db2a8>] __vfs_write+0x28/0xe0
[<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
[<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
[<ffffffff811dc408>] vfs_write+0xb8/0x1b0
[<ffffffff811dc72f>] SyS_write+0x4f/0xb0
[<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
---[ end trace e11028bd95818dcd ]---
Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.
This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression
And give no kernel warning.
Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: stable@vger.kernel.org # 2.6.31+
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2426f39100 ]
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.
Fix the bug by checking actual mode bits before setting S_NOSEC.
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 36c01245eb ]
As reported by Manfred Schlaegl here
http://marc.info/?l=linux-netdev&m=143482089824232&w=2
commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
As net timestamping is influenced by several players (netstamp_needed and
netdev_tstamp_prequeue) Manfred missed a proper timestamp which leads to
CAN frame loss.
As skb timestamping became now mandatory for CAN related skbs this patch
makes sure that received CAN skbs always have a proper timestamp set.
Maybe there's a better solution in the future but this patch fixes the
CAN frame loss so far.
Reported-by: Manfred Schlaegl <manfred.schlaegl@gmx.at>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b18c5d15e8 ]
The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):
CC [M] net/netfilter/nfnetlink_cthelper.o
net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
memcpy(&help->data, nla_data(attr), help->helper->data_len);
^
In file included from include/linux/string.h:17:0,
from include/uapi/linux/uuid.h:25,
from include/linux/uuid.h:23,
from include/linux/mod_devicetable.h:12,
from ./arch/parisc/include/asm/hardware.h:4,
from ./arch/parisc/include/asm/processor.h:15,
from ./arch/parisc/include/asm/spinlock.h:6,
from ./arch/parisc/include/asm/atomic.h:21,
from include/linux/atomic.h:4,
from ./arch/parisc/include/asm/bitops.h:12,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/module.h:9,
from net/netfilter/nfnetlink_cthelper.c:11:
./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
void * memcpy(void * dest,const void *src,size_t count);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8c6cf9cc82 ]
Ignore an existing mount if the locked readonly, nodev or atime
attributes are less permissive than the desired attributes
of the new mount.
On success ensure the new mount locks all of the same readonly, nodev and
atime attributes as the old mount.
The nosuid and noexec attributes are not checked here as this change
is destined for stable and enforcing those attributes causes a
regression in lxc and libvirt-lxc where those applications will not
start and there are no known executables on sysfs or proc and no known
way to create exectuables without code modifications
Cc: stable@vger.kernel.org
Fixes: e51db73532 ("userns: Better restrictions on when proc and sysfs can be mounted")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8687634b79 ]
In RS485 mode, we may want to set the delay_rts_after_send value to 0.
In the datasheet, the 0 value is said to "disable" the Transmitter Timeguard but
this is exactly the expected behavior if we want no delay...
Moreover, if the value was set to non-zero value by device-tree or earlier
ioctl command, it was impossible to change it back to zero.
Reported-by: Sami Pietikäinen <Sami.Pietikainen@wapice.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1b852bceb0 ]
Fresh mounts of proc and sysfs are a very special case that works very
much like a bind mount. Unfortunately the current structure can not
preserve the MNT_LOCK... mount flags. Therefore refactor the logic
into a form that can be modified to preserve those lock bits.
Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
of the filesystem be fully visible in the current mount namespace,
before the filesystem may be mounted.
Move the logic for calling fs_fully_visible from proc and sysfs into
fs/namespace.c where it has greater access to mount namespace state.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ upstream commit aa8165f914 ]
In commit 5491ce3f79 ("mmc: sdhci-pxav3: add support for the Armada
38x SDHCI controller"), the sdhci-pxav3 driver was extended to include
support for the SDHCI controller found in the Armada 38x
processor. This mainly involved adding some MBus window related
configuration.
However, this configuration is currently done too early in ->probe():
it is done before clocks are enabled, while this configuration
involves touching the registers of the controller, which will hang the
SoC if the clock is disabled. It wasn't noticed until now because the
bootloader typically leaves gatable clocks enabled, but in situations
where we have a deferred probe (due to a CD GPIO that cannot be taken,
for example), then the probe will be re-tried later, after a clock
disable has been done in the exit path of the failed probe attempt of
the device. This second probe() will hang the system due to the clock
being disabled.
This can for example be produced on Armada 385 GP, which has a CD GPIO
connected to an I2C PCA9555. If the driver for the PCA9555 is not
compiled into the kernel, then we will have the following sequence of
events:
1. The SDHCI probes
2. It does the MBus configuration (which works, because the clock is
left enabled by the bootloader)
3. It enables the clock
4. It tries to get the CD GPIO, which fails due to the driver being
missing, so -EPROBE_DEFER is returned.
5. Before returning -EPROBE_DEFER, the driver cleans up what was
done, which includes disabling the clock.
6. Later on, the SDHCI probe is tried again.
7. It does the MBus configuration, which hangs because the clock is
no longer enabled.
This commit does the obvious fix of doing the MBus configuration after
the clock has been enabled by the driver.
Fixes: 5491ce3f79 ("mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[jogo: rebased onto 3.18.17]
Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eb686231fc ]
When limiting phy link speed using "max-speed" to 100mbps or less on a
giga bit phy, phy never completes auto negotiation and phy state
machine is held in PHY_AN. Fixing this issue by comparing the giga
bit advertise though phydev->supported doesn't have it but phy has
BMSR_ESTATEN set. So that auto negotiation is restarted as old and
new advertise are different and link comes up fine.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 488a9b48e3 ]
Indication of a single completed packet, marked by txbbs_skipped
being bigger then zero, in not enough in order to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing overflow in queue followed
by WQE corruption and TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst case (maximum sized WQE) packet that we should need to handle after
the queue is opened again.
Also created an helper routine - mlx4_en_is_tx_ring_full, which checks
if the current TX ring is full or not. It provides better code readability
and removes code duplication.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2c51a97f76 ]
The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:
1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.
Fixes: 767e97e1e0 ("neigh: RCU conversion of struct neighbour")
Fixes: a263b30936 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 468479e604 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f60069 ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f98f4514d0 ]
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417 ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f60069 ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2dab80a8b4 ]
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e9 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2d45a02d01 ]
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67 ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fb05e7a89f ]
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f76858
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.
This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.
alloc_skb_with_frags is the same.
The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.
V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer
Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1a040eaca1 ]
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758 ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 671d773297 ]
Since it is possible for vnet_event_napi to end up doing
vnet_control_pkt_engine -> ... -> vnet_send_attr ->
vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
(i.e., in softirq context), kzalloc() should be called with
GFP_ATOMIC from ldc_alloc_exp_dring.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 347d7e45bd ]
The mcp3021 scaling code is dividing the VDD (full-scale) value in
millivolts by the A2D resolution to obtain the scaling factor. When VDD
is 3300mV (the standard value) and the resolution is 12-bit (4096
divisions), the result is a scale factor of 3300/4096, which is always
one. Effectively, the raw A2D reading is always being returned because
no scaling is applied.
This patch fixes the issue and simplifies the register-to-volts
calculation, removing the unneeded "output_scale" struct member.
Signed-off-by: Nick Stevens <Nick.Stevens@digi.com>
Cc: stable@vger.kernel.org # v3.10+
[Guenter Roeck: Dropped unnecessary value check]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit feaff8e5b2 ]
We had a report of a crash while stress testing the NFS client:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
IP: [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff880036aea7c0 ti: ffff8800791f4000 task.ti: ffff8800791f4000
RIP: 0010:[<ffffffff8127b698>] [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
RSP: 0018:ffff8800791f7c00 EFLAGS: 00010293
RAX: ffff8800791f7c40 RBX: ffff88001f2ad8c0 RCX: ffffe8ffffc80305
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8800791f7c88 R08: ffff88007fc971d8 R09: 279656d600000000
R10: 0000034a01000000 R11: 279656d600000000 R12: ffff88001f2ad918
R13: ffff88001f2ad8c0 R14: 0000000000000000 R15: 0000000100e73040
FS: 0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000150 CR3: 0000000001c0b000 CR4: 00000000000407e0
Stack:
ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
Call Trace:
[<ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
[<ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
[<ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
[<ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
[<ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
[<ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
[<ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
[<ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
[<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
[<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
[<ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
[<ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
[<ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
[<ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<ffffffff810baa1b>] process_one_work+0x1bb/0x410
[<ffffffff810bacc3>] worker_thread+0x53/0x480
[<ffffffff810bac70>] ? process_one_work+0x410/0x410
[<ffffffff810bac70>] ? process_one_work+0x410/0x410
[<ffffffff810c0b38>] kthread+0xd8/0xf0
[<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
[<ffffffff817a1aa2>] ret_from_fork+0x42/0x70
[<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
Jean says:
"Running locktests with a large number of iterations resulted in a
client crash. The test run took a while and hasn't finished after close
to 2 hours. The crash happened right after I gave up and killed the test
(after 107m) with Ctrl+C."
The crash happened because a NULL inode pointer got passed into
locks_get_lock_context. The call chain indicates that file_inode(filp)
returned NULL, which means that f_inode was NULL. Since that's zeroed
out in __fput, that suggests that this filp pointer outlived the last
reference.
Looking at the code, that seems possible. We copy the struct file_lock
that's passed in, but if the task is signalled at an inopportune time we
can end up trying to use that file_lock in rpciod context after the process
that requested it has already returned (and possibly put its filp
reference).
Fix this by taking an extra reference to the filp when we allocate the
lock info, and put it in nfs4_lock_release.
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0ad0b3255a ]
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.
[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Cc: <stable@vger.kernel.org> #v2.6.31+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5a60e87603 ]
rbd_obj_request_create() is called on the main I/O path, so we need to
use GFP_NOIO to make sure allocation doesn't blow back on us. Not all
callers need this, but I'm still hardcoding the flag inside rather than
making it a parameter because a) this is going to stable, and b) those
callers shouldn't really use rbd_obj_request_create() and will be fixed
in the future.
More memory allocation fixes will follow.
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b65657fc24 ]
The Ethernet controller found in the Armada 370, 380 and 385 SoCs don't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.
This patch fixes the issue by disabling the features NETIF_F_IP_CSUM and
NETIF_F_TSO for the Armada 370 and compatibles SoCs when the MTU is set
to a value greater than 1600 bytes.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f522a975a8 ]
The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs then a way to identify them is needed.
This patch introduces a new compatible string "marvell,armada-xp-neta".
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2ba8d1bb8f ]
In order for hibernation to reliably work we need to properly turn
off the SDMA block, sadly after numerous attemps i haven't not found
proper sequence for clean and full shutdown. So simply reset both
SDMA block, this makes hibernation works reliably on sea island GPU
family (CI)
Hibernation and suspend to ram were tested (several times) on :
Bonaire
Hawaii
Mullins
Kaveri
Kabini
Cc: stable@vger.kernel.org
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 161569deaa ]
In order for hibernation to reliably work we need to cleanup more
thoroughly the compute ring. Hibernation is different from suspend
resume as when we resume from hibernation the hardware is first
fully initialize by regular kernel then freeze callback happens
(which correspond to a suspend inside the radeon kernel driver)
and turn off each of the block. It turns out we were not cleanly
shutting down the compute ring. This patch fix that.
Hibernation and suspend to ram were tested (several times) on :
Bonaire
Hawaii
Mullins
Kaveri
Kabini
Changed since v1:
- Factor the ring stop logic into a function taking ring as arg.
Cc: stable@vger.kernel.org
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4df3fd1700 ]
Fujitsu Lifebook E780 sets the sequence number 0x0f to only only of
the two headphones, thus the driver tries to assign another as the
line-out, and this results in the inconsistent mapping between the
created jack ctl and the actual I/O. Due to this, PulseAudio doesn't
handle it properly and gets the silent output.
The fix is to ignore the non-HP sequencer checks.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=99681
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 56a94f1391 ]
Whilst testing cpu hotplug events on kernel configured with
DEBUG_PREEMPT and DEBUG_ATOMIC_SLEEP we get following BUG message,
caused by calling request_irq() and free_irq() in the context of
hotplug notification (which is in this case atomic context).
[ 40.785859] CPU1: Software reset
[ 40.786660] BUG: sleeping function called from invalid context at mm/slub.c:1241
[ 40.786668] in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/1
[ 40.786678] Preemption disabled at:[< (null)>] (null)
[ 40.786681]
[ 40.786692] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc4-00024-g7dca860 #36
[ 40.786698] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 40.786728] [<c0014a00>] (unwind_backtrace) from [<c0011980>] (show_stack+0x10/0x14)
[ 40.786747] [<c0011980>] (show_stack) from [<c0449ba0>] (dump_stack+0x70/0xbc)
[ 40.786767] [<c0449ba0>] (dump_stack) from [<c00c6124>] (kmem_cache_alloc+0xd8/0x170)
[ 40.786785] [<c00c6124>] (kmem_cache_alloc) from [<c005d6f8>] (request_threaded_irq+0x64/0x128)
[ 40.786804] [<c005d6f8>] (request_threaded_irq) from [<c0350b8c>] (exynos4_local_timer_setup+0xc0/0x13c)
[ 40.786820] [<c0350b8c>] (exynos4_local_timer_setup) from [<c0350ca8>] (exynos4_mct_cpu_notify+0x30/0xa8)
[ 40.786838] [<c0350ca8>] (exynos4_mct_cpu_notify) from [<c003b330>] (notifier_call_chain+0x44/0x84)
[ 40.786857] [<c003b330>] (notifier_call_chain) from [<c0022fd4>] (__cpu_notify+0x28/0x44)
[ 40.786873] [<c0022fd4>] (__cpu_notify) from [<c0013714>] (secondary_start_kernel+0xec/0x150)
[ 40.786886] [<c0013714>] (secondary_start_kernel) from [<40008764>] (0x40008764)
Interrupts cannot be requested/freed in the CPU_STARTING/CPU_DYING
notifications which run on the hotplugged cpu with interrupts and
preemption disabled.
To avoid the issue, request the interrupts for all possible cpus in
the boot code. The interrupts are marked NO_AUTOENABLE to avoid a racy
request_irq/disable_irq() sequence. The flag prevents the
request_irq() code from enabling the interrupt immediately.
The interrupt is then enabled in the CPU_STARTING notifier of the
hotplugged cpu and again disabled with disable_irq_nosync() in the
CPU_DYING notifier.
[ tglx: Massaged changelog to match the patch ]
Fixes: 7114cd749a ("clocksource: exynos_mct: use (request/free)_irq calls for local timer registration")
Reported-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Signed-off-by: Damian Eppel <d.eppel@samsung.com>
Cc: m.szyprowski@samsung.com
Cc: kyungmin.park@samsung.com
Cc: daniel.lezcano@linaro.org
Cc: kgene@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1435324984-7328-1-git-send-email-d.eppel@samsung.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6b88f44e16 ]
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
ps->infix.cnt--;
return ps->infix.string[ps->infix.tail++];
The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3c8e5105e7 ]
The REGSET_VX_LOW ELF notes should contain the lower 64 bit halfes of the
first sixteen 128 bit vector registers. Unfortunately currently we copy
the upper halfes.
Fix this and correctly copy the lower halfes.
Fixes: a62bc07392 ("s390/kdump: add support for vector extension")
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8a8c35fadf ]
Beginning at commit d52d3997f8 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:
===============================
[ INFO: suspicious RCU usage. ]
4.1.0-rc7-next-20150612 #1 Not tainted
-------------------------------
kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by systemd/1:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
#1: (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
#2: (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
stack backtrace:
CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
Call Trace:
dump_stack+0x4c/0x6e
lockdep_rcu_suspicious+0xe7/0x120
___might_sleep+0x1d5/0x1f0
__might_sleep+0x4d/0x90
kmem_cache_alloc+0x47/0x250
create_object+0x39/0x2e0
kmemleak_alloc_percpu+0x61/0xe0
pcpu_alloc+0x370/0x630
Additional backtrace lines are truncated. In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs. As suggested by Martin KaFai Lau,
these are the clue to the fix. Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c5f3b1a51a ]
The kmemleak scanning thread can run for minutes. Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock. Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ... -> __delete_object() function until the lock is
released in scan_object().
When a kmemleak error occurs (e.g. it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects. If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()). This leads to kmemleak_scan potentially triggering a
page fault.
This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed. The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2576c28e3f ]
- arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers
Since ARCv2 only provides load/load, store/store and all/all, we need
the full barrier
- LLOCK/SCOND based atomics, bitops, cmpxchg, which return modified
values were lacking the explicit smp barriers.
- Non LLOCK/SCOND varaints don't need the explicit barriers since that
is implicity provided by the spin locks used to implement the
critical section (the spin lock barriers in turn are also fixed in
this commit as explained above
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d57f727264 ]
When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
away some of the desired LDs.
| do {
| new = old = *ipi_data_ptr;
| new |= 1U << msg;
| } while (cmpxchg(ipi_data_ptr, old, new) != old);
was generating to below
| 8015cef8: ld r2,[r4,0] <-- First LD
| 8015cefc: bset r1,r2,r1
|
| 8015cf00: llock r3,[r4] <-- atomic op
| 8015cf04: brne r3,r2,8015cf10
| 8015cf08: scond r1,[r4]
| 8015cf0c: bnz 8015cf00
|
| 8015cf10: brne r3,r2,8015cf00 <-- Branch doesn't go to orig LD
Although this was fixed by adding a ACCESS_ONCE in this call site, it
seems safer (for now at least) to add compiler barrier to LLSC based
cmpxchg
Reported-by: Chuck Jordan <cjordan@synopsys,com>
Cc: <stable@vger.kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fff3b16d27 ]
Many harddisks (mostly WD ones) have firmware problems and take too
long, more than 10 seconds, to resume from suspend. And this often
exceeds the default DPM watchdog timeout (12 seconds), resulting in a
kernel panic out of sudden.
Since most distros just take the default as is, we should give a bit
more safer value. This patch increases the default value from 12
seconds to one minute, which has been confirmed to be long enough for
such problematic disks.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=91921
Fixes: 70fea60d88 (PM / Sleep: Detect device suspend/resume lockup and log event)
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f1590670ce ]
Current implementation of descriptor init procedure only takes
care about setting/clearing ownership flag in "des0"/"des1"
fields while it is perfectly possible to get unexpected bits
set because of the following factors:
[1] On driver probe underlying memory allocated with
dma_alloc_coherent() might not be zeroed and so
it will be filled with garbage.
[2] During driver operation some bits could be set by SD/MMC
controller (for example error flags etc).
And unexpected and/or randomly set flags in "des0"/"des1"
fields may lead to unpredictable behavior of GMAC DMA block.
This change addresses both items above with:
[1] Use of dma_zalloc_coherent() instead of simple
dma_alloc_coherent() to make sure allocated memory is
zeroed. That shouldn't affect performance because
this allocation only happens once on driver probe.
[2] Do explicit zeroing of both "des0" and "des1" fields
of all buffer descriptors during initialization of
DMA transfer.
And while at it fixed identation of dma_free_coherent()
counterpart as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: arc-linux-dev@synopsys.com
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 294240ffe7 ]
When kzalloc() is called under spin_lock(), GFP_ATOMIC should be
used to avoid sleeping allocation.
The call tree is:
of_pci_range_to_resource()
--> pci_register_io_range() <-- takes spin_lock(&io_range_lock);
--> kzalloc()
Signed-off-by: Jingoo Han <jingoohan1@gmail.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9eb1e57f56 ]
If we are doing an MST transaction and we've gotten HPD and we
lookup the device from the incoming msg, we should take the mgr
lock around it, so that mst_primary and mstb->ports are valid.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 292db1bc6c ]
ext4 isn't willing to map clusters to a non-extent file. Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever. Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.
(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2ac56d3d4b ]
If we create a CRC filesystem, mount it, and create a symlink with
a path long enough that it can't live in the inode, we get a very
strange result upon remount:
# ls -l mnt
total 4
lrwxrwxrwx. 1 root root 929 Jun 15 16:58 link -> XSLM
XSLM is the V5 symlink block header magic (which happens to be
followed by a NUL, so the string looks terminated).
xfs_readlink_bmap() advanced cur_chunk by the size of the header
for CRC filesystems, but never actually used that pointer; it
kept reading from bp->b_addr, which is the start of the block,
rather than the start of the symlink data after the header.
Looks like this problem goes back to v3.10.
Fixing this gets us reading the proper link target, again.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e748c8d09 ]
KVM guest kernels for trap & emulate run in user mode, with a modified
set of kernel memory segments. However the fixmap address is still in
the normal KSeg3 region at 0xfffe0000 regardless, causing problems when
cache alias handling makes use of them when handling copy on write.
Therefore define FIXADDR_TOP as 0x7ffe0000 in the guest kernel mapped
region when CONFIG_KVM_GUEST is defined.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # v3.10+
Patchwork: https://patchwork.linux-mips.org/patch/9887/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 89d96a6f8e ]
Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but there will be some cases, where a file
system operation was aborted due to an ext4_error(), where there may
still be some dirty buffers in the buffer cache for the device. So
try to force them out to memory before calling invalidate_bdev().
This fixes a warning triggered by generic/081:
WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f1a6ae87c ]
When building the kernel with a bare-metal (ELF) toolchain, the -shared
option may not be passed down to collect2, resulting in silent corruption
of the vDSO image (in particular, the DYNAMIC section is omitted).
The effect of this corruption is that the dynamic linker fails to find
the vDSO symbols and libc is instead used for the syscalls that we
intended to optimise (e.g. gettimeofday). Functionally, there is no
issue as the sigreturn trampoline is still intact and located by the
kernel.
This patch fixes the problem by explicitly passing -shared to the linker
when building the vDSO.
Cc: <stable@vger.kernel.org>
Reported-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Reported-by: James Greenlaigh <james.greenhalgh@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c70701131f ]
If a write attempt fails, and the write is queued up for resending to
the server, as opposed to being dropped, then we need to set the
appropriate flag so that nfs_file_fsync() does the right thing.
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1ca018d28d ]
pnfs_do_write() expects the call to pnfs_write_through_mds() to free the
pgio header and to release the layout segment before exiting. The problem
is that nfs_pgio_data_destroy() doesn't actually do this; it only frees
the memory allocated by nfs_generic_pgio().
Ditto for pnfs_do_read()...
Fix in both cases is to add a call to hdr->release(hdr).
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3d9fecf6bf ]
We enable _CRS on all systems from 2008 and later. On older systems, we
ignore _CRS and assume the whole physical address space (excluding RAM and
other devices) is available for PCI devices, but on systems that support
physical address spaces larger than 4GB, it's doubtful that the area above
4GB is really available for PCI.
After d56dbf5bab ("PCI: Allocate 64-bit BARs above 4G when possible"), we
try to use that space above 4GB *first*, so we're more likely to put a
device there.
On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
and card reader devices unassigned (but only after Windows had been
booted). Only the sound device had a 64-bit BAR, so it was the only device
placed above 4GB, and hence the only device that didn't work.
Keep _CRS enabled even on pre-2008 systems if they support physical address
space larger than 4GB.
Fixes: d56dbf5bab ("PCI: Allocate 64-bit BARs above 4G when possible")
Reported-and-tested-by: Juan Dayer <jdayer@outlook.com>
Reported-and-tested-by: Alan Horsfield <alan@hazelgarth.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.14+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4839ddc27b ]
Commit fd1d0ddf2a (KVM: arm/arm64: check IRQ number on userland
injection) rightly limited the range of interrupts userspace can
inject in a guest, but failed to consider the (unlikely) case where
a guest is configured with 1024 interrupts.
In this case, interrupts ranging from 1020 to 1023 are unuseable,
as they have a special meaning for the GIC CPU interface.
Make sure that these number cannot be used as an IRQ. Also delete
a redundant (and similarily buggy) check in kvm_set_irq.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6096d91af0 ]
The metadata space map has a simplified 'bootstrap' mode that is
operational when extending the space maps. Whilst in this mode it's
possible for some refcount decrement operations to become queued (eg, as
a result of shadowing one of the bitmap indexes). These decrements were
not being applied when switching out of bootstrap mode.
The effect of this bug was the leaking of a 4k metadata block. This is
detected by the latest version of thin_check as a non fatal error.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b9bcc91993 ]
The memmap freeing code in free_unused_memmap() computes the end of
each memblock by adding the memblock size onto the base. However,
if SPARSEMEM is enabled then the value (start) used for the base
may already have been rounded downwards to work out which memmap
entries to free after the previous memblock.
This may cause memmap entries that are in use to get freed.
In general, you're not likely to hit this problem unless there
are at least 2 memblocks and one of them is not aligned to a
sparsemem section boundary. Note that carve-outs can increase
the number of memblocks by splitting the regions listed in the
device tree.
This problem doesn't occur with SPARSEMEM_VMEMMAP, because the
vmemmap code deals with freeing the unused regions of the memmap
instead of requiring the arch code to do it.
This patch gets the memblock base out of the memblock directly when
computing the block end address to ensure the correct value is used.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 46b0567c85 ]
Commit 6c81fe7925 ("arm64: enable context tracking") did not
update el0_sp_pc to use ct_user_exit, but this appears to have been
unintentional. In commit 6ab6463aeb ("arm64: adjust el0_sync so
that a function can be called") we made x0 available, and in the return
to userspace we call ct_user_enter in the kernel_exit macro.
Due to this, we currently don't correctly inform RCU of the user->kernel
transition, and may erroneously account for time spent in the kernel as
if we were in an extended quiescent state when CONFIG_CONTEXT_TRACKING
is enabled.
As we do record the kernel->user transition, a userspace application
making accesses from an unaligned stack pointer can demonstrate the
imbalance, provoking the following warning:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 3660 at kernel/context_tracking.c:75 context_tracking_enter+0xd8/0xe4()
Modules linked in:
CPU: 2 PID: 3660 Comm: a.out Not tainted 4.1.0-rc7+ #8
Hardware name: ARM Juno development board (r0) (DT)
Call trace:
[<ffffffc000089914>] dump_backtrace+0x0/0x124
[<ffffffc000089a48>] show_stack+0x10/0x1c
[<ffffffc0005b3cbc>] dump_stack+0x84/0xc8
[<ffffffc0000b3214>] warn_slowpath_common+0x98/0xd0
[<ffffffc0000b330c>] warn_slowpath_null+0x14/0x20
[<ffffffc00013ada4>] context_tracking_enter+0xd4/0xe4
[<ffffffc0005b534c>] preempt_schedule_irq+0xd4/0x114
[<ffffffc00008561c>] el1_preempt+0x4/0x28
[<ffffffc0001b8040>] exit_files+0x38/0x4c
[<ffffffc0000b5b94>] do_exit+0x430/0x978
[<ffffffc0000b614c>] do_group_exit+0x40/0xd4
[<ffffffc0000c0208>] get_signal+0x23c/0x4f4
[<ffffffc0000890b4>] do_signal+0x1ac/0x518
[<ffffffc000089650>] do_notify_resume+0x5c/0x68
---[ end trace 963c192600337066 ]---
This patch adds the missing ct_user_exit to the el0_sp_pc entry path,
correcting the context tracking for this case.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Fixes: 6c81fe7925 ("arm64: enable context tracking")
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e2d997366d ]
According to the PSCI specification and the SMC/HVC calling
convention, PSCI function_ids that are not implemented must
return NOT_SUPPORTED as return value.
Current KVM implementation takes an unhandled PSCI function_id
as an error and injects an undefined instruction into the guest
if PSCI implementation is called with a function_id that is not
handled by the resident PSCI version (ie it is not implemented),
which is not the behaviour expected by a guest when calling a
PSCI function_id that is not implemented.
This patch fixes this issue by returning NOT_SUPPORTED whenever
the kvm PSCI call is executed for a function_id that is not
implemented by the PSCI kvm layer.
Cc: <stable@vger.kernel.org> # 3.18+
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 469d7d22ce ]
The i2c_master_recv() uses readsize to receive data from i2c but compares
to size of rdbuf which is always 27. This would cause problem when the
max_fingers is not 5. Change the comparison value to readsize instead.
Fixes: 36874c7e21 ("Input: pixcir_i2c_ts - support up to 5 fingers and
hardware tracking IDs:)
Cc: stable@vger.kernel.org
Signed-off-by: Frodo Lai <frodo_lai@bcmcom.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9d75f08946 ]
tpm_ibmvtpm_probe() calls ibmvtpm_reset_crq(ibmvtpm) without having yet
set the virtual device in the ibmvtpm structure. So in ibmvtpm_reset_crq,
the phype call contains empty unit addresses, ibmvtpm->vdev->unit_address.
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com>
Reviewed-by: Ashley Lai <ashley@ahsleylai.com>
Cc: <stable@vger.kernel.org>
Fixes: 132f762947 ("drivers/char/tpm: Add new device driver to support IBM vTPM")
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 921cc29473 ]
The way the mask is generated in regmap_field_init() is wrong.
Indeed, a field initialized with msb = 31 and lsb = 0 provokes a shift
overflow while calculating the mask field.
On some 32 bits architectures, such as x86, the generated mask is 0,
instead of the expected 0xffffffff.
This patch uses GENMASK() to fix the problem, as this macro is already safe
regarding shift overflow.
Signed-off-by: Maxime Coquelin <maxime.coquelin@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4b200b4604 ]
This fixes a several year old regression that I found while trying
to get the Yoga 3 11 to work. The ideapad_rfk_set function is meant
to send a command to the embedded controller through ACPI, but
as of c1f73658ed, it sends the index of the rfkill device instead
of the command, and ignores the opcode field.
This changes it back to the original behavior, which indeed
flips the rfkill state as seen in the debugfs interface.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c1f73658ed ("ideapad: pass ideapad_priv as argument (part 2)")
Cc: stable@vger.kernel.org # v2.6.38+
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f6a6fda29 ]
If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case. In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts. So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first. And only after recovering
journal it can do the new updates.
The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.htmlhttp://comments.gmane.org/gmane.comp.file-systems.ext4/48841
[ Fixed bug in patch which allowed a non-negative error return from
jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
was causing xfstests ext4/306 to fail. -- Ted ]
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 15b8d2c41f ]
In big endian mode regmap_bulk_read gives incorrect data
for byte reads.
This is because memcpy of a single byte from an address
after full word read gives different results when
endianness differs. ie. we get little-end in LE and big-end in BE.
Signed-off-by: Arun Chandran <achandran@mvista.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3dc196eae1 ]
Fix the hbm power gating state machine so it will wait till it receives
confirmation interrupt for the PG_ISOLATION_EXIT message.
In process of the suspend flow the devices first have to exit from the
power gating state (runtime pm resume).
If we do not handle the confirmation interrupt after sending
PG_ISOLATION_EXIT message, we may receive it already after the suspend
flow has changed the device state and interrupt will be interpreted as a
spurious event, consequently link reset will be invoked which will
prevent the device from completing the suspend flow
kernel: [6603] mei_reset:136: mei_me 0000:00:16.0: powering down: end of reset
kernel: [476] mei_me_irq_thread_handler:643: mei_me 0000:00:16.0: function called after ISR to handle the interrupt processing.
kernel: mei_me 0000:00:16.0: FW not ready: resetting
Cc: <stable@vger.kernel.org> #3.18+
Cc: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=86241
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770397
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2fb22a8042 ]
Disable write buffering on the Toshiba ToPIC95 if it is enabled by
somebody (it is not supposed to be a power-on default according to
the datasheet). On the ToPIC95, practically no 32-bit Cardbus card
will work under heavy load without locking up the whole system if
this is left enabled. I tried about a dozen. It does not affect
16-bit cards. This is similar to the O2 bugs in early controller
revisions it seems.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=55961
Cc: <stable@vger.kernel.org>
Signed-off-by: Ryan C. Underwood <nemesis@icequake.net>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bdf96838ae ]
The commit cf108bca46: "ext4: Invert the locking order of page_lock
and transaction start" caused __ext4_journalled_writepage() to drop
the page lock before the page was written back, as part of changing
the locking order to jbd2_journal_start -> page_lock. However, this
introduced a potential race if there was a truncate racing with the
data=journalled writeback mode.
Fix this by grabbing the page lock after starting the journal handle,
and then checking to see if page had gotten truncated out from under
us.
This fixes a number of different warnings or BUG_ON's when running
xfstests generic/086 in data=journalled mode, including:
jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7
c0, 164), jh->b_transaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0
- and -
kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
...
Call Trace:
[<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
[<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
[<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
[<c027d883>] ? lock_buffer+0x36/0x36
[<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
[<c0229139>] do_invalidatepage+0x22/0x26
[<c0229198>] truncate_inode_page+0x5b/0x85
[<c022934b>] truncate_inode_pages_range+0x156/0x38c
[<c0229592>] truncate_inode_pages+0x11/0x15
[<c022962d>] truncate_pagecache+0x55/0x71
[<c02b913b>] ext4_setattr+0x4a9/0x560
[<c01ca542>] ? current_kernel_time+0x10/0x44
[<c026c4d8>] notify_change+0x1c7/0x2be
[<c0256a00>] do_truncate+0x65/0x85
[<c0226f31>] ? file_ra_state_init+0x12/0x29
- and -
WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
irty_metadata+0x14a/0x1ae()
...
Call Trace:
[<c01b879f>] ? console_unlock+0x3a1/0x3ce
[<c082cbb4>] dump_stack+0x48/0x60
[<c0178b65>] warn_slowpath_common+0x89/0xa0
[<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
[<c0178bef>] warn_slowpath_null+0x14/0x18
[<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
[<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
[<c02b2f44>] write_end_fn+0x40/0x53
[<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
[<c02b59e7>] ext4_writepage+0x354/0x3b8
[<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
[<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
[<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
[<c02b5a5b>] __writepage+0x10/0x2e
[<c0225956>] write_cache_pages+0x22d/0x32c
[<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
[<c02b6ee8>] ext4_writepages+0x102/0x607
[<c019adfe>] ? sched_clock_local+0x10/0x10e
[<c01a8a7c>] ? __lock_is_held+0x2e/0x44
[<c01a8ad5>] ? lock_is_held+0x43/0x51
[<c0226dff>] do_writepages+0x1c/0x29
[<c0276bed>] __writeback_single_inode+0xc3/0x545
[<c0277c07>] writeback_sb_inodes+0x21f/0x36d
...
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 565630d503 ]
After secondary CPU boot or hotplug, the active_mm of the idle thread is
&init_mm. The init_mm.pgd (swapper_pg_dir) is only meant for TTBR1_EL1
and must not be set in TTBR0_EL1. Since when active_mm == &init_mm the
TTBR0_EL1 is already set to the reserved value, there is no need to
perform any context reset.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9136291f1d ]
This patch fixes a bug in the XOR driver where the cleanup function can be
called and free descriptors that never been processed by the engine (which
result in data errors).
The cleanup function will free descriptors based on the ownership bit in
the descriptors.
Fixes: ff7b04796d ("dmaengine: DMA engine driver for Marvell XOR engine")
Signed-off-by: Lior Amsalem <alior@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 93563a6a71 ]
For TX transactions, the TXCOMP bit in the Status Register is cleared
when the first data is written into the Transmit Holding Register.
In the lines from at91_do_twi_transfer():
at91_twi_write_data_dma(dev);
at91_twi_write(dev, AT91_TWI_IER, AT91_TWI_TXCOMP);
the TXCOMP interrupt may be enabled before the DMA controller has
actually started to write into the THR. In such a case, the TXCOMP bit
is still set into the Status Register so the interrupt is triggered
immediately. The driver understands that a transaction completion has
occurred but this transaction hasn't started yet. Hence the TXCOMP
interrupt is no longer enabled by at91_do_twi_transfer() but instead
by at91_twi_write_data_dma_callback().
Also, the TXCOMP bit in the Status Register in not a clear on read flag
but a snapshot of the transmission state at the time the Status
Register is read.
When a NACK error is dectected by the I2C controller, the TXCOMP, NACK
and TXRDY bits are set together to 1 in the SR. If enabled, the TXCOMP
interrupt is triggered at the same time. Also setting the TXRDY to 1
triggers the DMA controller to write the next data into the THR. Such
a write resets the TXCOMP bit to 0 in the SR. So depending on when the
interrupt handler reads the SR, it may fail to detect the NACK error
if it relies on the TXCOMP bit. The NACK bit and its interrupt should
be used instead.
For RX transactions, the TXCOMP bit in the Status Register is cleared
when the START bit is set into the Control Register. However to unify
the management of the TXCOMP bit when the DMA controller is used, the
TXCOMP interrupt is now enabled by the DMA callbacks for both TX and
RX transfers.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Cc: stable@vger.kernel.org #3.10 and later
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 27e7cd0165 ]
The pinctrl_gpio_range[] array described a first bank of 32 GPIOs and
a second one of 27 GPIOs. However, since there is a total of 60 MPP
pins that can be muxed as GPIOs, the second bank really has 28 GPIOs.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.15+
Fixes: ca6d9a084b ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 380/385")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 80b3d04fea ]
The latest version of the Armada XP datasheet no longer documents the
VDD cpu_pd functions, which might indicate they are not working and/or
not supported. This commit ensures the pinctrl driver matches the
datasheet.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.7+
Fixes: 463e270f76 ("pinctrl: mvebu: add pinctrl driver for Armada XP")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bc99357f36 ]
After updating to a more recent version of the Armada XP datasheet, we
realized that some of the pins documented as having a NAND-related
functionality in fact did not have such functionality. This commit
updates the pinctrl driver accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.7+
Fixes: 463e270f76 ("pinctrl: mvebu: add pinctrl driver for Armada XP")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e5447d2609 ]
After updating to a more recent version of the Armada 375, we realized
that some of the pins documented as having a NAND-related
functionality in fact did not have such functionality. This commit
updates the pinctrl driver accordingly.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.15+
Fixes: ce3ed59dcd ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 375")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 438881dfdd ]
Due to a mistake, the CS0 and CS1 SPI0 functions were incorrectly
named "spi0-1" instead of just "spi0". This commit fixes that.
This DT binding change does not affect any of the in-tree users.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.7+
Fixes: 5f597bb2be ("pinctrl: mvebu: add pinctrl driver for Armada 370")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 331642fbf2 ]
A new revision of the Marvell Armada 38x hardware datasheet unveiled
that the definition of some of the PCIe functions were not
correct. This commit fixes the pinctrl driver accordingly.
Some PCIe functions simply do not exist, some of the PCIe functions in
fact were corresponding to other functions, and some PCIe functions
have been added.
Note: the seemingly unrelated removal of spi(cs2) on MPP47 is related:
this function is in fact implemented on MPP43, instead of a PCIe
function.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.15+
Fixes: ca6d9a084b ("pinctrl: mvebu: add pin-muxing driver for the Marvell Armada 380/385")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1dace0116d ]
The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign
resources correctly without the information from _CRS that tells us which
address ranges are claimed by which bridge. In the bugs mentioned below,
we incorrectly assign a sound card address (this example is from 1033299):
bus: 00 index 2 [mem 0x80000000-0xfcffffffff]
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xbfefffff] (ignored)
pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff] (ignored)
pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored)
ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
pci_root PNP0A08:01: host bridge window [mem 0xbff00000-0xbfffffff] (ignored)
pci 0000:80:01.0: [1106:3288] type 0 class 0x000403
pci 0000:80:01.0: reg 10: [mem 0xbfffc000-0xbfffffff 64bit]
pci 0000:80:01.0: address space collision: [mem 0xbfffc000-0xbfffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
pci 0000:80:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit]
BUG: unable to handle kernel paging request at ffffc90000378000
IP: [<ffffffffa0345f63>] azx_create+0x37c/0x822 [snd_hda_intel]
We assigned 0xfd_0000_0000, but that is not in any of the host bridge
windows, and the sound card doesn't work.
Turn on pci=use_crs automatically for this system.
Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368
Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3d56402d3f ]
Add missing invocation of pm_generic_complete() to
acpi_subsys_complete() to allow ->complete callbacks provided
by the drivers of devices using the ACPI PM domain to be executed
during system resume.
Fixes: f25c0ae2b4 (ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend)
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a5dd4b4b05 ]
The commit referenced below deferred waiting for command completion until
the start of the next command, allowing hardware to do the latching
asynchronously. Unfortunately, being ready to accept a new command is the
only indication we have that the previous command is completed. In cases
where we need that state change to be enabled, we must still wait for
completion. For instance, pciehp_reset_slot() attempts to disable anything
that might generate a surprise hotplug on slots that support presence
detection. If we don't wait for those settings to latch before the
secondary bus reset, we negate any value in attempting to prevent the
spurious hotplug.
Create a base function with optional wait and helper functions so that
pcie_write_cmd() turns back into the "safe" interface which waits before
and after issuing a command and add pcie_write_cmd_nowait(), which
eliminates the trailing wait for asynchronous completion. The following
functions are returned to their previous behavior:
pciehp_power_on_slot
pciehp_power_off_slot
pcie_disable_notification
pciehp_reset_slot
The rationale is that pciehp_power_on_slot() enables the link and therefore
relies on completion of power-on. pciehp_power_off_slot() and
pcie_disable_notification() need a wait because data structures may be
freed after these calls and continued signaling from the device would be
unexpected. And, of course, pciehp_reset_slot() needs to wait for the
scenario outlined above.
Fixes: 3461a06866 ("PCI: pciehp: Wait for hotplug command completion lazily")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.17+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09f39a9505 ]
Once the data is sent, we need to preserve the full frame for
the ndlc state machine. If the NDLC ACK is not received in time,
the ndlc layer will resend the same frame.
Having the header byte pulled will corrupt the frame.
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 300f77c08d ]
AR93xx and newer needs to stop rx before tx to avoid getting the DMA
engine or MAC into a stuck state.
This should reduce/fix the occurence of "Failed to stop Tx DMA" logspam.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2fa19535ca ]
If objects are moved back from system memory to VRAM (and spice id
created again) memory is already initialized so we need to set flag
to not clear memory.
If you don't do it after a while using desktop many images turns to
black or transparents.
Signed-off-by: Frediano Ziglio <fziglio@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 64ad6c4889 ]
Since commit bafc9b754f ("vfs: More precise tests in d_invalidate"),
mounted subvolumes can be deleted because d_invalidate() won't fail.
However, we run into problems when we attempt to delete the default
subvolume while it is mounted as the root filesystem:
# btrfs subvol list /
ID 257 gen 306 top level 5 path rootvol
ID 267 gen 334 top level 5 path snap1
# btrfs subvol get-default /
ID 267 gen 334 top level 5 path snap1
# btrfs inspect-internal rootid /
267
# mount -o subvol=/ /dev/vda1 /mnt
# btrfs subvol del /mnt/snap1
Delete subvolume (no-commit): '/mnt/snap1'
ERROR: cannot delete '/mnt/snap1' - Operation not permitted
# findmnt /
findmnt: can't read /proc/mounts: No such file or directory
# ls /proc
#
Markus reported that this same scenario simply led to a kernel oops.
This happens because in btrfs_ioctl_snap_destroy(), we call
d_invalidate() before we check may_destroy_subvol(), which means that we
detach the submounts and drop the dentry before erroring out. Instead,
we should only invalidate the dentry once the deletion has succeeded.
Additionally, the shrink_dcache_sb() isn't necessary; d_invalidate()
will prune the dcache for the deleted subvolume.
Cc: <stable@vger.kernel.org>
Fixes: bafc9b754f ("vfs: More precise tests in d_invalidate")
Reported-by: Markus Schauler <mschauler@gmail.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 45c44b5ff9 ]
Increase the default init stage change timeout from 15 seconds to 30 seconds.
This resolves issues we have seen with some adapters not transitioning
to the first init stage within 15 seconds, which results in adapter
initialization failures.
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 61e749d7e1 ]
The CrystalCove GPIO irqchip doesn't have irq_set_wake callback defined
so we should set IRQCHIP_SKIP_SET_WAKE for it or it would cause an irq
desc's wake_depth unbalanced warning during system resume phase from the
gpio_keys driver, which is the driver for the power button of the ASUS
T100 laptop.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 72e349f112 ]
When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.
If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().
Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:
0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--52.35%-- 0
| |
| |--46.39%-- __hrtimer_start_range_ns
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | |
| | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
| | | |
| | | --100.00%-- 0x7e714
| | | 0x7e714
Notice the bogus _raw_spin_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.
Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:
0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--53.83%-- 0
| |
| |--44.73%-- hrtimer_try_to_cancel
| | kvmppc_start_thread
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | __ioctl
| | 0x7e714
| | 0x7e714
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e8d975e73e ]
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fb6d1f7df5 ]
Fix USB 3.0 devices lost in NOTATTACHED state after a hub port reset.
Dissolve the function hub_port_finish_reset() completely and divide the
actions to be taken into those which need to be done after each reset
attempt and those which need to be done after the full procedure is
complete, and place them in the appropriate places in hub_port_reset().
Also, remove an unneeded forward declaration of hub_port_reset().
Verbose Problem Description:
USB 3.0 devices may be "lost for good" during a hub port reset.
This makes Linux unable to boot from USB 3.0 devices in certain
constellations of host controllers and devices, because the USB device is
lost during initialization, preventing the rootfs from being mounted.
The underlying problem is that in the affected constellations, during the
processing inside hub_port_reset(), the hub link state goes from 0 to
SS.inactive after the initial reset, and back to 0 again only after the
following "warm" reset.
However, hub_port_finish_reset() is called after each reset attempt and
sets the state the connected USB device based on the "preliminary" status
of the hot reset to USB_STATE_NOTATTACHED due to SS.inactive, yet when
the following warm reset is complete and hub_port_finish_reset() is
called again, its call to set the device to USB_STATE_DEFAULT is blocked
by usb_set_device_state() which does not allow taking USB devices out of
USB_STATE_NOTATTACHED state.
Thanks to Alan Stern for guiding me to the proper solution and how to
submit it.
Link: http://lkml.kernel.org/r/trinity-25981484-72a9-4d46-bf17-9c1cf9301a31-1432073240136%20()%203capp-gmx-bs27
Signed-off-by: Robert Schlabbach <robert_s@gmx.net>
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e18b7975c8 ]
In case of non-Isochronous transfers, we don't
want to clear DWC3_EP_BUSY flag until XferComplete
event. That's because XferInProgress was only enabled
so we can recycle TRBs and usb_requests quicker, but
there are still other pending requests being transferred.
In order to make sure we don't allow for another StartTransfer
command while the HW is still processing other transfers,
we must keep DWC3_EP_BUSY flag set and this what this patch
does.
Fixes: f3af36511e (usb: dwc3: gadget: always enable IOC on
bulk/interrupt transfers)
Cc: <stable@vger.kernel.org> # v3.15+
Reported-by: sundeep subbaraya <sundeep.lkml@gmail.com>
Tested-by: sundeep subbaraya <sundeep.lkml@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6e91f8cb13 ]
If, at the time __rcu_process_callbacks() is invoked, there are callbacks
in Tiny RCU's callback list, but none of them are ready to be invoked,
the current list-management code will knit the non-ready callbacks out
of the list. This can result in hangs and possibly worse. This commit
therefore inserts a check for there being no callbacks that can be
invoked immediately.
This bug is unlikely to occur -- you have to get a new callback between
the time rcu_sched_qs() or rcu_bh_qs() was called, but before we get to
__rcu_process_callbacks(). It was detected by the addition of RCU-bh
testing to rcutorture, which in turn was instigated by Iftekhar Ahmed's
mutation testing. Although this bug was made much more likely by
915e8a4fe4 (rcu: Remove fastpath from __rcu_process_callbacks()), this
did not cause the bug, but rather made it much more probable. That
said, it takes more than 40 hours of rcutorture testing, on average,
for this bug to appear, so this fix cannot be considered an emergency.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 891b1dc022 ]
We need to return error to caller if command is not sent to
controller succesfully.
Signed-off-by: Subbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Fixes: b09bb64239 (usb: dwc3: gadget: implement Global Command support)
Cc: <stable@vger.kernel.org> #v3.5+
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1277fa2ab2 ]
Several of these drivers have there TX randomly blocked for 3~5 seconds while
measuring tx throughput (iperf). The root couse happens in rtl_pci_flush().
The function uses a while-loop to wait for TX queue length to decrease to 0.
The TX queue length counts the number of packets that are queued in the driver.
The driver relys on the TX OK interrupt to return skb and reduce TX queue length.
The interrupt subroutine disables interupts, reads the interrupt registers, and
then clears the registers in the beginning of _rtl_pci_interrupt(). After all
interupts process are finished, the driver invokes enable_interrupt() to enable
interupts. This behavior is normal for an interrupt subroutine.
But enable_interrupt() invokes clear_interrupt() again. This unexpected interrupt
clearing may cleari me fresh TX OK interrupts. These missing interrupts cause TX
queue length to never reduce to 0i, which causes rtl_pci_flush() to be stuck in
unterminated while-loop.
This patch removes clear_interrupt() in enable_interrupt() to avoid this behavior.
Signed-off-by: Vincent Fann <vincent_fann@realtek.com>
Signed-off-by: Shao Fu <shaofu@realtek.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [3.18+]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ce2f6ea1cb ]
The commit df59fa7f4b "spi: orion: support armada extended baud
rates" was too optimistic for the maximum baud rate that the Armada
SoCs can support. According to the hardware datasheet the maximum
frequency supported by the Armada 370 SoC is tclk/4. But for the
Armada XP, Armada 38x and Armada 39x SoCs the limitation is 50MHz and
for the Armada 375 it is tclk/15.
Currently the armada-370-spi compatible is only used by the Armada 370
and the Armada XP device tree. On Armada 370, tclk cannot be higher
than 200MHz. In order to be able to handle both SoCs, we can take the
minimum of 50MHz and tclk/4.
A proper solution is adding a compatible string for each SoC, but it
can't be done as a fix for compatibility reason (we can't modify
device tree that have been already released) and it will be part of a
separate patch.
Fixes: df59fa7f4b (spi: orion: support armada extended baud rates)
Reported-by: Kostya Porotchkin <kostap@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f7134eea05 ]
A temperature conversion can take 750 ms and when possible the
w1_therm slave driver drops the bus_mutex to allow other bus
operations, but that includes operations such as a periodic slave
search, which can remove this slave when it is no longer detected.
If that happens the sl->family_data will be freed and set to NULL
causing w1_slave_show to crash when it wakes up.
Signed-off-by: David Fries <David@Fries.net>
Reported-By: Thorsten Bschorr <thorsten@bschorr.de>
Tested-by: Thorsten Bschorr <thorsten@bschorr.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bde1b29420 ]
Clearly specify that "Cc: stable@vger.kernel.org" is strongly preferred so
that developers understand that the other options should only be used when
absolutely required.
Also specify how upstream commit ids should be referenced in patches
submitted directly to stable (I gathered this from looking at the stable
archives), and specify that any modified patches for stable should be
clearly documented in the patch description.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f2b3dee484 ]
The call to asymmetric_key_hex_to_key_id() from ca_keys_setup()
silently fails with -ENOMEM. Instead of dynamically allocating
memory from a __setup function, this patch defines a variable
and calls __asymmetric_key_hex_to_key_id(), a new helper function,
directly.
This bug was introduced by 'commit 46963b774d ("KEYS: Overhaul
key identification when searching for asymmetric keys")'.
Changelog:
- for clarification, rename hexlen to asciihexlen in
asymmetric_key_hex_to_key_id()
- add size argument to __asymmetric_key_hex_to_key_id() - David Howells
- inline __asymmetric_key_hex_to_key_id() - David Howells
- remove duplicate strlen() calls
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.18
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit be34c62ddf ]
Introduce the helper function srp_wait_for_queuecommand().
Move the definition of scsi_request_fn_active(). Add a comment
above srp_wait_for_queuecommand() that support for scsi-mq needs
to be added.
This patch does not change any functionality. A second call to
srp_wait_for_queuecommand() will be introduced in the next patch.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.13
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5dbb4c6167 ]
41f8bba7f5 ("of/pci: Add pci_register_io_range() and
pci_pio_to_address()") added support for systems with several I/O ranges
described by OF bindings. It modified pci_address_to_pio() look up the
io_range for a given CPU physical address, but the conversion was wrong.
Fix the conversion of address to I/O port.
[bhelgaas: changelog]
Fixes: 41f8bba7f5 ("of/pci: Add pci_register_io_range() and pci_pio_to_address()")
Signed-off-by: Zhichang Yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
CC: stable@vger.kernel.org # v3.18+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ebb6ad73e6 ]
VMID Control 0 BIT[2:1] is VMID Divider Enable and Select
00 = VMID disabled (for OFF mode)
01 = 2 x 50kΩ divider (for normal operation)
10 = 2 x 250kΩ divider (for low power standby)
11 = 2 x 5kΩ divider (for fast start-up)
So WM8903_VMID_RES_250K should be 2 << 1, which is 4.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 14ba3ec1de ]
According to the datasheet:
R10 (0Ah) VMID Impedance Control
BIT 3:2 VMIDSEL DEFAULT 00
DESCRIPTION: VMID impedance selection control
00: 75kΩ output
01: 300kΩ output
10: 2.5kΩ output
WM8737_VMIDSEL_MASK is 0xC (VMIDSEL - [3:2]),
so it needs to left shift WM8737_VMIDSEL_SHIFT bits for setting these bits.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 084609bf72 ]
Setting a dev_pm_ops suspend/resume pair of callbacks but not a set of
hibernation callbacks means those pm functions will not be
called upon hibernation - that leads to system crash on ARM during
freezing if gpio-led is used in combination with CPU led trigger.
It may happen after freeze_noirq stage (GPIO is suspended)
and before syscore_suspend stage (CPU led trigger is suspended)
- usually when disable_nonboot_cpus() is called.
Log:
PM: noirq freeze of devices complete after 1.425 msecs
Disabling non-boot CPUs ...
^ system may crash or stuck here with message (TI AM572x)
WARNING: CPU: 0 PID: 3100 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x22c/0x370()
44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in Supervisor mode during Functional access
CPU1: shutdown
^ or here
Fix this by using SIMPLE_DEV_PM_OPS, which appropriately
assigns the suspend and hibernation callbacks and move
led_suspend/led_resume under CONFIG_PM_SLEEP to avoid
build warnings.
Fixes: 73e1ab41a8 (leds: Convert led class driver from legacy pm ops to dev_pm_ops)
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Acked-by: Jacek Anaszewski <j.anaszewski@samsung.com>
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0dd23f9425 ]
Commit 007bea098b (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it. It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.
Fixes: 007bea098b (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: Joe Konno <joe.konno@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 073db4a51e ]
On A MIPS 32-cores machine a BUG_ON was triggered because some acesses to
mtd->usecount were done without taking mtd_table_mutex.
kernel: Call Trace:
kernel: [<ffffffff80401818>] __put_mtd_device+0x20/0x50
kernel: [<ffffffff804086f4>] blktrans_release+0x8c/0xd8
kernel: [<ffffffff802577e0>] __blkdev_put+0x1a8/0x200
kernel: [<ffffffff802579a4>] blkdev_close+0x1c/0x30
kernel: [<ffffffff8022006c>] __fput+0xac/0x250
kernel: [<ffffffff80171208>] task_work_run+0xd8/0x120
kernel: [<ffffffff8012c23c>] work_notifysig+0x10/0x18
kernel:
kernel:
Code: 2442ffff ac8202d8 000217fe <00020336> dc820128 10400003
00000000 0040f809 00000000
kernel: ---[ end trace 080fbb4579b47a73 ]---
Fixed by taking the mutex in blktrans_open and blktrans_release.
Note that this locking is already suggested in
include/linux/mtd/blktrans.h:
struct mtd_blktrans_ops {
...
/* Called with mtd_table_mutex held; no race with add/remove */
int (*open)(struct mtd_blktrans_dev *dev);
void (*release)(struct mtd_blktrans_dev *dev);
...
};
But we weren't following it.
Originally reported by (and patched by) Zhang and Giuseppe,
independently. Improved and rewritten.
Cc: stable@vger.kernel.org
Reported-by: Zhang Xingcai <zhangxingcai@huawei.com>
Reported-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Tested-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e76ef88f6 ]
Fix a race (with some kernel configurations) where a queued
master->pump_messages runs and frees dummy_tx/rx before
spi_unmap_msg is running (or is finished).
This results in the following messages:
BUG: Bad page state in process
page:db7ba030 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x200(arch_1)
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
...
Reported-by: Noralf Trønnes <noralf@tronnes.org>
Suggested-by: Noralf Trønnes <noralf@tronnes.org>
Tested-by: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4d48edb3c3 ]
Commit 7232398abc ("ARM: tegra: Convert PMC to a driver") changed tegra_resume()
location storing from late to early and, as a result, broke suspend on Tegra20.
PMC scratch register 41 is used by tegra LP1 resume code for retrieving stored
physical memory address of common resume function and in the same time used by
tegra20_cpu_shutdown() (shared by Tegra20 cpuidle driver and platform SMP code),
which is storing CPU1 "resettable" status. It implies strict order of scratch
register usage, otherwise resume function address is lost on Tegra20 after
disabling non-boot CPU's on suspend. Fix it by storing "resettable" status in
IRAM instead of PMC scratch register.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: 7232398abc (ARM: tegra: Convert PMC to a driver)
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a18f8e97fe ]
Events defined as watchpoints on nodes must have their config values
converted so that they apply to the respective node's XP. The
function setting new values was using wrong mask for the "port" field,
resulting in corrupted value. Fixed now.
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 82e3b88b67 ]
The maximum size for a DiSEqC command is 6, according to the
userspace API. However, the code allows to write up much more values:
drivers/media/dvb-frontends/cx24116.c:983 cx24116_send_diseqc_msg() error: buffer overflow 'd->msg' 6 <= 23
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d7b76c91f4 ]
If userspace sends an invalid bandwidth, it should either return
EINVAL or switch to auto mode.
This driver will go past an array and program the hardware on a
wrong way if this happens.
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1fa2337a31 ]
The maximum size for a DiSEqC command is 6, according to the
userspace API. However, the code allows to write up much more values:
drivers/media/dvb-frontends/cx24116.c:983 cx24116_send_diseqc_msg() error: buffer overflow 'd->msg' 6 <= 23
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 12f4543f5d ]
The maximum size for a DiSEqC command is 6, according to the
userspace API. However, the code allows to write up to 7 values:
drivers/media/dvb-frontends/s5h1420.c:193 s5h1420_send_master_cmd() error: buffer overflow 'cmd->msg' 6 <= 7
Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f1b6b7d9a ]
When receiving a new iser connect request we serialize
the pending requests by adding the newly created iser connection
to the np accept list and let the login thread process the connect
request one by one (np_accept_wait).
In case we received a disconnect request before the iser_conn
has begun processing (still linked in np_accept_list) we should
detach it from the list and clean it up and not have the login
thread process a stale connection. We do it only when the connection
state is not already terminating (initiator driven disconnect) as
this might lead us to access np_accept_mutex after the np was released
in live shutdown scenarios.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6dfd197283 ]
Laptop with Turks/Thames GPU will freeze if dpm is enabled. It seems
the SMC engine is relying on some state inside the CP engine. CP needs
to chew at least one packet for it to get in good state for dynamic
power management.
This patch simply disabled and re-enable DPM after the ring test which
is enough to avoid the freeze.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e96998fc20 ]
According to the Armada 38x datasheet, the window base address
registers value is set in bits [31:4] of the register and corresponds
to the transaction address bits [47:20].
Therefore, the 32bit base address value should be shifted right by
20bits and left by 4bits, resulting in 16 bit shift right.
The bug as not been noticed yet because if the memory available on
the platform is less than 2GB, then the base address is zero.
[gregory.clement@free-electrons.com: add extra-explanation]
Fixes: a3464ed2f1 (ata: ahci_mvebu: new driver for Marvell Armada 380
AHCI interfaces)
Signed-off-by: Nadav Haklai <nadavh@marvell.com>
Reviewed-by: Omri Itach <omrii@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9c5a18a31b ]
Until recently, mac80211 overwrote all the statistics it could
provide when getting called, but it now relies on the struct
having been zeroed by the caller. This was always the case in
nl80211, but wext used a static struct which could even cause
values from one device leak to another.
Using a static struct is OK (as even documented in a comment)
since the whole usage of this function and its return value is
always locked under RTNL. Not clearing the struct for calling
the driver has always been wrong though, since drivers were
free to only fill values they could report, so calling this
for one device and then for another would always have leaked
values from one to the other.
Fix this by initializing the structure in question before the
driver method call.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=99691
Cc: stable@vger.kernel.org
Reported-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Reported-by: Alexander Kaltsas <alexkaltsas@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 85bd839983 ]
Izumi found the following oops when hot re-adding a node:
BUG: unable to handle kernel paging request at ffffc90008963690
IP: __wake_up_bit+0x20/0x70
Oops: 0000 [#1] SMP
CPU: 68 PID: 1237 Comm: rs:main Q:Reg Not tainted 4.1.0-rc5 #80
Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 1.87 04/28/2015
task: ffff880838df8000 ti: ffff880017b94000 task.ti: ffff880017b94000
RIP: 0010:[<ffffffff810dff80>] [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
RSP: 0018:ffff880017b97be8 EFLAGS: 00010246
RAX: ffffc90008963690 RBX: 00000000003c0000 RCX: 000000000000a4c9
RDX: 0000000000000000 RSI: ffffea101bffd500 RDI: ffffc90008963648
RBP: ffff880017b97c08 R08: 0000000002000020 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a0797c73800
R13: ffffea101bffd500 R14: 0000000000000001 R15: 00000000003c0000
FS: 00007fcc7ffff700(0000) GS:ffff880874800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90008963690 CR3: 0000000836761000 CR4: 00000000001407e0
Call Trace:
unlock_page+0x6d/0x70
generic_write_end+0x53/0xb0
xfs_vm_write_end+0x29/0x80 [xfs]
generic_perform_write+0x10a/0x1e0
xfs_file_buffered_aio_write+0x14d/0x3e0 [xfs]
xfs_file_write_iter+0x79/0x120 [xfs]
__vfs_write+0xd4/0x110
vfs_write+0xac/0x1c0
SyS_write+0x58/0xd0
system_call_fastpath+0x12/0x76
Code: 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8d 47 48 <48> 39 47 48 48 c7 45 e8 00 00 00 00 48 c7 45 f0 00 00 00 00 48
RIP [<ffffffff810dff80>] __wake_up_bit+0x20/0x70
RSP <ffff880017b97be8>
CR2: ffffc90008963690
Reproduce method (re-add a node)::
Hot-add nodeA --> remove nodeA --> hot-add nodeA (panic)
This seems an use-after-free problem, and the root cause is
zone->wait_table was not set to *NULL* after free it in
try_offline_node.
When hot re-add a node, we will reuse the pgdat of it, so does the zone
struct, and when add pages to the target zone, it will init the zone
first (including the wait_table) if the zone is not initialized. The
judgement of zone initialized is based on zone->wait_table:
static inline bool zone_is_initialized(struct zone *zone)
{
return !!zone->wait_table;
}
so if we do not set the zone->wait_table to *NULL* after free it, the
memory hotplug routine will skip the init of new zone when hot re-add
the node, and the wait_table still points to the freed memory, then we
will access the invalid address when trying to wake up the waiting
people after the i/o operation with the page is done, such as mentioned
above.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Reviewed by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 392bceedb1 ]
The driver configures the IDLE condition to interrupt the SDMA engine.
Since the SDMA UART ROM script doesn't clear the IDLE bit itself, this
caused repeated 1-byte DMA transfers, regardless of available data in the
RX FIFO. Also, when returning due to the IDLE condition, the UART ROM
script already increased its counter, causing residue to be off by one.
This patch clears the IDLE condition to avoid repeated 1-byte DMA transfers
and decreases count by when the DMA transfer was aborted due to the IDLE
condition, fixing serial transfers using DMA on i.MX6Q.
Reported-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3f5f1554ee ]
Passive DP->DVI/HDMI dongles on DP++ ports show up to the system as HDMI
devices, as they do not have a sink device in them to respond to any AUX
traffic. When probing these dongles over the DDC, sometimes they will
NAK the first attempt even though the transaction is valid and they
support the DDC protocol. The retry loop inside of
drm_do_probe_ddc_edid() would normally catch this case and try the
transaction again, resulting in success.
That, however, was thwarted by the fix for [1]:
commit 9292f37e1f
Author: Eugeni Dodonov <eugeni.dodonov@intel.com>
Date: Thu Jan 5 09:34:28 2012 -0200
drm: give up on edid retries when i2c bus is not responding
This added code to exit immediately if the return code from the
i2c_transfer function was -ENXIO in order to reduce the amount of time
spent in waiting for unresponsive or disconnected devices. That was
possible because the underlying i2c bit banging algorithm had retries of
its own (which, of course, were part of the reason for the bug the
commit fixes).
Since its introduction in
commit f899fc64cd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jul 20 15:44:45 2010 -0700
drm/i915: use GMBUS to manage i2c links
we've been flipping back and forth enabling the GMBUS transfers, but
we've settled since then. The GMBUS implementation does not do any
retries, however, bailing out of the drm_do_probe_ddc_edid() retry loop
on first encounter of -ENXIO. This, combined with Eugeni's commit, broke
the retry on -ENXIO.
Retry GMBUS once on -ENXIO on first message to mitigate the issues with
passive adapters.
This patch is based on the work, and commit message, by Todd Previte
<tprevite@gmail.com>.
[1] https://bugs.freedesktop.org/show_bug.cgi?id=41059
v2: Don't retry if using bit banging.
v3: Move retry within gmbux_xfer, retry only on first message.
v4: Initialize GMBUS0 on retry (Ville).
v5: Take index reads into account (Ville).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85924
Cc: Todd Previte <tprevite@gmail.com>
Cc: stable@vger.kernel.org
Tested-by: Oliver Grafe <oliver.grafe@ge.com> (v2)
Tested-by: Jim Bride <jim.bride@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e058c945e0 ]
According to the HSW b-spec we need to try clock divisors of 63
and 72, each 3 or more times, when attempting DP AUX channel
communication on a server chipset. This actually wasn't happening
due to a short-circuit that only checked the DP_AUX_CH_CTL_DONE bit
in status rather than checking that the operation was done and
that DP_AUX_CH_CTL_TIME_OUT_ERROR was not set.
[v2] Implemented alternate solution suggested by Jani Nikula.
Cc: stable@vger.kernel.org
Signed-off-by: Jim Bride <jim.bride@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 425be5679f ]
The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart. It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.
Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count. Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code. The new scheme should be much
clearer to future readers.
While we're at it, improve the comments and rename the array and
common code.
Binutils may start relaxing jumps to non-weak labels. If so,
this change will fix our build, and we may need to backport this
change.
Before, on x86_64:
0000000000000000 <early_idt_handlers>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 00 00 00 00 jmpq 9 <early_idt_handlers+0x9>
5: R_X86_64_PC32 early_idt_handler-0x4
...
48: 66 90 xchg %ax,%ax
4a: 6a 08 pushq $0x8
4c: e9 00 00 00 00 jmpq 51 <early_idt_handlers+0x51>
4d: R_X86_64_PC32 early_idt_handler-0x4
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: e9 00 00 00 00 jmpq 120 <early_idt_handler>
11c: R_X86_64_PC32 early_idt_handler-0x4
After:
0000000000000000 <early_idt_handler_array>:
0: 6a 00 pushq $0x0
2: 6a 00 pushq $0x0
4: e9 14 01 00 00 jmpq 11d <early_idt_handler_common>
...
48: 6a 08 pushq $0x8
4a: e9 d1 00 00 00 jmpq 120 <early_idt_handler_common>
4f: cc int3
50: cc int3
...
117: 6a 00 pushq $0x0
119: 6a 1f pushq $0x1f
11b: eb 03 jmp 120 <early_idt_handler_common>
11d: cc int3
11e: cc int3
11f: cc int3
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4d66e5e9b6 ]
=================================
[ INFO: inconsistent lock state ]
4.1.0-rc7+ #217 Tainted: G O
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(ext_devt_lock){+.?...}, at: [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810bf6b1>] __lock_acquire+0x461/0x1e70
[<ffffffff810c1947>] lock_acquire+0xb7/0x290
[<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
[<ffffffff8143a07d>] blk_alloc_devt+0x6d/0xd0 <-- take the lock in process context
[..]
[<ffffffff810bf64e>] __lock_acquire+0x3fe/0x1e70
[<ffffffff810c00ad>] ? __lock_acquire+0xe5d/0x1e70
[<ffffffff810c1947>] lock_acquire+0xb7/0x290
[<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
[<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
[<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
[<ffffffff8143a60c>] blk_free_devt+0x3c/0x70 <-- take the lock in softirq
[<ffffffff8143bfec>] part_release+0x1c/0x50
[<ffffffff8158edf6>] device_release+0x36/0xb0
[<ffffffff8145ac2b>] kobject_cleanup+0x7b/0x1a0
[<ffffffff8145aad0>] kobject_put+0x30/0x70
[<ffffffff8158f147>] put_device+0x17/0x20
[<ffffffff8143c29c>] delete_partition_rcu_cb+0x16c/0x180
[<ffffffff8143c130>] ? read_dev_sector+0xa0/0xa0
[<ffffffff810e0e0f>] rcu_process_callbacks+0x2ff/0xa90
[<ffffffff810e0dcf>] ? rcu_process_callbacks+0x2bf/0xa90
[<ffffffff81067e2e>] __do_softirq+0xde/0x600
Neil sees this in his tests and it also triggers on pmem driver unbind
for the libnvdimm tests. This fix is on top of an initial fix by Keith
for incorrect usage of mutex_lock() in this path: 2da78092dd "block:
Fix dev_t minor allocation lifetime". Both this and 2da78092dd are
candidates for -stable.
Fixes: 2da78092dd ("block: Fix dev_t minor allocation lifetime")
Cc: <stable@vger.kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5f0ee9d17a ]
Make the check to skip the rate check more lax, so that it applies
to all hw_version 4 models.
This fixes the touchpad not being detected properly on Asus PU551LA
laptops.
Cc: stable@vger.kernel.org
Reported-and-tested-by: David Zafra Gómez <dezeta@klo.es>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1ef9f05835 ]
Fix this from the logs:
usb 7-1: New USB device found, idVendor=046d, idProduct=08ca
...
usb 7-1: Warning! Unlikely big volume range (=3072), cval->res is probably wrong.
usb 7-1: [5] FU [Mic Capture Volume] ch = 1, val = 4608/7680/1
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c2a8b623a0 ]
We unfortunately can't use ~0UL for the scan mask to indicate that the
only valid scan mask is all channels selected. The IIO core needs the exact
mask to work correctly and not a super-set of it. So calculate the masked
based on the channels that are available for a particular device.
Signed-off-by: Paul Cercueil <paul.cercueil@analog.com>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Fixes: 5eda3550a3 ("staging:iio:adis16400: Preallocate transfer message")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7323d59862 ]
Previously, the two voltage channels had the same ID, which didn't cause
conflicts in sysfs only because one channel is named and the other isn't;
this is still violating the spec though, two indexed channels should never
have the same index.
Signed-off-by: Paul Cercueil <paul.cercueil@analog.com>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
Based on 08adb7dabd upstream.
We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
was expected.
We tested it through the recvmsg01 testcase come from LTP testsuit. It set
msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
unexpected (errno 22 is expected):
recvmsg01 4 TFAIL : invalid socket length ; returned -1 (expected -1),
errno 14 (expected 22)
Linux mainline has no this bug for commit 08adb7dab fixes it accidentally.
However, it is too large and complex to be backported to LTS 3.10.
Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
error if msg_sys->msg_namelen was negative, which changed the behaviors
of recvmsg and sendmsg syscall in a lib32 system:
Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
was invalid and then syscall returned -EINVAL, which is correct.
And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
fail and wants to return -EINVAL, however, the outer syscall will return
-EFAULT directly, which is unexpected.
This patch gets the return value of get_compat_msghdr() as well as
copy_msghdr_from_user(), then returns this expected value if
get_compat_msghdr() fails.
Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Hanbing Xu <xuhanbing@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5ca74d43d9)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ee52e56e7b ]
The mute-LED mode control has the fixed on/off states that are
supposed to remain on/off regardless of the master switch. However,
this doesn't work actually because the vmaster hook is called in the
vmaster code itself.
This patch fixes it by calling the hook indirectly after checking the
mute LED mode.
Reported-and-tested-by: Pali Rohár <pali.rohar@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 28423ad283 ]
While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae ("softirq: Use hotplug thread
infrastructure"):
[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
...but despite this note, the patch still calls RCU with IRQs disabled.
This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.
Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 90f91b1298 ]
It seems Broadcom released two devices with conflicting device id. There
are for sure 14e4:4321 PCI devices with BCM4321 (N-PHY) chipset, they
can be found in routers, e.g. Netgear WNR834Bv2. However, according to
Broadcom public sources 0x4321 is also used for 5 GHz BCM4306 (G-PHY).
It's unsure if they meant PCI device id, or "virtual" id (from SPROM).
To distinguish these devices lets check PHY type (G vs. N).
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: <stable@vger.kernel.org> # 3.16+
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9253e667ab ]
Since commit "2426bd456a6 target: Report correct response ..."
we might get a command with data_size that does not fit to
the number of allocated data sg elements. Given that we rely on
cmd t_data_nents which might be different than the data_size,
we sometimes receive local length error completion. The correct
approach would be to take the command data_size into account when
constructing the ib sg_list.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6ab42ff448 ]
On a HP Envy TouchSmart laptop, there are 2 speakers (main speaker
and subwoofer speaker), 1 headphone and 2 DACs, without this fixup,
the headphone will be assigned to a DAC and the 2 speakers will be
assigned to another DAC, this assignment makes the surround-2.1
channels invalid.
To fix it, here using a DAC/pin preference map to bind the main
speaker to 1 DAC and the subwoofer speaker will be assigned to another
DAC.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6c7b03e1ae ]
The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Jonas Andersson <jonas@microbit.se>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 412c98c1be ]
The hwrng output buffers (2) are cast inside of a a struct (caam_rng_ctx)
allocated in one DMA-tagged region. While the kernel's heap allocator
should place the overall struct on a cacheline aligned boundary, the 2
buffers contained within may not necessarily align. Consenquently, the ends
of unaligned buffers may not fully flush, and if so, stale data will be left
behind, resulting in small repeating patterns.
This fix aligns the buffers inside the struct.
Note that not all of the data inside caam_rng_ctx necessarily needs to be
DMA-tagged, only the buffers themselves require this. However, a fix would
incur the expense of error-handling bloat in the case of allocation failure.
Cc: stable@vger.kernel.org
Signed-off-by: Steve Cornelius <steve.cornelius@freescale.com>
Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Hash value passed as argument into rhashtable_lookup_compare could be
computed using different hash table than rhashtable_lookup_compare sees.
This patch passes key into rhashtable_lookup_compare() instead of hash and
compures hash value right in place using the same table as for lookup.
Also it adds comment for rhashtable_hashfn and rhashtable_obj_hashfn:
user must prevent concurrent insert/remove otherwise returned hash value
could be invalid.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Link: http://lkml.kernel.org/r/20150514042151.GA5482@gondor.apana.org.au
Cc: Stable <stable@vger.kernel.org> (v3.17 .. v3.19)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 68a49e51a4 ]
Add vid/pid for the SMK branded third-party PS3 Bluetooth remote and enable
support in the hid-sony driver.
Signed-off-by: Frank Praznik <frank.praznik@oh.rr.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9441159344 ]
Haswell significantly improved the performance of sampler_c messages,
but the optimization appears to be off by default. Later platforms
remove this bit, and apparently always enable the optimization.
Improves performance in "Counter Strike: Global Offensive" by 18%
at default settings on Iris Pro.
This may break sampling of paletted formats (P8/A8P8/P8A8). It's
unclear whether it affects sampling of paletted formats in general,
or just the sample_c message (which is never used).
While libva does have support for using paletted formats (primarily
for OSDs), that support appears to have been broken for at least a
year, so I couldn't observe a regression from this:
I tried to get libva-intel to use paletted formats, and observe a
regression...but the only thing I found that used it was mplayer's OSD
(on screen display). Even without my patch, the colors were totally
wrong with that, and it's according to a few distro wikis, that's been
the case for over a year.
If libva's code for paletted formats /is/ broken, they could always
add code to disable this bit using the command validator when fixing
it.
Further investigation from Haihao shows that libva mplayer OSD seems
to work at least on his setup (still unclear what's wron with Ken's),
and that it's not affected by this patch. Quoting the discussion
between Haihao and Ken:
> > > If you use "-vo gl" or "-vo xv", the OSD is solid white text with a black
> > > border around it. I presume that it's supposed to be white with vaapi as
> > > well, but I guess I'm not entirely sure.
> > >
> > > It's possible that the optimization doesn't affect the palette as long as
> > > you never use sample_c with the paletted textures.
> >
> > I verified the palette takes effect in the following way:
> >
> > 1. Only support P8A8 format in the driver
> >
> > 2. ran the above command and I saw white OSD text
> >
> > 3. Only support P4A4 format in the driver and don't use
> > 3DSTATE_SAMPLER_PALETTE_LOAD0 to load the value to the texture palette,
> > so the palette keeps unchanged.
> >
> > 4. ran the above command and I saw black OSD text.
> >
> > 5. Load the right value to the texture palette and ran the above command
> > again, I saw white OSD text.
> >
> > Hence I think sample_c with the paletted textures is used in the driver.
>
> That sounds like the palette is actually working, then. Great :)
>
> I doubt that libva would use sample_c - sampling with a shadow comparison?
> It looks like it just uses sample and sample+killpix.
You are right, libva driver doesn't use sample_c message.
> I'm pretty sure the sample_c optimization just uses the palette memory as
> storage for some stuff, so it's quite possible it just works if you're
> only using sample and sample+killpix.
Thanks for the explanation, it makes sense to me.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
[danvet: Add wa name from Ville's review to the comment and copypaste
the explanation why we don't care about libva (already broken) from
Ken. Also add conclusion from libva devs that&why this is all fine.]
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: "Xiang, Haihao" <haihao.xiang@intel.com>
Cc: libva@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dd36929720 ]
The use of READ_ONCE() causes lots of warnings witht he pending paravirt
spinlock fixes, because those ends up having passing a member to a
'const' structure to READ_ONCE().
There should certainly be nothing wrong with using READ_ONCE() with a
const source, but the helper function __read_once_size() would cause
warnings because it would drop the 'const' qualifier, but also because
the destination would be marked 'const' too due to the use of 'typeof'.
Use a union of types in READ_ONCE() to avoid this issue.
Also make sure to use parenthesis around the macro arguments to avoid
possible operator precedence issues.
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 085e68eeaf ]
The host's decision to enable machine check exceptions should remain
in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
Tested: Built.
On earlier version, tested by injecting machine check
while a guest is spinning.
Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.
Signed-off-by: Ben Serebrin <serebrin@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d0ec8b6f4 ]
Some controller drivers have no need of this callback (spi-altera even
causes a NULL pointer dereference because it doesn't register the callback,
falsely assuming that it is already optional).
Fixes: 30af9b558a ("spi/bitbang: Drop empty setup() functions")
Signed-off-by: Pelle Nilsson <per.nilsson@xelmo.com>
Reviewed-by: Ezequiel Garcia <ezequiel.garcia@vanguardiasur.com.ar>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d3beab16d ]
This driver uses '#ifdef CONFIG_ARCH_SHMOBILE' and '#ifdef CONFIG_ARM'
interchangeably in its sh_dmae_probe function, which causes a build
warning when building for ARM without also enabling shmobile:
dma/sh/shdmac.c: In function sh_dmae_probe:
dma/sh/shdmac.c:696:6: warning: unused variable errirq [-Wunused-variable]
dma/sh/shdmac.c:695:16: warning: unused variable irqflags [-Wunused-variable]
dma/sh/shdmac.c: At top level:
dma/sh/shdmac.c:447:20: warning: sh_dmae_err defined but not used [-Wunused-function]
This changes all the #ifdef to test for CONFIG_ARCH_SHMOBILE to
avoid that warning. An earlier patch from Laurent had fixed the warning
for non-ARM case, but it still remained present in ARM randconfig builds.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 52d6a5ee10 ("DMA: shdma: Fix warnings due to declared but unused symbols")
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fad9dfefea ]
tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.
Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate
Fixes: 977cb0ecf8 ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 98b228f550 ]
Until now, the EFI stub was only setting the 32 bit cmd_line_ptr in
the setup_header structure, so on 64 bit platforms this could be truncated.
This patch adds setting the upper bits of the buffer address in
ext_cmd_line_ptr. This case was likely never hit, as the allocation
for this buffer is done at the lowest available address. Only
x86_64 kernels have this problem, as the 1-1 mapping mandated
by EFI ensures that all memory is 32 bit addressable on 32 bit
platforms. The EFI stub does not support mixed mode, so the
32 bit kernel on 64 bit firmware case does not need to be handled.
Signed-off-by: Roy Franz <roy.franz@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c57dcb566d ]
Some buggy firmware implementations update VariableNameSize on success
such that it does not include the final NUL character which results in
garbage in the efivarfs name entries. Use kzalloc on the efivar_entry
(as is done in efivars.c) to ensure that the name is always
NUL-terminated.
The buggy firmware is:
BIOS Information
Vendor: Intel Corp.
Version: S1200RP.86B.02.02.0005.102320140911
Release Date: 10/23/2014
BIOS Revision: 4.6
System Information
Manufacturer: Intel Corporation
Product Name: S1200RP_SE
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Matthew Garrett <mjg59@coreos.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d43698e8ab ]
Commit 2473238eac ("ihex: add support for CS:IP/EIP records") removes
the "default:" statement in the switch block, making the "return
usage();" line dead code and ihex2fw silently ignoring unknown options.
Restore this statement.
This bug was found by building with HOSTCC=clang and adding
-Wunreachable-code-return to HOSTCFLAGS.
Fixes: 2473238eac ("ihex: add support for CS:IP/EIP records")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 59d2d18cc4 ]
If CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for x86 systems and physical
memory is more than 4GB, dma_map_page may return a valid memory
address which greater than 0xffffffff. As a result, the mlx5 device page
allocator RB tree will be initialized with valid addresses greater than
0xfffffff.
However, (addr & PAGE_MASK) set the high four bytes to zeros. So, it's
impossible for the function, free_4k, to release the pages whose
addresses greater than 4GB. Memory leaks. And mlx5_ib module can't
release the pages when user try to remove the module, as a result,
system hang.
[root@rdma05 root]# dmesg | grep addr | head
addr = 3fe384000
addr & PAGE_MASK = fe384000
[root@rdma05 root]# rmmod mlx5_ib <---- hang on
---------------------- cosnole log -----------------
mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X
alloc irq_desc for 139 on node -1
alloc kstat_irqs on node -1
mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
---------------------- cosnole log -----------------
Fixes: bf0bf77f65 ('mlx5: Support communicating arbitrary host page size to firmware')
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f53e579751 ]
lookup_mountpoint can return either NULL or an error value.
Update the test in __detach_mounts to test for an error value
to avoid pathological cases causing a NULL pointer dereferences.
The callers of __detach_mounts should prevent it from ever being
called on an unlinked dentry but don't take any chances.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ebe96e641d ]
AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.
Fixes: 0c5f9b8830 ("RDS: Documentation")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9b174527e7 ]
In the media drivers, the v4l2 core knows about all submodules
and calls into them from a common function. However this cannot
work if the modules that get called are loadable and the
core is built-in. In that case we get
drivers/built-in.o: In function `set_type':
drivers/media/v4l2-core/tuner-core.c:301: undefined reference to `tea5767_attach'
drivers/media/v4l2-core/tuner-core.c:307: undefined reference to `tea5761_attach'
drivers/media/v4l2-core/tuner-core.c:349: undefined reference to `tda9887_attach'
drivers/media/v4l2-core/tuner-core.c:405: undefined reference to `xc4000_attach'
This was working previously, until the IS_ENABLED() macro was used
to replace the construct like
#if defined(CONFIG_DVB_CX24110) || (defined(CONFIG_DVB_CX24110_MODULE) && defined(MODULE))
with the difference that the new code no longer checks whether it is being
built as a loadable module itself.
To fix this, this new patch adds an 'IS_REACHABLE' macro, which evaluates
true in exactly the condition that was used previously. The downside
of this is that this trades an obvious link error for a more subtle
runtime failure, but it is clear that the change that introduced the
link error was unintentional and it seems better to revert it for
now. Also, a similar change was originally created by Trent Piepho
and then reverted by teh change to the IS_ENABLED macro.
Ideally Kconfig would be used to avoid the case of a broken dependency,
or the code restructured in a way to turn around the dependency, but either
way would require much larger changes here.
Fixes: 7b34be71db ("[media] use IS_ENABLED() macro")
See-also: c5dec9fb24 ("V4L/DVB (4751): Fix DBV_FE_CUSTOMISE for card drivers compiled into kernel")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 092a29a40b ]
When the kernel deleted a vti6 interface, this interface was not removed from
the tunnels list. Thus, when the ip6_vti module was removed, this old interface
was found and the kernel tried to delete it again. This was leading to a kernel
panic.
Fixes: 61220ab349 ("vti6: Enable namespace changing")
Signed-off-by: Yao Xiwei <xiwei.yao@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2e7056c433 ]
Looking over the implementation for jhash2 and comparing it to jhash_3words
I realized that the two hashes were in fact very different. Doing a bit of
digging led me to "The new jhash implementation" in which lookup2 was
supposed to have been replaced with lookup3.
In reviewing the patch I noticed that jhash2 had originally initialized a
and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and
c were initialized to initval + (length << 2) + JHASH_INITVAL. However the
changes in jhash_3words simply replaced the initialization of a and b with
JHASH_INITVAL.
This change corrects what I believe was an oversight so that a, b, and c in
jhash_3words all have the same value added consisting of initval + (length
<< 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the
same hash result given the same inputs.
Fixes: 60d509c823 ("The new jhash implementation")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7071b71587 ]
Release references to buffer-heads if ext4_journal_start() fails.
Fixes: 5b61de7575 ("ext4: start handle at least possible moment when renaming files")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit afb7718016 ]
While originally only being intended for outgoing traffic, commit
a00e76349f ("netfilter: x_tables: allow to use cgroup match for
LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
as well, in order to allow for nfacct accounting.
Besides being currently limited to early demuxes only, commit
a00e76349f forgot to add a check if we deal with full sockets,
i.e. in this case not with time wait sockets. TCP time wait sockets
do not have the same memory layout as full sockets, a lower memory
footprint and consequently also don't have a sk_classid member;
probing for sk_classid member there could potentially lead to a
crash.
Fixes: a00e76349f ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks")
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1d0ab25387 ]
We have many places where we want to check if a socket is
not a timewait or request socket. Use a helper to avoid
hard coding this.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 10feb428a5 ]
TCP_SYN_RECV state is currently used by fast open sockets.
Initial TCP requests (the pseudo sockets created when a SYN is received)
are not yet associated to a state. They are attached to their parent,
and the parent is in TCP_LISTEN state.
This commit adds TCP_NEW_SYN_RECV state, so that we can convert
TCP stack to a different schem gradually.
This state is not exported to user space.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e1aeea52f ]
Enable APM X-Gene SoC serial port functionality when using ACPI table to
initialize serial port.
Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b31384fa5d ]
Add a uniform interface by which device drivers can request device
properties from the platform firmware by providing a property name
and the corresponding data type. The purpose of it is to help to
write portable code that won't depend on any particular platform
firmware interface.
The following general helper functions are added:
device_property_present()
device_property_read_u8()
device_property_read_u16()
device_property_read_u32()
device_property_read_u64()
device_property_read_string()
device_property_read_u8_array()
device_property_read_u16_array()
device_property_read_u32_array()
device_property_read_u64_array()
device_property_read_string_array()
The first one allows the caller to check if the given property is
present. The next 5 of them allow single-valued properties of
various types to be retrieved in a uniform way. The remaining 5 are
for reading properties with multiple values (arrays of either numbers
or strings).
The interface covers both ACPI and Device Trees.
This change set includes material from Mika Westerberg and Aaron Lu.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ffdcd955c3 ]
Device Tree is used in many embedded systems to describe the system
configuration to the OS. It supports attaching properties or name-value
pairs to the devices it describe. With these properties one can pass
additional information to the drivers that would not be available
otherwise.
ACPI is another configuration mechanism (among other things) typically
seen, but not limited to, x86 machines. ACPI allows passing arbitrary
data from methods but there has not been mechanism equivalent to Device
Tree until the introduction of _DSD in the recent publication of the
ACPI 5.1 specification.
In order to facilitate ACPI usage in systems where Device Tree is
typically used, it would be beneficial to standardize a way to retrieve
Device Tree style properties from ACPI devices, which is what we do in
this patch.
If a given device described in ACPI namespace wants to export properties it
must implement _DSD method (Device Specific Data, introduced with ACPI 5.1)
that returns the properties in a package of packages. For example:
Name (_DSD, Package () {
ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
Package () {
Package () {"name1", <VALUE1>},
Package () {"name2", <VALUE2>},
...
}
})
The UUID reserved for properties is daffd814-6eba-4d8c-8a91-bc9bbf4aa301
and is documented in the ACPI 5.1 companion document called "_DSD
Implementation Guide" [1], [2].
We add several helper functions that can be used to extract these
properties and convert them to different Linux data types.
The ultimate goal is that we only have one device property API that
retrieves the requested properties from Device Tree or from ACPI
transparent to the caller.
[1] http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-toplevel.htm
[2] http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 18436afdc1 ]
Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
domains") prevents certain options for devices with RMRRs. This even
prevents those devices from getting a 1:1 mapping with 'iommu=pt',
because we don't have the code to handle *preserving* the RMRR regions
when moving the device between domains.
There's already an exclusion for USB devices, because we know the only
reason for RMRRs there is a misguided desire to keep legacy
keyboard/mouse emulation running in some theoretical OS which doesn't
have support for USB in its own right... but which *does* enable the
IOMMU.
Add an exclusion for graphics devices too, so that 'iommu=pt' works
there. We should be able to successfully assign graphics devices to
guests too, as long as the initial handling of stolen memory is
reconfigured appropriately. This has certainly worked in the past.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 849eca7b9d ]
Like other KVM switches, the Aten DVI KVM switch needs a quirk to avoid spewing
errors:
[791759.606542] usb 1-5.4: input irq status -75 received
[791759.614537] usb 1-5.4: input irq status -75 received
[791759.622542] usb 1-5.4: input irq status -75 received
Add it.
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d6ea2f88ac ]
The raphnet.net 4nes4snes and 2nes2snes multi-joystick adapters use a single
HID report descriptor with one report ID per controller. This has the effect
that the inputs of otherwise independent game controllers get packed in one
large joystick device.
With this patch each controller gets its own /dev/input/jsX device, which is
more natural and less confusing than having all inputs going to the same place.
Signed-off-by: Raphael Assenat <raph@raphnet.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 43faadfe96 ]
The device exists with two device IDs instead of one as previously
believed.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 003e817a9e ]
During a stress test these mice kept dropping and reappearing
in runlevel 1 as opposed to 5.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f31c52529 ]
Logitech devices use a vendor protocol to communicate various
information with the device. This protocol is called HID++,
and an exerpt can be found here:
https://drive.google.com/folderview?id=0BxbRzx7vEV7eWmgwazJ3NUFfQ28&usp=shar
The main difficulty which is related to this protocol is that
it is a synchronous protocol using the input reports.
So when we want to get some information from the device, we need
to wait for a matching input report.
This driver introduce this capabilities to be able to support
the multitouch mode of the Logitech Wireless Touchpad T651
(the bluetooth one). The multitouch data is available directly
from the mouse input reports, and we just need to query the device
on connect about its caracteristics.
HID++ and the touchpad features has a specific reporting mode
which uses pure HID++ reports, but Logitech told us not to use
it for this specific device. During QA, they detected that
some bluetooth input reports where lost, and so the only supported
mode is the pointer mode.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Tested-by: Andrew de los Reyes <adlr@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ef567cf9dd ]
Microsoft Natural Wireless Ergonomic Keyboard 7000 has special My
Favorites 1..5 keys which are handled through a vendor-defined usage
page (0xff05).
Apply MS_ERGONOMY quirks handling to USB PID 0x071d (Microsoft Microsoft
2.4GHz Transceiver V1.0) so that the My Favorites 1..5 keys are reported
as KEY_F14..18 events.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=52841
Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 70b69cfb88 ]
Based on a patch from: Nikolai Kondrashov <Nikolai.Kondrashov@redhat.com>
Most of the tablets handled by hid-uclogic already use MULTI_INPUT.
For the ones which are not quirked in usbhid/hidquirks, they have a
custom report descriptor which contains only one report per HID
interface. For those tablets HID_QUIRK_MULTI_INPUT is transparent.
According to https://github.com/DIGImend/tablets, the only problematic
tablet currently handled by hid-uclogic is the TWHA60 v3. This tablet
presents different report descriptors from the ones currently quirked.
This is not a problem per se, given that this tablet is not supported
currently in this version (it needs the same command as a Huion to
start forwarding events).
Reviewed-by: Nikolai Kondrashov <spbnick@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 260463d492 ]
Commit 03e9f0cac5 (pinctrl: clean up after enable refactoring) updated the
documentation to remove mention of disable(), and rename enable() to set_mux().
One in-text mention was forgotten. Fix this.
Fixes: 03e9f0cac5 ('pinctrl: clean up after enable refactoring')
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 78420b5dca ]
Set the SYSCIER as per the values indicated in the documentation.
The value previously used appears to been copied from the r8a7779
implementation but on closer inspection is not correct for the r8a7791.
Fixes: 5f6108bb96 ("ARM: shmobile: r8a7791 SYSC setup code")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ee72f6adfd ]
Set the SYSCIER as per the values indicated in the documentation.
The value previously used appears to been copied from the r8a7779
implementation but on closer inspection is not correct for the r8a7790.
Fixes: a48f165509 ("ARM: shmobile: r8a7790 SYSC setup code")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 91555ce901 ]
The writeable bits in the USR2 register are all "write 1 to
clear" so only write the bits that actually should be cleared.
Fixes: f1f836e420 ("serial: imx: Add Rx Fifo overrun error message")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f026d6b7c ]
Other than enable Receiver Overrun Interrupt Enable (UCR4_OREN)
in start_tx interface, UCR4_OREN should be enabled before enable
of Receiver.
Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4949009eb8 ]
LCD driver is always built for the XTFPGA platform, but its base address
is not configurable, and is wrong for ML605/KC705. Its initialization
locks up KC705 board hardware.
Make the whole driver optional, and its base address and bus width
configurable. Implement 4-bit bus access method.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2ea0d4ec39 ]
There appears to have been some inconsistency and confusion here as on
the r8a7790 these clocks are referred to as SD(HI)1 and SD(HI)2 while on
the r8a7791 and r8a7794 they are referred to as SD(HI)2 and SD(HI)3.
Fixes: 59e79895b9 ("ARM: shmobile: r8a7791: Add clocks")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 8e76d4eecf upstream.
Jovi Zhangwei reported the following problem
Below kernel vm bug can be triggered by tcpdump which mmaped a lot of pages
with GFP_COMP flag.
[Mon May 25 05:29:33 2015] page:ffffea0015414000 count:66 mapcount:1 mapping: (null) index:0x0
[Mon May 25 05:29:33 2015] flags: 0x20047580004000(head)
[Mon May 25 05:29:33 2015] page dumped because: VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page))
[Mon May 25 05:29:33 2015] ------------[ cut here ]------------
[Mon May 25 05:29:33 2015] kernel BUG at mm/migrate.c:1661!
[Mon May 25 05:29:33 2015] invalid opcode: 0000 [#1] SMP
In this case it was triggered by running tcpdump but it's not necessary
reproducible on all systems.
sudo tcpdump -i bond0.100 'tcp port 4242' -c 100000000000 -w 4242.pcap
Compound pages cannot be migrated and it was not expected that such pages
be marked for NUMA balancing. This did not take into account that drivers
such as net/packet/af_packet.c may insert compound pages into userspace
with vm_insert_page. This patch tells the NUMA balancing protection
scanner to skip all VM_MIXEDMAP mappings which avoids the possibility that
compound pages are marked for migration.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Jovi Zhangwei <jovi@cloudflare.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jovi: Backported to 3.18: adjust context]
Signed-off-by: Jovi Zhangwei <jovi@cloudflare.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 45002267e8 ]
Crush temporary buffers are allocated as per replica size configured
by the user. When there are more final osds (to be selected as per
rule) than the replicas, buffer overlaps and it causes crash. Now, it
ensures that at most num-rep osds are selected even if more number of
osds are allowed by the rule.
Reflects ceph.git commits 6b4d1aa99718e3b367496326c1e64551330fabc0,
234b066ba04976783d15ff2abc3e81b6cc06fb10.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c4c832f89d ]
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e51000db4c ]
There are several places in the driver (all in control paths) where
coherent dma memory is being allocated using either dma_alloc_coherent()
or the deprecated pci_alloc_consistent(). All these calls should be
changed to use dma_zalloc_coherent() to avoid uninitialized fields in
data structures backed by this memory.
Reported-by: Joerg Roedel <jroedel@suse.de>
Tested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6e54030932 ]
421b3885bf "udp: ipv4: Add udp early
demux" introduced a regression that allowed sockets bound to INADDR_ANY
to receive packets from multicast groups that the socket had not joined.
For example a socket that had joined 224.168.2.9 could also receive
packets from 225.168.2.9 despite not having joined that group if
ip_early_demux is enabled.
Fix this by calling ip_check_mc_rcu() in udp_v4_early_demux() to verify
that the multicast packet is indeed ours.
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 31a418986a ]
When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).
In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.
A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).
Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.
The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shutdown). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principal it might be possible to
arrange to unregister the device sooner (e.g on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...
A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9f950415e4 ]
Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.
This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.
In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.
Fixes: 55d8694fa8 ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit beb39db59d ]
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 71d9f6149c ]
br_multicast_query_expired() querier argument is a pointer to
a struct bridge_mcast_querier :
struct bridge_mcast_querier {
struct br_ip addr;
struct net_bridge_port __rcu *port;
};
Intent of the code was to clear port field, not the pointer to querier.
Fixes: 2cd4143192 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Cc: Linus Lüssing <linus.luessing@web.de>
Cc: Steinar H. Gunderson <sesse@samfundet.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9302d7bb0c ]
sctp_v4_map_v6 was subtly writing and reading from members
of a union in a way the clobbered data it needed to read before
it read it.
Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning
that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the
result.
Reorder things to guarantee correct behaviour no matter what the
union layout is.
This impacts user space clients that open an IPv6 SCTP socket and
receive IPv4 connections. Prior to 299ee user space would see a
sockaddr with AF_INET and a correct address, after 299ee the sockaddr
is AF_INET6, but the address is wrong.
Fixes: 299ee123e1 (sctp: Fixup v4mapped behaviour to comply with Sock API)
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 86e363dc3b ]
For mq qdisc, we add per tx queue qdisc to root qdisc
for display purpose, however, that happens too early,
before the new dev->qdisc is finally set, this causes
q->list points to an old root qdisc which is going to be
freed right before assigning with a new one.
Fix this by moving ->attach() after setting dev->qdisc.
For the record, this fixes the following crash:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 975 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
list_del corruption. prev->next should be ffff8800d1998ae8, but was 6b6b6b6b6b6b6b6b
CPU: 1 PID: 975 Comm: tc Not tainted 4.1.0-rc4+ #1019
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000009 ffff8800d73fb928 ffffffff81a44e7f 0000000047574756
ffff8800d73fb978 ffff8800d73fb968 ffffffff810790da ffff8800cfc4cd20
ffffffff814e725b ffff8800d1998ae8 ffffffff82381250 0000000000000000
Call Trace:
[<ffffffff81a44e7f>] dump_stack+0x4c/0x65
[<ffffffff810790da>] warn_slowpath_common+0x9c/0xb6
[<ffffffff814e725b>] ? __list_del_entry+0x5a/0x98
[<ffffffff81079162>] warn_slowpath_fmt+0x46/0x48
[<ffffffff81820eb0>] ? dev_graft_qdisc+0x5e/0x6a
[<ffffffff814e725b>] __list_del_entry+0x5a/0x98
[<ffffffff814e72a7>] list_del+0xe/0x2d
[<ffffffff81822f05>] qdisc_list_del+0x1e/0x20
[<ffffffff81820cd1>] qdisc_destroy+0x30/0xd6
[<ffffffff81822676>] qdisc_graft+0x11d/0x243
[<ffffffff818233c1>] tc_get_qdisc+0x1a6/0x1d4
[<ffffffff810b5eaf>] ? mark_lock+0x2e/0x226
[<ffffffff817ff8f5>] rtnetlink_rcv_msg+0x181/0x194
[<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
[<ffffffff817ff72e>] ? rtnl_lock+0x17/0x19
[<ffffffff817ff774>] ? __rtnl_unlock+0x17/0x17
[<ffffffff81855dc6>] netlink_rcv_skb+0x4d/0x93
[<ffffffff817ff756>] rtnetlink_rcv+0x26/0x2d
[<ffffffff818544b2>] netlink_unicast+0xcb/0x150
[<ffffffff81161db9>] ? might_fault+0x59/0xa9
[<ffffffff81854f78>] netlink_sendmsg+0x4fa/0x51c
[<ffffffff817d6e09>] sock_sendmsg_nosec+0x12/0x1d
[<ffffffff817d8967>] sock_sendmsg+0x29/0x2e
[<ffffffff817d8cf3>] ___sys_sendmsg+0x1b4/0x23a
[<ffffffff8100a1b8>] ? native_sched_clock+0x35/0x37
[<ffffffff810a1d83>] ? sched_clock_local+0x12/0x72
[<ffffffff810a1fd4>] ? sched_clock_cpu+0x9e/0xb7
[<ffffffff810def2a>] ? current_kernel_time+0xe/0x32
[<ffffffff810b4bc5>] ? lock_release_holdtime.part.29+0x71/0x7f
[<ffffffff810ddebf>] ? read_seqcount_begin.constprop.27+0x5f/0x76
[<ffffffff810b6292>] ? trace_hardirqs_on_caller+0x17d/0x199
[<ffffffff811b14d5>] ? __fget_light+0x50/0x78
[<ffffffff817d9808>] __sys_sendmsg+0x42/0x60
[<ffffffff817d9838>] SyS_sendmsg+0x12/0x1c
[<ffffffff81a50e97>] system_call_fastpath+0x12/0x6f
---[ end trace ef29d3fb28e97ae7 ]---
For long term, we probably need to clean up the qdisc_graft() code
in case it hides other bugs like this.
Fixes: 95dc19299f ("pkt_sched: give visibility to mq slave qdiscs")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ce0e5c522d ]
Commit e9ce7cb6b1 ("xen-netback: Factor queue-specific data into queue
struct") introduced a regression when moving queue-specific data into
the queue struct by failing to set the credit_bytes field. This
prevented bandwidth limiting from working. Initialize the field as it
was done before multiqueue support was added.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b48732e4a4 ]
got a rare NULL pointer dereference in clear_bit
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit adbe088f6f ]
A pair of nested spin locks was introduced in commit 63502b8d0
"dp83640: Fix receive timestamp race condition".
Unfortunately the 'flags' parameter was reused for the inner lock,
clobbering the originally saved IRQ state. This patch fixes the issue
by changing the inner lock to plain spin_lock without irqsave.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a935865c82 ]
Callers of the ext_write function are supposed to hold a mutex that
protects the state of the dialed page, but one caller was missing the
lock from the very start, and over time the code has been changed
without following the rule. This patch cleans up the call sites in
violation of the rule.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 397a253af5 ]
Currently, the calibration function that corrects the initial offsets
among multiple devices only works the first time. If the function is
called more than once, the calibration fails and bogus offsets will be
programmed into the devices.
In a well hidden spot, the device documentation tells that trigger indexes
0 and 1 are special in allowing the TRIG_IF_LATE flag to actually work.
This patch fixes the issue by using one of the special triggers during the
recalibration method.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 47cc84ce0c ]
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.
This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.
This only happens when there is a MLDv2 querier in the network.
The fix will only break out of the loop when there is a failure adding a
multicast address.
The mdb before the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
After the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
Fixes: 08b202b672 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 44f6731d8b ]
The tx_curr_frame_payload field is u32. When we try to calculate a
small negative delta based on it, we end up with a positive integer
close to 2^32 instead. So the tx_bytes pointer increases by about
2^32 for every transmitted frame.
Fix by calculating the delta as a signed long.
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Florian Bruhin <me@the-compiler.org>
Fixes: 7a1e890e21 ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 381c759d99 ]
ip_error does not check if in_dev is NULL before dereferencing it.
IThe following sequence of calls is possible:
CPU A CPU B
ip_rcv_finish
ip_route_input_noref()
ip_route_input_slow()
inetdev_destroy()
dst_input()
With the result that a network device can be destroyed while processing
an input packet.
A crash was triggered with only unicast packets in flight, and
forwarding enabled on the only network device. The error condition
was created by the removal of the network device.
As such it is likely the that error code was -EHOSTUNREACH, and the
action taken by ip_error (if in_dev had been accessible) would have
been to not increment any counters and to have tried and likely failed
to send an icmp error as the network device is going away.
Therefore handle this weird case by just dropping the packet if
!in_dev. It will result in dropping the packet sooner, and will not
result in an actual change of behavior.
Fixes: 251da41301 ("ipv4: Cache ip_error() routes even when not forwarding.")
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Tested-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c78e1746d3 ]
Vijay reported that a loop as simple as ...
while true; do
tc qdisc add dev foo root handle 1: prio
tc filter add dev foo parent 1: u32 match u32 0 0 flowid 1
tc qdisc del dev foo root
rmmod cls_u32
done
... will panic the kernel. Moreover, he bisected the change
apparently introducing it to 78fd1d0ab0 ("netlink: Re-add
locking to netlink_lookup() and seq walker").
The removal of synchronize_net() from the netlink socket
triggering the qdisc to be removed, seems to have uncovered
an RCU resp. module reference count race from the tc API.
Given that RCU conversion was done after e341694e3e ("netlink:
Convert netlink_lookup() to use RCU protected hash table")
which added the synchronize_net() originally, occasion of
hitting the bug was less likely (not impossible though):
When qdiscs that i) support attaching classifiers and,
ii) have at least one of them attached, get deleted, they
invoke tcf_destroy_chain(), and thus call into ->destroy()
handler from a classifier module.
After RCU conversion, all classifier that have an internal
prio list, unlink them and initiate freeing via call_rcu()
deferral.
Meanhile, tcf_destroy() releases already reference to the
tp->ops->owner module before the queued RCU callback handler
has been invoked.
Subsequent rmmod on the classifier module is then not prevented
since all module references are already dropped.
By the time, the kernel invokes the RCU callback handler from
the module, that function address is then invalid.
One way to fix it would be to add an rcu_barrier() to
unregister_tcf_proto_ops() to wait for all pending call_rcu()s
to complete.
synchronize_rcu() is not appropriate as under heavy RCU
callback load, registered call_rcu()s could be deferred
longer than a grace period. In case we don't have any pending
call_rcu()s, the barrier is allowed to return immediately.
Since we came here via unregister_tcf_proto_ops(), there
are no users of a given classifier anymore. Further nested
call_rcu()s pointing into the module space are not being
done anywhere.
Only cls_bpf_delete_prog() may schedule a work item, to
unlock pages eventually, but that is not in the range/context
of cls_bpf anymore.
Fixes: 25d8c0d55f ("net: rcu-ify tcf_proto")
Fixes: 9888faefe1 ("net: sched: cls_basic use RCU")
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 33b4b015e1 ]
Commit <5cf3d46192fc> ("udp: Simplify__udp*_lib_mcast_deliver")
simplified the filter for incoming IPv6 multicast but removed
the check of the local socket address and the UDP destination
address.
This patch restores the filter to prevent sockets bound to a IPv6
multicast IP to receive other UDP traffic link unicast.
Signed-off-by: Henning Rogge <hrogge@gmail.com>
Fixes: 5cf3d46192 ("udp: Simplify__udp*_lib_mcast_deliver")
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 21858cd02d ]
commit 1d13a96c74 ("ipv6: tcp: fix flowlabel value in ACK messages
send from TIME_WAIT") added the flow label in the last TCP packets.
Unfortunately, it was not casted properly.
This patch replace the buggy shift with be32_to_cpu/cpu_to_be32.
Fixes: 1d13a96c74 ("ipv6: tcp: fix flowlabel value in ACK messages")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ed2a80ab7b ]
Before the patch, the command 'ip link add bond2 type bond mode 802.3ad'
causes the kernel to send a rtnl message for the bond2 interface, with an
ifindex 0.
'ip monitor' shows:
0: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 state DOWN group default
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
9: bond2@NONE: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default
link/ether ea:3e:1f:53:92:7b brd ff:ff:ff:ff:ff:ff
[snip]
The patch fixes the spotted bug by checking in bond driver if the interface
is registered before calling the notifier chain.
It also adds a check in rtmsg_ifinfo() to prevent this kind of bug in the
future.
Fixes: d4261e5650 ("bonding: create netlink event when bonding option is changed")
CC: Jiri Pirko <jiri@resnulli.us>
Reported-by: Julien Meunier <julien.meunier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7e14069651 ]
RGMII interfaces come in multiple flavors: RGMII with transmit or
receive internal delay, no delays at all, or delays in both direction.
This change extends the initial check for PHY_INTERFACE_MODE_RGMII to
cover all of these variants since EEE should be allowed for any of these
modes, since it is a property of the RGMII, hence Gigabit PHY capability
more than the RGMII electrical interface and its delays.
Fixes: a59a4d1921 ("phy: add the EEE support and the way to access to the MMD registers")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3f7352bf21 ]
x86 has variable length encoding. x86 JIT compiler is trying
to pick the shortest encoding for given bpf instruction.
While doing so the jump targets are changing, so JIT is doing
multiple passes over the program. Typical program needs 3 passes.
Some very short programs converge with 2 passes. Large programs
may need 4 or 5. But specially crafted bpf programs may hit the
pass limit and if the program converges on the last iteration
the JIT compiler will be producing an image full of 'int 3' insns.
Fix this corner case by doing final iteration over bpf program.
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 343f845b37 ]
FROM_BE16:
'ror %reg, 8' doesn't clear upper bits of the register,
so use additional 'movzwl' insn to zero extend 16 bits into 64
FROM_LE16:
should zero extend lower 16 bits into 64 bit
FROM_LE32:
should zero extend lower 32 bits into 64 bit
Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d66bf7dd27 ]
The code in __netdev_upper_dev_link() has an over-stringent
loop detection logic that actually prevents valid configurations
from working correctly.
In particular, the logic returns an error if an upper device
is already in the list of all upper devices for a given dev.
This particular check seems to be a overzealous as it disallows
perfectly valid configurations. For example:
# ip l a link eth0 name eth0.10 type vlan id 10
# ip l a dev br0 typ bridge
# ip l s eth0.10 master br0
# ip l s eth0 master br0 <--- Will fail
If you switch the last two commands (add eth0 first), then both
will succeed. If after that, you remove eth0 and try to re-add
it, it will fail!
It appears to be enough to simply check adj_list to keeps things
safe.
I've tried stacking multiple devices multiple times in all different
combinations, and either rx_handler registration prevented the stacking
of the device linking cought the error.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Two files that get included when building the multi_v7_defconfig target
fail to build when selecting THUMB2_KERNEL for this configuration.
In both cases, we can just build the file as ARM code, as none of its
symbols are exported to modules, so there are no interworking concerns.
In the iwmmxt.S case, add ENDPROC() declarations so the symbols are
annotated as functions, resulting in the linker to emit the appropriate
mode switches.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(cherry picked from commit 13d1b9575a)
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5ec45a192f ]
Fix this compile issue with gcc-4.4.4:
arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer
arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer
arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer
...
gcc-4.4.4 (at least) has issues when using anonymous unions in
initializers.
Fixes: edc90b7dc4 ("KVM: MMU: fix SMAP virtualization")
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a81157768a ]
The variable "sector" in "raid0_make_request()" was improperly updated
by a call to "sector_div()" which modifies its first argument in place.
Commit 47d68979cc restored this variable
after the call for later re-use. Unfortunetly the restore was done after
the referenced variable "bio" was advanced. This lead to the original
value and the restored value being different. Here we move this line to
the proper place.
One observed side effect of this bug was discarding a file though
unlinking would cause an unrelated file's contents to be discarded.
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: 47d68979cc ("md/raid0: fix bug with chunksize not a power of 2.")
Cc: stable@vger.kernel.org (any that received above backport)
URL: https://bugzilla.kernel.org/show_bug.cgi?id=98501
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
We get a NULL pointer dereference on omap3 for thumb2 compiled kernels:
Internal error: Oops: 80000005 [#1] SMP THUMB2
...
[<c046497b>] (_raw_spin_unlock_irqrestore) from [<c0024375>]
(omap3_enter_idle_bm+0xc5/0x178)
[<c0024375>] (omap3_enter_idle_bm) from [<c0374e63>]
(cpuidle_enter_state+0x77/0x27c)
[<c0374e63>] (cpuidle_enter_state) from [<c00627f1>]
(cpu_startup_entry+0x155/0x23c)
[<c00627f1>] (cpu_startup_entry) from [<c06b9a47>]
(start_kernel+0x32f/0x338)
[<c06b9a47>] (start_kernel) from [<8000807f>] (0x8000807f)
The power management related assembly on omaps needs to interact with
ARM mode bootrom code, so we need to keep most of the related assembly
in ARM mode.
Turns out this error is because of missing ENDPROC for assembly code
as suggested by Stephen Boyd <sboyd@codeaurora.org>. Let's fix the
problem by adding ENDPROC in two places to sleep34xx.S.
Let's also remove the now duplicate custom code for mode switching.
This has been unnecessary since commit 6ebbf2ce43 ("ARM: convert
all "mov.* pc, reg" to "bx reg" for ARMv6+").
And let's also remove the comments about local variables, they are
now just confusing after the ENDPROC.
The reason why ENDPROC makes a difference is it sets .type and then
the compiler knows what to do with the thumb bit as explained at:
https://wiki.ubuntu.com/ARM/Thumb2PortingHowto
Reported-by: Kevin Hilman <khilman@kernel.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
(cherry picked from commit d8a50941c9)
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ac37e2515c ]
dst_orig should be released on error. Function like __xfrm_route_forward()
expects that behavior.
Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(),
which expects the opposite.
Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be
done in case of error.
Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 932df43005 ]
In case of error, the function devm_ioremap() returns NULL
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Fixes: ecfe64d8c5 ("power: reset: Add AT91 reset driver")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 161f873b89 ]
We used to read file_handle twice. Once to get the amount of extra
bytes, and once to fetch the entire structure.
This may be problematic since we do size verifications only after the
first read, so if the number of extra bytes changes in userspace between
the first and second calls, we'll have an incoherent view of
file_handle.
Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 553452e5ff ]
In the case of a DMA mapping error on the last iteration of
the loop of the allocation of memory of the FW monitor we
indeed free the pages, but don't NULL out the page variable
thus allowing for the possibility of setting the FW monitor
variables with invalid data to use.
Fixes: c2d202017d ("iwlwifi: pcie: add firmware monitor capabilities")
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b9a5e5e18f ]
Since acpi_reserve_resources() is defined as a device_initcall(),
there's no guarantee that it will be executed in the right order
with respect to the rest of the ACPI initialization code. On some
systems this leads to breakage if, for example, the address range
that should be reserved for the ACPI fixed registers is given to
the PCI host bridge instead if the race is won by the wrong code
path.
Fix this by turning acpi_reserve_resources() into a void function
and calling it directly from within the ACPI initialization sequence.
Reported-and-tested-by: George McCollister <george.mccollister@gmail.com>
Link: http://marc.info/?t=143092384600002&r=1&w=2
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6e9eac2dce ]
If any memory allocation in resize_stripes fails we will return
-ENOMEM, but in some cases we update conf->pool_size anyway.
This means that if we try again, the allocations will be assumed
to be larger than they are, and badness results.
So only update pool_size if there is no error.
This bug was introduced in 2.6.17 and the patch is suitable for
-stable.
Fixes: ad01c9e375 ("[PATCH] md: Allow stripes to be expanded in preparation for expanding an array")
Cc: stable@vger.kernel.org (v2.6.17+)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5c1ac56b51 ]
In function dmi_present(), dmi_walk_early() calls dmi_table(), which
calls dmi_decode(), which ultimately calls dmi_save_uuid(). This last
function makes a decision based on the value of global variable
dmi_ver. The problem is that this variable is set right _after_
dmi_walk_early() returns. So dmi_save_uuid() always sees dmi_ver == 0
regardless of the actual version implemented.
This causes /sys/class/dmi/id/product_uuid to always use the old
ordering even on systems implementing DMI/SMBIOS 2.6 or later, which
should use the new ordering.
This is broken since kernel v3.8 for legacy DMI implementations and
since kernel v3.10 for SMBIOS 2 implementations. SMBIOS 3
implementations with the 64-bit entry point are not affected.
The first breakage does not matter much as in practice legacy DMI
implementations are always for versions older than 2.6, which is when
the UUID ordering changed. The second breakage is more problematic as
it affects the vast majority of x86 systems manufactured since 2009.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 9f9c9cbb60 ("drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists")
Fixes: 79bae42d51 ("dmi_scan: refactor dmi_scan_machine(), {smbios,dmi}_present()")
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Artem Savkov <artem.savkov@gmail.com>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: stable@vger.kernel.org [v3.10+]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9507271d96 ]
In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary. Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.
The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ebe9cb3bb1 ]
If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value. Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b0dc2b9bb4 ]
NUMA balancing is meant to be disabled by default on UMA machines but
the check is using nr_node_ids (highest node) instead of
num_online_nodes (online nodes).
The consequences are that a UMA machine with a node ID of 1 or higher
will enable NUMA balancing. This will incur useless overhead due to
minor faults with the impact depending on the workload. These are the
impact on the stats when running a kernel build on a single node machine
whose node ID happened to be 1:
vanilla patched
NUMA base PTE updates 5113158 0
NUMA huge PMD updates 643 0
NUMA page range updates 5442374 0
NUMA hint faults 2109622 0
NUMA hint local faults 2109622 0
NUMA hint local percent 100 100
NUMA pages migrated 0 0
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d045c77c1a ]
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).
The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.
This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1b97937246 ]
Josh Stone reports:
I've discovered a case where both arm and arm64 will miss a ptrace
syscall-exit that they should report. If the syscall is entered
without TIF_SYSCALL_TRACE set, then it goes on the fast path. It's
then possible to have TIF_SYSCALL_TRACE added in the middle of the
syscall, but ret_fast_syscall doesn't check this flag again.
Fix this by always checking for a syscall trace in the fast exit path.
Reported-by: Josh Stone <jistone@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a29ef819f3 ]
According to the imx27 documentation, fec has a 4 Kbyte
memory space map. Moreover, the actual 16 Kbyte mapping
overlaps the SCC (Security Controller) memory register
space. So, we reduce the memory register space to 4 Kbyte.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 9f0749e3eb ("ARM i.MX27: Add devicetree support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 965278dcb8 ]
At boot time we round the memblock limit down to section size in an
attempt to ensure that we will have mapped this RAM with section
mappings prior to allocating from it. When mapping RAM we iterate over
PMD-sized chunks, creating these section mappings.
Section mappings are only created when the end of a chunk is aligned to
section size. Unfortunately, with classic page tables (where PMD_SIZE is
2 * SECTION_SIZE) this means that if a chunk is between 1M and 2M in
size the first 1M will not be mapped despite having been accounted for
in the memblock limit. This has been observed to result in page tables
being allocated from unmapped memory, causing boot-time hangs.
This patch modifies the memblock limit rounding to always round down to
PMD_SIZE instead of SECTION_SIZE. For classic MMU this means that we
will round the memblock limit down to a 2M boundary, matching the limits
on section mappings, and preventing allocations from unmapped memory.
For LPAE there should be no change as PMD_SIZE == SECTION_SIZE.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Stefan Agner <stefan@agner.ch>
Tested-by: Stefan Agner <stefan@agner.ch>
Acked-by: Laura Abbott <labbott@redhat.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0782e63bc6 ]
Ronny reported that the following scenario is not handled correctly:
T1 (prio = 10)
lock(rtmutex);
T2 (prio = 20)
lock(rtmutex)
boost T1
T1 (prio = 20)
sys_set_scheduler(prio = 30)
T1 prio = 30
....
sys_set_scheduler(prio = 10)
T1 prio = 30
The last step is wrong as T1 should now be back at prio 20.
Commit c365c292d0 ("sched: Consider pi boosting in setscheduler()")
only handles the case where a boosted tasks tries to lower its
priority.
Fix it by taking the new effective priority into account for the
decision whether a change of the priority is required.
Reported-by: Ronny Meeus <ronny.meeus@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Fixes: c365c292d0 ("sched: Consider pi boosting in setscheduler()")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7cded342c0 ]
Git commit 152125b7a8
"s390/mm: implement dirty bits for large segment table entries"
broke the pmd_pfn function, it changed the return value from
'unsigned long' to 'int'. This breaks all machine configurations
with memory above the 8TB line.
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 22d3a3c829 ]
No matter how the driver manages its NAPI context, there's no way
sending frames to it from a timer can be correct, since it would
corrupt the internal GRO lists.
To avoid that, always use the non-NAPI path when releasing frames
from the timer.
Cc: stable@vger.kernel.org
Reported-by: Jean Trivelly <jean.trivelly@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 47b4e1fc49 ]
Remove checking tailroom when adding IV as it uses only
headroom, and move the check to the ICV generation that
actually needs the tailroom.
In other case I hit such warning and datapath don't work,
when testing:
- IBSS + WEP
- ath9k with hw crypt enabled
- IPv6 data (ping6)
WARNING: CPU: 3 PID: 13301 at net/mac80211/wep.c:102 ieee80211_wep_add_iv+0x129/0x190 [mac80211]()
[...]
Call Trace:
[<ffffffff817bf491>] dump_stack+0x45/0x57
[<ffffffff8107746a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8107755a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc09ae109>] ieee80211_wep_add_iv+0x129/0x190 [mac80211]
[<ffffffffc09ae7ab>] ieee80211_crypto_wep_encrypt+0x6b/0xd0 [mac80211]
[<ffffffffc09d3fb1>] invoke_tx_handlers+0xc51/0xf30 [mac80211]
[...]
Cc: stable@vger.kernel.org
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f230e8ffc0 ]
This patch fixes an inverted return value of the gpio get_direction
function.
The wrong value causes the direction sysfs entry and GPIO debugfs file
to indicate incorrect GPIO direction settings. In some cases it also
prevents setting GPIO output values.
The problem is also present in all other stable kernel versions since
linux-3.12.
Cc: Stable <stable@vger.kernel.org> # v3.12+
Reported-by: Jochen Henneberg <jh@henneberg-systemdesign.com>
Signed-off-by: Michael Brunner <michael.brunner@kontron.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1e4df6b720 ]
Consider "(u64)insn1.imm << 32 | imm" in the arm64 JIT. Since imm is
signed 32-bit, it is sign-extended to 64-bit, losing the high 32 bits.
The fix is to convert imm to u32 first, which will be zero-extended to
u64 implicitly.
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: 30d3d94cc3 ("arm64: bpf: add 'load 64-bit immediate' instruction")
Signed-off-by: Xi Wang <xi.wang@gmail.com>
[will: removed non-arm64 bits and redundant casting]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09c5b4803a ]
When the LPM policy is set to ATA_LPM_MAX_POWER, the device might
generate a spurious PHY event that cuases errors on the link.
Ignore this event if it occured within 10s after the policy change.
The timeout was chosen observing that on a Dell XPS13 9333 these
spurious events can occur up to roughly 6s after the policy change.
Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dbfe8ef559 ]
Avoton AHCI occasionally sees drive probe timeouts at driver load time.
When this happens SCR_STATUS indicates device detected, but no D2H FIS
reception. Reset the internal link state machines by bouncing
port-enable in the PCS register when this occurs.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e531d0bceb ]
The journal revoke block recovery code does not check r_count for
sanity, which means that an evil value of r_count could result in
the kernel reading off the end of the revoke table and into whatever
garbage lies beyond. This could crash the kernel, so fix that.
However, in testing this fix, I discovered that the code to write
out the revoke tables also was not correctly checking to see if the
block was full -- the current offset check is fine so long as the
revoke table space size is a multiple of the record size, but this
is not true when either journal_csum_v[23] are set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f974865ff ]
The following commit introduced a bug when checking for zero length extent
5946d08 ext4: check for overlapping extents in ext4_valid_extent_entries()
Zero length extent could pass the check if lblock is zero.
Adding the explicit check for zero length back.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9d50659406 ]
Currently when journal restart fails, we'll have the h_transaction of
the handle set to NULL to indicate that the handle has been effectively
aborted. We handle this situation quietly in the jbd2_journal_stop() and just
free the handle and exit because everything else has been done before we
attempted (and failed) to restart the journal.
Unfortunately there are a number of problems with that approach
introduced with commit
41a5b91319 "jbd2: invalidate handle if jbd2_journal_restart()
fails"
First of all in ext4 jbd2_journal_stop() will be called through
__ext4_journal_stop() where we would try to get a hold of the superblock
by dereferencing h_transaction which in this case would lead to NULL
pointer dereference and crash.
In addition we're going to free the handle regardless of the refcount
which is bad as well, because others up the call chain will still
reference the handle so we might potentially reference already freed
memory.
Moreover it's expected that we'll get aborted handle as well as detached
handle in some of the journalling function as the error propagates up
the stack, so it's unnecessary to call WARN_ON every time we get
detached handle.
And finally we might leak some memory by forgetting to free reserved
handle in jbd2_journal_stop() in the case where handle was detached from
the transaction (h_transaction is NULL).
Fix the NULL pointer dereference in __ext4_journal_stop() by just
calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
the potential memory leak in jbd2_journal_stop() and use proper
handle refcounting before we attempt to free it to avoid use-after-free
issues.
And finally remove all WARN_ON(!transaction) from the code so that we do
not get random traces when something goes wrong because when journal
restart fails we will get to some of those functions.
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8f9cfeed3e ]
when gsmtty_remove put dlci, it will cause memory leak if dlci->port's refcount is zero.
So we do the cleanup work in .cleanup callback instead.
dlci will be last put in two call chains.
1) gsmld_close -> gsm_cleanup_mux -> gsm_dlci_release -> dlci_put
2) gsmld_remove -> dlci_put
so there is a race. the memory leak depends on the race.
In call chain 2. we hit the memory leak. below comment tells.
release_tty -> tty_driver_remove_tty -> gsmtty_remove -> dlci_put -> tty_port_destructor (WARN_ON(port->itty) and return directly)
|
tty->port->itty = NULL;
|
tty_kref_put ---> release_one_tty -> gsmtty_cleanup (added by our patch)
So our patch fix the memory leak by doing the cleanup work after tty core did.
Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com>
Fixes: dfabf7ffa3
Cc: stable <stable@vger.kernel.org> # 3.14+
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e95235ccd ]
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.
If they are bad, we die a few hundred instructions into boot.
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 82ee3aeb92 ]
Samsung has just released a portable USB3 SSD, coming in a very small
and nice form factor. It's USB ID is 04e8:8001, which unfortunately is
already used by the Palm Visor driver for the Samsung I330 phone cradle.
Having pl2303 or visor pick up this device ID results in conflicts with
the usb-storage driver, which handles the newly released portable USB3
SSD.
To work around this conflict, I've dug up a mailing list post [1] from a
long time ago, in which a user posts the full USB descriptor
information. The most specific value in this appears to be the interface
class, which has value 255 (0xff). Since usb-storage requires an
interface class of 0x8, I believe it's correct to disambiguate the two
devices by matching on 0xff inside visor.
[1] http://permalink.gmane.org/gmane.linux.usb.user/4264
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 18cc2f4cbb ]
Our event ring consists of only one segment, and we risk filling
the event ring in case we get isoc transfers with short intervals
such as webcams that fill a TD every microframe (125us)
With 64 TRB segment size one usb camera could fill the event ring in 8ms.
A setup with several cameras and other devices can fill up the
event ring as it is shared between all devices.
This has occurred when uvcvideo queues 5 * 32TD URBs which then
get cancelled when the video mode changes. The cancelled URBs are returned
in the xhci interrupt context and blocks the interrupt handler from
handling the new events.
A full event ring will block xhci from scheduling traffic and affect all
devices conneted to the xhci, will see errors such as Missed Service
Intervals for isoc devices, and and Split transaction errors for LS/FS
interrupt devices.
Increasing the TRB_PER_SEGMENT will also increase the default endpoint ring
size, which is welcome as for most isoc transfer we had to dynamically
expand the endpoint ring anyway to be able to queue the 5 * 32TDs uvcvideo
queues.
The default size used to be 64 TRBs per segment
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d104d0152a ]
Isoc TDs usually consist of one TRB, sometimes two. When all goes well we
receive only one success event for a TD, and move the dequeue pointer to
the next TD.
This fails if the TD consists of two TRBs and we get a transfer error
on the first TRB, we will then see two events for that TD.
Fix this by making sure the event we get is for the last TRB in that TD
before moving the dequeue pointer to the next TD. This will resolve some
of the uvc and dvb issues with the
"ERROR Transfer event TRB DMA ptr not part of current TD" error message
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b23f14302e ]
Information for packet type is in ieee80211_tx_info
band IEEE80211_BAND_5GHZ for PK_TYPE_11A.
IEEE80211_TX_RC_USE_CTS_PROTECT via tx_rate flags selects PK_TYPE_11GB
This ensures that the packet is always the right type.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 892c89d5d7 ]
Fix regression introduced by commit <29ef8a53542a>. After it writing
AT commands to /dev/GCT-ATM0 is unsuccessful (no echo, no response)
and dmesg show "gdmtty: invalid payload : 1 16 f011".
Before that commit value of dummy_cnt was only a padding size. After using
ALIGN() this value is increased by its first argument. So the following
usage of this variable needs correction.
Signed-off-by: Sławomir Demeszko <s.demeszko@wireless-instruments.com>
Cc: stable <stable@vger.kernel.org> # 3.14+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ec04847c0c ]
The string iwpm_ulib_name is recorded in a nlmsg as a netlink attribute.
Without this fix parsing of the nlmsg by the userspace port mapper service fails
because of unknown attribute length, causing the port mapper service not to
register the client, which has sent the nlmsg.
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Cc: <stable@vger.kernel.org> #v3.16
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fdb6eb0a12 ]
When there is prefix specified, currently we will add this prefix in
widget->name, but not in widget->sname.
it causes failure at snd_soc_dapm_link_dai_widgets:
if (!w->sname || !strstr(w->sname, dai_w->name))
because dai_w->name has prefix added, but w->sname does not.
We should also add prefix for stream name
Signed-off-by: Koro Chen <koro.chen@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 37e661ee10 ]
Add a new driver_caps bit, AZX_DCAPS_SNOOP_OFF, to set the snoop off
as default. This new bit is used for the checks in
azx_check_snoop_available(). Most of case-switches are replaced with
the new dcaps in each entry.
While working on it, for avoiding to spend more bits, combine three
bits AZX_DCAPS_SNOOP_SCH, AZX_DCAPS_SNOOP_ATI and
AZX_DCAPS_SNOOP_NVIDIA bits into a flat type of two bits. This
reduces the bits usages, and assign AZX_DCAPS_OFF to this empty bit
now.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3530febb5c ]
This reverts commit 7290006d8c.
Through the regression report, it was revealed that the
tpacpi_led_set() call to thinkpad_acpi helper doesn't only toggle the
mute LED but actually mutes the sound. This is contradiction to the
expectation, and rather confuses user.
According to Henrique, it's not trivial to judge which TP model
behaves "LED-only" and which model does whatever more intrusive, as
Lenovo's implementations vary model by model. So, from the safety
reason, we should revert the patch for now.
Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Pali Rohár <pali.rohar@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b40eda6408 ]
When headphone mic boost is above zero, some 10 - 20 second delay
might occur before the headphone mic is operational.
Therefore disable the headphone mic boost control (recording gain is
sufficient even without it).
(Note: this patch is not about the headset mic, it's about the less
common mic-in only mode.)
BugLink: https://bugs.launchpad.net/bugs/1454235
Suggested-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09ea997677 ]
The Lenovo ThinkPad L450 requires the ALC292_FIXUP_TPT440_DOCK fix in
order to get sound output on the docking stations audio port.
This patch was tested using a ThinkPad L450 (20DSS00B00) using kernel
4.0.3 and a ThinkPad Pro Dock.
Signed-off-by: Ansgar Hegerfeld <linux@hegerfeld.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eef0342cf3 ]
Adds Microsoft LifeCam Cinema USB ID to the snd_usb_get_sample_rate_quirk list as the Lifecam Cinema does not appear to support getting the sample rate.
Fixes the issue where the LifeCam Cinema would wait for USB timeout and log the message "cannot get freq at ep 0x82" when accessed.
Addresses bug report https://bugzilla.kernel.org/show_bug.cgi?id=95961.
Signed-off-by: Adam Honse <calcprogrammer1@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9fc88ad6fd ]
Adding this quirk allows us to avoid the noisy
"cannot get freq at ep 0x1" message in dmesg output every time
playback starts.
This ought to affect other Benchmark DAC1 variations using the same
"Microchip Technology, Inc." chip as well, but I have only tested
with the "Pre" variant.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Cc: Joe Turner <joe@oampo.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8b28c93fe5 ]
There are three places doing the same check. Let's make them
together.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b62b998010 ]
Adds a quirk to disable the check that the sample rate has been set correctly, as the Lifecam does not support getting the sample rate.
This means that we don't need to wait for the USB timeout when attempting to get the sample rate. Waiting for the timeout causes problems in some applications, which give up on the device acquisition process before it has had time to complete, resulting in no sound.
[minor tidy up by tiwai]
Signed-off-by: Joe Turner <joe@oampo.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dacacb0aa0 ]
This makes the midi interface and capture work out of the box with
R16 (and presumably R24 too but untested). Playback stream would also
seem to function fine except for one caveat: no sound is produced,
so it is disabled for now. Mixer descriptors are garbage and will
require further quirks to enable functionality, also disabled here.
Signed-off-by: Panu Matilainen <pmatilai@laiskiainen.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6874daad4b ]
Denon/Marantz USB DACs need a specific vendor command to switch between PCM and
DSD mode. This patch adds a new quirk function to switch between the two modes
using the specific USB vendor command.
This patch applies to the following devices:
- Marantz SA-14S1
- Marantz HD-DAC1
Signed-off-by: Jurgen Kramer <gtmkramer@xs4all.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 37815bf866 ]
The module notifier call chain for MODULE_STATE_COMING was moved up before
the parsing of args, into the complete_formation() call. But if the module failed
to load after that, the notifier call chain for MODULE_STATE_GOING was
never called and that prevented the users of those call chains from
cleaning up anything that was allocated.
Link: http://lkml.kernel.org/r/554C52B9.9060700@gmail.com
Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Fixes: 4982223e51 "module: set nx before marking module MODULE_STATE_COMING"
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2159184ea0 ]
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.
Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.
Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f18c34e483 ]
If the specified maximum length of the string is a multiple of unsigned
long, we would load one long behind the specified maximum. If that
happens to be in a next page, we can hit a page fault although we were
not expected to.
Fix the off-by-one bug in the test whether we are at the end of the
specified range.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c7bd6dc320 ]
The following error message is seen when loading the nct6683 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
Reported-by: Alexey Orishko <alexey.orishko@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1b63bf6172 ]
The following error message is seen when loading the nct6775 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
Reported-by: Alexey Orishko <alexey.orishko@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 97ffae1d30 ]
The VREFN channel is bipolar, not unipolar. Small negative values do
occur (e.g., -1mV), and unsigned conversion maps them incorrectly to
large positive values (about +1V), so fix this.
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit adba657533 ]
When configured via device tree, the associated iio device needs to be
measuring voltage for the conversion to resistance to be correct.
Return -EINVAL if that is not the case.
Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 77bb3dfdc0 ]
A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
VCPU than it is bound to. This can result in a race between
handle_percpu_irq() and removing the action in __free_irq() because
handle_percpu_irq() does not take desc->lock. The interrupt handler
sees a NULL action and oopses.
Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
# cat /proc/interrupts | grep virq
40: 87246 0 xen-percpu-virq timer0
44: 0 0 xen-percpu-virq debug0
47: 0 20995 xen-percpu-virq timer1
51: 0 0 xen-percpu-virq debug1
69: 0 0 xen-dyn-virq xen-pcpu
74: 0 0 xen-dyn-virq mce
75: 29 0 xen-dyn-virq hvc_console
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 521a04d06a ]
This reverts commit ba9d114ec5.
.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies. In retrospect it's pretty obvious that
r_req_lru_item item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().
The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.
Cc: stable@vger.kernel.org # 3.18+, needs b049453221 "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 521a04d06a ]
This reverts commit ba9d114ec5.
.. which introduced a regression that prevented all lingering requests
requeued in kick_requests() from ever being sent to the OSDs, resulting
in a lot of missed notifies. In retrospect it's pretty obvious that
r_req_lru_item item in the case of lingering requests can be used not
only for notarget, but also for unsent linkage due to how tightly
actual map and enqueue operations are coupled in __map_request().
The assertion that was being silenced is taken care of in the previous
("libceph: request a new osdmap if lingering request maps to no osd")
commit: by always kicking homeless lingering requests we ensure that
none of them ends up on the notarget list outside of the critical
section guarded by request_mutex.
Cc: stable@vger.kernel.org # 3.18+, needs b049453221 "libceph: request a new osdmap if lingering request maps to no osd"
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cddc116228 ]
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.
Thanks to Brian Foster for noticing it.
cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6dfe5a049f ]
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.
This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.
To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of it's state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.
Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.
cc: <stable@vger.kernel.org> # 3.12 to 4.0
Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 83a35114d0 ]
This bug has been there since day 1; addresses in the top guest physical
page weren't considered valid. You could map that page (the check in
check_gpte() is correct), but if a guest tried to put a pagetable there
we'd check that address manually when walking it, and kill the guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c0345ee57d ]
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block. The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.
Unfortunately, a recent change made this unsigned along with some other
related fields. The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.
Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f7bcb70eba ]
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.
And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353
This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.
Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.
[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]
Fixes: 166afb6451 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8b618628b2 ]
At least on ARM, do_div() is optimized to turn constant divisors into
an inline multiplication by the reciprocal value at compile time.
However this optimization is missed entirely whenever ktime_divns() is
used and the slow out-of-line division code is used all the time.
Let ktime_divns() use do_div() inline whenever the divisor is constant
and small enough. This will make things like ktime_to_us() and
ktime_to_ms() much faster.
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c447e76b4c ]
The MPX feature requires eager KVM FPU restore support. We have verified
that MPX cannot work correctly with the current lazy KVM FPU restore
mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
exposed to VM.
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
[Also activate the FPU on AMD processors. - Paolo]
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0be0226f07 ]
KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page
Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7cbeed9bce ]
Current permission check assumes that RSVD bit in PFEC is always zero,
however, it is not true since MMIO #PF will use it to quickly identify
MMIO access
Fix it by clearing the bit if walking guest page table is needed
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e88221c50c ]
The kernel's handling of 'compacted' xsave state layout is buggy:
http://marc.info/?l=linux-kernel&m=142967852317199
I don't have such a system, and the description there is vague, but
from extrapolation I guess that there were two kinds of bugs
observed:
- boot crashes, due to size calculations being wrong and the dynamic
allocation allocating a too small xstate area. (This is now fixed
in the new FPU code - but still present in stable kernels.)
- FPU state corruption and ABI breakage: if signal handlers try to
change the FPU state in standard format, which then the kernel
tries to restore in the compacted format.
These breakages are scary, but they only occur on a small number of
systems that have XSAVES* CPU support. Yet we have had XSAVES support
in the upstream kernel for a large number of stable kernel releases,
and the fixes are involved and unproven.
So do the safe resolution first: disable XSAVES* support and only
use the standard xstate format. This makes the code work and is
easy to backport.
On top of this we can work on enabling (and testing!) proper
compacted format support, without backporting pressure, on top of the
new, cleaned up FPU code.
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 17fea54bf0 ]
Derek noticed that a critical MCE gets reported with the wrong
error type description:
[Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0
[Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170}
[Hardware Error]: TSC 49587b8e321cb
[Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29
[Hardware Error]: Some CPUs didn't answer in synchronization
[Hardware Error]: Machine check: Invalid
^^^^^^^
The last line with 'Invalid' should have printed the high level
MCE error type description we get from mce_severity, i.e.
something like:
[Hardware Error]: Machine check: Action required: data load error in a user process
this happens due to the fact that mce_no_way_out() iterates over
all MCA banks and possibly overwrites the @msg argument which is
used in the panic printing later.
Change behavior to take the message of only and the (last)
critical MCE it detects.
Reported-by: Derek <denc716@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e3480271f5 ]
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.
This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.
In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 820f9f147d ]
This is needed to support lazily umounting locked mounts. Because the
entire unmounted subtree needs to stay together until there are no
users with references to any part of the subtree.
To support this guarantee that the fs_pin m_list and s_list nodes
are initialized by initializing them in init_fs_pin allowing
for the possibility that pin_insert_group does not touch them.
Further use hlist_del_init in pin_remove so that there is
a hlist_unhashed test before the list we attempt to update
the previous list item.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cd4a40174b ]
The only users of collect_mounts are in audit_tree.c
In audit_trim_trees and audit_add_tree_rule the path passed into
collect_mounts is generated from kern_path passed an audit_tree
pathname which is guaranteed to be an absolute path. In those cases
collect_mounts is obviously intended to work on mounted paths and
if a race results in paths that are unmounted when collect_mounts
it is reasonable to fail early.
The paths passed into audit_tag_tree don't have the absolute path
check. But are used to play with fsnotify and otherwise interact with
the audit_trees, so again operating only on mounted paths appears
reasonable.
Avoid having to worry about what happens when we try and audit
unmounted filesystems by restricting collect_mounts to mounts
that appear in the mount tree.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This patch is a partial backport of commit ef01c6c36b ("ARM: mvebu:
remove Armada 375 Z1 workaround for I/O coherency"). This commit was
merged in v3.19, so kernel versions later than v3.19 are not affected
by the problem that this commit fixes.
It does not make a lot of sense to backport this commit entirely,
since it is mainly removing some no longer useful code. However, this
commit is also making sure that the bus_register_notifier that
register the custom DMA operations that should be used for HW I/O
coherency does not get registered when said HW I/O coherency is not
enabled.
This is particularly critical since we have decided to disable HW I/O
coherency completely in all kernels < 4.0, to be on the safe side,
while experimenting a new implementation of the HW I/O coherency in >=
4.0.
Without this commit, kernels earlier than 3.18 have the custom DMA
operations normally used for HW I/O coherency registered (they don't
do cache maintenance operations), while HW I/O coherency is
disabled. It essentially causes every DMA transfer to transfer
garbage.
The issue fixed by this commit was introduced by 5ab5afd8ba ("ARM:
mvebu: implement Armada 375 coherency workaround"), but it was not
visible until now since it didn't cause any problem when HW I/O
coherency is enabled.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> v3.16..v3.18
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1d0a0b2f6d ]
ACPICA commit b60612373a4ef63b64a57c124576d7ddb6d8efb6
For physical addresses, since the address may exceed 32-bit address range
after calculation, we should use 0x%8.8X%8.8X instead of ACPI_PRINTF_UINT
and ACPI_FORMAT_UINT64() instead of
ACPI_FORMAT_NATIVE_UINT()/ACPI_FORMAT_TO_UINT().
This patch also removes above replaced macros as there are no users.
This is a preparation to switch acpi_physical_address to 64-bit on 32-bit
kernel builds.
Link: https://github.com/acpica/acpica/commit/b6061237
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cc2080b0e5 ]
ACPICA commit 7f06739db43a85083a70371c14141008f20b2198
For physical addresses, since the address may exceed 32-bit address range
after calculation, we should use %8.8X%8.8X (see ACPI_FORMAT_UINT64()) to
convert the %p formats.
This is a preparation to switch acpi_physical_address to 64-bit on 32-bit
kernel builds.
Link: https://github.com/acpica/acpica/commit/7f06739d
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6d3fd3cc33 ]
ACPICA commit 154f6d074dd38d6ebc0467ad454454e6c5c9ecdf
There are code pieces converting pointers using "(acpi_physical_address) x"
or "ACPI_CAST_PTR (t, x)" formats, this patch cleans up them.
Known issues:
1. Cleanup of "(ACPI_PHYSICAL_ADDRRESS) x" for a table field
For the conversions around the table fields, it is better to fix it with
alignment also fixed. So this patch doesn't modify such code. There
should be no functional problem by leaving them unchanged.
Link: https://github.com/acpica/acpica/commit/154f6d07
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f254e3c57b ]
ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3
OSPMs like Linux expect an acpi_physical_address returning value from
acpi_find_root_pointer(). This triggers warnings if sizeof (acpi_size) doesn't
equal to sizeof (acpi_physical_address):
drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default]
In file included from include/acpi/acpi.h:64:0,
from include/linux/acpi.h:36,
from drivers/acpi/osl.c:41:
include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *'
This patch corrects acpi_find_root_pointer().
Link: https://github.com/acpica/acpica/commit/7d9fd643
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bc26d4d06e ]
A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND)
on /dev/sequencer with TMR_ECHO midi event.
In this case the control flow is:
sound_ioctl()
-> case SND_DEV_SEQ:
case SND_DEV_SEQ2:
sequencer_ioctl()
-> case SNDCTL_SEQ_OUTOFBAND:
spin_lock_irqsave(&lock,flags);
play_event();
-> case EV_TIMING:
seq_timing_event()
-> case TMR_ECHO:
seq_copy_to_input()
-> spin_lock_irqsave(&lock,flags);
It seems that spin_lock_irqsave() around play_event() is not necessary,
because the only other call location in seq_startplay() makes the call
without acquiring spinlock.
So, the patch just removes spinlocks around play_event().
By the way, it removes unreachable code in seq_timing_event(),
since (seq_mode == SEQ_2) case is handled in the beginning.
Compile tested only.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c097877319 ]
arm64 builds with GCC 5 have caused the __asmeq assertions in the PSCI
calling code to fire, so move the ARM PSCI calls out of line into their
own assembly file for consistency and to safeguard against the same
issue occuring with the 32-bit toolchain.
[will: brought into line with arm64 implementation]
Reported-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bad4371d87 ]
f9fd54f22e ("mmc: sh_mmcif: Use msecs_to_jiffies() for host->timeout")
changed the timeout value from 1000 jiffies to 1s. In the case where
HZ is 1000 the values are the same. However, for smaller HZ values the
timeout is now smaller, 1s instead of 10s in the case of HZ=100.
Since the timeout occurs in spite of a normal data transfer a timeout of
10s seems more appropriate. This restores the previous timeout in the
case where HZ=100 and results in an increase over the previous timeout
for larger values of HZ.
Fixes: f9fd54f22e ("mmc: sh_mmcif: Use msecs_to_jiffies() for host->timeout")
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
[horms: rewrote changelog to refer to HZ]
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 184af16b09 ]
The PM_RESTORE_PREPARE is not handled now in mmc_pm_notify(),
as result mmc_rescan() could be scheduled and executed at
late hibernation restore stages when MMC device is suspended
already - which, in turn, will lead to system crash on TI dra7-evm board:
WARNING: CPU: 0 PID: 3188 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x258/0x374()
44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in User mode during Functional access
Hence, add missed PM_RESTORE_PREPARE PM event in mmc_pm_notify().
Fixes: 4c2ef25fe0 (mmc: fix all hangs related to mmc/sd card...)
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4e93b9a6ab ]
During kernel boot, it will try to read some logical sectors
of each block device node for the possible partition table.
But since RPMB partition is special and can not be accessed
by normal eMMC read / write CMDs, it will cause below error
messages during kernel boot:
...
mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
mmcblk0rpmb: error -110 transferring data, sector 0, nr 32, cmd response 0x900, card status 0xb00
mmcblk0rpmb: retrying using single block read
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
end_request: I/O error, dev mmcblk0rpmb, sector 0
Buffer I/O error on device mmcblk0rpmb, logical block 0
end_request: I/O error, dev mmcblk0rpmb, sector 8
Buffer I/O error on device mmcblk0rpmb, logical block 1
end_request: I/O error, dev mmcblk0rpmb, sector 16
Buffer I/O error on device mmcblk0rpmb, logical block 2
end_request: I/O error, dev mmcblk0rpmb, sector 24
Buffer I/O error on device mmcblk0rpmb, logical block 3
...
This patch will discard the access request in eMMC queue if
it is RPMB partition access request. By this way, it avoids
trigger above error messages.
Fixes: 090d25fe22 ("mmc: core: Expose access to RPMB partition")
Signed-off-by: Yunpeng Gao <yunpeng.gao@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Tested-by: Michael Shigorin <mike@altlinux.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c5272a2856 ]
Way back, when the world was a simpler place and there was no war, no
evil, and no kernel bugs, there was just a single pinctrl lock. That
was how the world was when (57291ce pinctrl: core device tree mapping
table parsing support) was written. In that case, there were
instances where the pinctrl mutex was already held when
pinctrl_register_map() was called, hence a "locked" parameter was
passed to the function to indicate that the mutex was already locked
(so we shouldn't lock it again).
A few years ago in (42fed7b pinctrl: move subsystem mutex to
pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex.
...but (oops) we forgot to re-think about the whole "locked" parameter
for pinctrl_register_map(). Basically the "locked" parameter appears
to still refer to whether the bigger pinctrl_dev mutex is locked, but
we're using it to skip locks of our (now separate) pinctrl_maps_mutex.
That's kind of a bad thing(TM). Probably nobody noticed because most
of the calls to pinctrl_register_map happen at boot time and we've got
synchronous device probing. ...and even cases where we're
asynchronous don't end up actually hitting the race too often. ...but
after banging my head against the wall for a bug that reproduced 1 out
of 1000 reboots and lots of looking through kgdb, I finally noticed
this.
Anyway, we can now safely remove the "locked" parameter and go back to
a war-free, evil-free, and kernel-bug-free world.
Fixes: 42fed7ba44 ("pinctrl: move subsystem mutex to pinctrl_dev struct")
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9fcb1704d1 ]
The eDP port A register on PCH split platforms has a slightly different
register layout from the other ports, with bit 6 being either alternate
scrambler reset or reserved, depending on the generation. Our
misinterpretation of the bit as audio has lead to warning.
Fix this by not enabling audio on port A, since none of our platforms
support audio on port A anyway.
v2: DDI doesn't have audio on port A either (Sivakumar Thulasimani)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89958
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3916e3fd81 ]
Single channel LVDS maxes out at 112 MHz. The 15" pre-retina models
shipped with 1440x900 (106 MHz) by default or 1680x1050 (119 MHz)
as a BTO option, both versions used dual channel LVDS even though
the smaller one would have fit into a single channel.
Notes:
Bug report showing that the MacBookPro8,2 with 1440x900 uses dual
channel LVDS (this lead to it being hardcoded in intel_lvds.c by
Daniel Vetter with commit 618563e394):
https://bugzilla.kernel.org/show_bug.cgi?id=42842
If i915.lvds_channel_mode=2 is missing even though the machine needs
it, every other vertical line is white and consequently, only the left
half of the screen is visible (verified by myself on a MacBookPro9,1).
Forum posting concerning a MacBookPro6,2 with 1440x900, author is
using i915.lvds_channel_mode=2 on the kernel command line, proving
that the machine uses dual channels:
https://bbs.archlinux.org/viewtopic.php?id=185770
Chi Mei N154C6-L04 with 1440x900 is a replacement panel for all
MacBook Pro "A1286" models, and that model number encompasses the
MacBookPro6,2 / 8,2 / 9,1. Page 17 of the panel's datasheet shows it's
driven with dual channel LVDS:
http://www.ebay.com/itm/-/400690878560http://www.everymac.com/ultimate-mac-lookup/?search_keywords=A1286http://www.taopanel.com/chimei/datasheet/N154C6-L04.pdf
Those three 15" models, MacBookPro6,2 / 8,2 / 9,1, are the only ones
with i915 graphics and dual channel LVDS, so that list should be
complete. And the 8,2 is already in intel_lvds.c.
Possible motivation to use dual channel LVDS even on the 1440x900
models: Reduce the number of different parts, i.e. use identical logic
boards and display cabling on both versions and the only differing
component is the panel.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: stable@vger.kernel.org
[Jani: included notes in the commit message for posterity]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fdb68e09bb ]
Since commit 844b03f277 we make
sure that after vblank irq off, we return the last valid
(vblank count, vblank timestamp) pair to clients, e.g., during
modesets, which is good.
An overlooked side effect of that commit for kms drivers without
support for precise vblank timestamping is that at vblank irq
enable, when we update the vblank counter from the hw counter, we
can't update the corresponding vblank timestamp, so now we have a
totally mismatched timestamp for the new count to confuse clients.
Restore old client visible behaviour from before Linux 3.17, but
zero out the timestamp at vblank counter update (instead of disable
as in original implementation) if we can't generate a meaningful
timestamp immediately for the new vblank counter. This will fix
this regression, so callers know they need to retry again later
if they need a valid timestamp, but at the same time preserves
the improvements made in the commit mentioned above.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> #v3.17+
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 53d2669844 ]
The GPIO regulator for the SD-card isn't a ux500 SOC configuration, but
instead it's specific to the board. Move the definition of it, into the
board DTSs.
Fixes: c94a4ab7af ("ARM: ux500: Disable the MMCI gpio-regulator by default")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 19fc99d0c6 ]
In that case, emit_udiv() will be called with rn == ARM_R0 (r_scratch)
and loading rm first into ARM_R0 will result in jit_udiv() function
being called the same dividend and divisor. Fix that by loading rn
first into ARM_R1 and then rm into ARM_R0.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Cc: <stable@vger.kernel.org> # v3.13+
Fixes: aee636c480 (bpf: do not use reciprocal divide)
Acked-by: Mircea Gherzan <mgherzan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 102bcb6ed2 ]
If we use a combination of VMODE and I2C4 for retention modes,
eventually the off idle power consumption will creep up by about
23mW, even during off mode with I2C4 always staying enabled.
Turns out this is because of erratum i531 "Extra Power Consumed
When Repeated Start Operation Mode Is Enabled on I2C Interface
Dedicated for Smart Reflex (I2C4)" as pointed out by Nishanth
Menon <nm@ti.com>.
Let's fix the issue by adding i2c_cfg_clear_mask for the bits
to clear when initializing the I2C4 adapter so we can clear
SREN bit that drives the I2C4 lines low otherwise when there
is no traffic.
Fixes: 3b8c4ebb76 ("ARM: OMAP3: Fix idle mode signaling for
Cc: stable@vger.kernel.org # v3.16+
sys_clkreq and sys_off_mode")
Cc: Kevin Hilman <khilman@kernel.org>
Cc: Tero Kristo <t-kristo@ti.com>
Reviewed-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 750e30d407 ]
There is no crystal connected to the internal RTC on the Open Block
AX3. So let's disable it in order to prevent the kernel probing the
driver uselessly. Eventually this patches removes the following
warning message from the boot log:
"rtc-mv d0010300.rtc: internal RTC not ticking"
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.8 +
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0fdebe1a2f ]
The dr_mode of usb0 on imx233-olinuxino is left to default "otg".
Since the green LED (GPIO2_1) on imx233-olinuxino is connected to the
same pin as USB_OTG_ID it's possible to disable USB host by LED toggling:
echo 0 > /sys/class/leds/green/brightness
[ 1068.890000] ci_hdrc ci_hdrc.0: remove, state 1
[ 1068.890000] usb usb1: USB disconnect, device number 1
[ 1068.920000] usb 1-1: USB disconnect, device number 2
[ 1068.920000] usb 1-1.1: USB disconnect, device number 3
[ 1069.070000] usb 1-1.2: USB disconnect, device number 4
[ 1069.450000] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
[ 1074.460000] ci_hdrc ci_hdrc.0: timeout waiting for 00000800 in 11
This patch fixes the issue by setting dr_mode to "host" in the dts file.
Reported-by: Harald Geyer <harald@ccbib.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: Peter Chen <peter.chen@freescale.com>
Fixes: b493129482 ("ARM: dts: imx23-olinuxino: Add USB host support")
Cc: stable@vger.kernel.org
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1819e3034e ]
N900 audio recording needs that codec provides bias voltage for integrated
digital microphone and headset microphone depending which one is used.
Digital microphone uses 2 V bias and it comes from the codec A part. Codec
B part drives the headset microphone bias and that is set to 2.5 V.
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Pavel Machek <pavel@ucw.cz>
[Jarkko: Headset mic bias changed to 2 (2.5 V) as it was before commit
e2e8bfdf61 ("ASoC: tlv320aic3x: Convert mic bias to a supply widget")]
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c0403ec0bb ]
This reverts Linux 4.1-rc1 commit 0618764cb2.
The problem which that commit attempts to fix actually lies in the
Freescale CAAM crypto driver not dm-crypt.
dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
driver should internally backlog requests which arrive when the queue is
full and process them later. Until the crypto hw's queue becomes full,
the driver returns -EINPROGRESS. When the crypto hw's queue if full,
the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
expected to backlog the request and process it when the hardware has
queue space. At the point when the driver takes the request from the
backlog and starts processing it, it calls the completion function with
a status of -EINPROGRESS. The completion function is called (for a
second time, in the case of backlogged requests) with a status/err of 0
when a request is done.
Crypto drivers for hardware without hardware queueing use the helpers,
crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
and crypto_get_backlog() helpers to implement this behaviour correctly,
while others implement this behaviour without these helpers (ccp, for
example).
dm-crypt (before the patch that needs reverting) uses this API
correctly. It queues up as many requests as the hw queues will allow
(i.e. as long as it gets back -EINPROGRESS from the request function).
Then, when it sees at least one backlogged request (gets -EBUSY), it
waits till that backlogged request is handled (completion gets called
with -EINPROGRESS), and then continues. The references to
af_alg_wait_for_completion() and af_alg_complete() in that commit's
commit message are irrelevant because those functions only handle one
request at a time, unlink dm-crypt.
The problem is that the Freescale CAAM driver, which that commit
describes as having being tested with, fails to implement the
backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware
queue is full, it simply returns -EBUSY without backlogging the request.
What the observed deadlock was is not described in the commit message
but it is obviously the wait_for_completion() in crypto_convert() where
dm-crypto would wait for the completion being called with -EINPROGRESS
in the case of backlogged requests. This completion will never be
completed due to the bug in the CAAM driver.
Commit 0618764cb2 incorrectly made dm-crypt wait for every request,
even when the driver/hardware queues are not full, which means that
dm-crypt will never see -EBUSY. This means that that commit will cause
a performance regression on all crypto drivers which implement the API
correctly.
Revert it. Correct backlog handling should be implemented in the CAAM
driver instead.
Cc'ing stable purely because commit 0618764cb2 did. If for some reason
a stable@ kernel did pick up commit 0618764cb2 it should get reverted.
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Horia Geanta <horia.geanta@freescale.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8014bcc86e ]
The variable for the 'permissive' module parameter used to be static
but was recently changed to be extern. This puts it in the kernel
global namespace if the driver is built-in, so its name should begin
with a prefix identifying the driver.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: af6fc858a3 ("xen-pciback: limit guest control of command register")
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5cec988349 ]
When a guest is resumed, the hypervisor may change event channel
assignments. If this happens and the guest uses 2-level events it
is possible for the interrupt to be claimed by wrong VCPU since
cpu_evtchn_mask bits may be stale. This can happen even though
evtchn_2l_bind_to_cpu() attempts to clear old bits: irq_info that
is passed in is not necessarily the original one (from pre-migration
times) but instead is freshly allocated during resume and so any
information about which CPU the channel was bound to is lost.
Thus we should clear the mask during resume.
We also need to make sure that bits for xenstore and console channels
are set when these two subsystems are resumed. While rebind_evtchn_irq()
(which is invoked for both of them on a resume) calls irq_set_affinity(),
the latter will in fact postpone setting affinity until handling the
interrupt. But because cpu_evtchn_mask will have bits for these two
cleared we won't be able to take the interrupt.
With that in mind, we need to bind those two channels explicitly in
rebind_evtchn_irq(). We will keep irq_set_affinity() so that we have a
pass through generic irq affinity code later, in case something needs
to be updated there as well.
(Also replace cpumask_of(0) with cpumask_of(info->cpu) in
rebind_evtchn_irq(): it should be set to zero in preceding
xen_irq_info_evtchn_setup().)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Annie Li <annie.li@oracle.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 602498f9aa ]
If multiple soft offline events hit one free page/hugepage concurrently,
soft_offline_page() can handle the free page/hugepage multiple times,
which makes num_poisoned_pages counter increased more than once. This
patch fixes this wrong counting by checking TestSetPageHWPoison for normal
papes and by checking the return value of dequeue_hwpoisoned_huge_page()
for hugepages.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Dean Nelson <dnelson@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org> [3.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 464d1387ac ]
mm/page-writeback.c has several places where 1 is added to the divisor
to prevent division by zero exceptions; however, if the original
divisor is equivalent to -1, adding 1 leads to division by zero.
There are three places where +1 is used for this purpose - one in
pos_ratio_polynom() and two in bdi_position_ratio(). The second one
in bdi_position_ratio() actually triggered div-by-zero oops on a
machine running a 3.10 kernel. The divisor is
x_intercept - bdi_setpoint + 1 == span + 1
span is confirmed to be (u32)-1. It isn't clear how it ended up that
but it could be from write bandwidth calculation underflow fixed by
c72efb658f ("writeback: fix possible underflow in write bandwidth
calculation").
At any rate, +1 isn't a proper protection against div-by-zero. This
patch converts all +1 protections to |1. Note that
bdi_update_dirty_ratelimit() was already using |1 before this patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f15133df08 ]
path_openat() jumps to the wrong place after do_tmpfile() - it has
already done path_cleanup() (as part of path_lookupat() called by
do_tmpfile()), so doing that again can lead to double fput().
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09789e5de1 ]
Currently memory_failure() calls shake_page() to sweep pages out from
pcplists only when the victim page is 4kB LRU page or thp head page.
But we should do this for a thp tail page too.
Consider that a memory error hits a thp tail page whose head page is on
a pcplist when memory_failure() runs. Then, the current kernel skips
shake_pages() part, so hwpoison_user_mappings() returns without calling
split_huge_page() nor try_to_unmap() because PageLRU of the thp head is
still cleared due to the skip of shake_page().
As a result, me_huge_page() runs for the thp, which is broken behavior.
One effect is a leak of the thp. And another is to fail to isolate the
memory error, so later access to the error address causes another MCE,
which kills the processes which used the thp.
This patch fixes this problem by calling shake_page() for thp tail case.
Fixes: 385de35722 ("thp: allow a hwpoisoned head page to be put back to LRU")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Dean Nelson <dnelson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Cc: <stable@vger.kernel.org> [3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 483d821108 ]
Unregister GPIOs requested through sysfs at chip remove to avoid leaking
the associated memory and sysfs entries.
The stale sysfs entries prevented the gpio numbers from being exported
when the gpio range was later reused (e.g. at device reconnect).
This also fixes the related module-reference leak.
Note that kernfs makes sure that any on-going sysfs operations finish
before the class devices are unregistered and that further accesses
fail.
The chip exported flag is used to prevent gpiod exports during removal.
This also makes it harder to trigger, but does not fix, the related race
between gpiochip_remove and export_store, which is really a race with
gpiod_request that needs to be addressed separately.
Also note that this would prevent the crashes (e.g. NULL-dereferences)
at reconnect that affects pre-3.18 kernels, as well as use-after-free on
operations on open attribute files on pre-3.14 kernels (prior to
kernfs).
Fixes: d8f388d8dc ("gpio: sysfs interface")
Cc: stable <stable@vger.kernel.org> # v2.6.27: 01cca93a94
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 285214409a ]
When accepting a new IPv4 connect to an IPv6 socket, the CMA tries to
canonize the address family to IPv4, but does not properly process
the listening sockaddr to get the listening port, and does not properly
set the address family of the canonized sockaddr.
Fixes: e51060f08a ("IB: IP address based RDMA connection manager")
Cc: <stable@vger.kernel.org>
Reported-By: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d8fd150fe3 ]
The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).
Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.
This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b1432a2a35 ]
There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged. This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.
dlm_get_lock_resource {
...
spin_lock(&dlm->spinlock);
tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
if (tmpres) {
spin_unlock(&dlm->spinlock);
>>>>>>>> race window, dlm_run_purge_list() may run and purge
the lock resource
spin_lock(&tmpres->spinlock);
...
spin_unlock(&tmpres->spinlock);
}
}
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 622532bb2f ]
Commit eec15edbb0 (ACPI / PNP: use device ID list for PNPACPI device
enumeration) changed the way how ACPI devices are enumerated and when
they are added to the PNP bus.
However, it broke the sound card support on (at least) a vintage
IBM ThinkPad 600E: with said commit applied, two of the necessary
"CSC01xx" devices are not added to the PNP bus and hence can not be
found during the initialization of the "snd-cs4236" module. As a
consequence, loading "snd-cs4236" causes null pointer exceptions.
The attached patch fixes the problem end re-enables sound on the
IBM ThinkPad 600E.
Fixes: eec15edbb0 (ACPI / PNP: use device ID list for PNPACPI device enumeration)
Signed-off-by: Witold Szczeponik <Witold.Szczeponik@gmx.net>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3349fb64b2 ]
Commit 7bc5a2bad0 'ACPI: Support _OSI("Darwin") correctly' caused
the MacBook firmware to expose the SBS, resulting in intermittent
hangs of several minutes on boot, and failure to detect or report
the battery. Fix this by adding a 5 us delay to the start of each
SMBUS transaction. This timing is the result of experimentation -
hangs were observed with 3 us but never with 5 us.
Fixes: 7bc5a2bad0 'ACPI: Support _OSI("Darwin") correctly'
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94651
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
[ rjw: Subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit db579e76f0 ]
On Mac OS X, HFS+ extended attributes are not namespaced. Since we want
to be compatible with OS X filesystems and yet still support the Linux
namespacing system, the hfsplus driver implements a special "osx"
namespace that is reported for any attribute that is not namespaced
on-disk. However, the current code for getting and setting these
unprefixed attributes is broken.
hfsplus_osx_setattr() and hfsplus_osx_getattr() are passed names that have
already had their "osx." prefixes stripped by the generic functions. The
functions first, quite correctly, check those names to make sure that they
aren't prefixed with a known namespace, which would allow namespace access
restrictions to be bypassed. However, the functions then prepend "osx."
to the name they're given before passing it on to hfsplus_getattr() and
hfsplus_setattr(). Not only does this cause the "osx." prefix to be
stored on-disk, defeating its purpose, it also breaks the check for the
special "com.apple.FinderInfo" attribute, which is reported for all files,
and as a consequence makes some userspace applications (e.g. GNU patch)
fail even when extended attributes are not otherwise in use.
There are five commits which have touched this particular code:
127e5f5ae5 ("hfsplus: rework functionality of getting, setting and deleting of extended attributes")
b168fff721 ("hfsplus: use xattr handlers for removexattr")
bf29e886b2 ("hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes")
fcacbd95e121 ("fs/hfsplus: move xattr_name allocation in hfsplus_getxattr()")
ec1bbd346f18 ("fs/hfsplus: move xattr_name allocation in hfsplus_setxattr()")
The first commit creates the functions to begin with. The namespace is
prepended by the original code, which I believe was correct at the time,
since hfsplus_?etattr() stripped the prefix if found. The second commit
removes this behavior from hfsplus_?etattr() and appears to have been
intended to also remove the prefixing from hfsplus_osx_?etattr().
However, what it actually does is remove a necessary strncpy() call
completely, breaking the osx namespace entirely. The third commit re-adds
the strncpy() call as it was originally, but doesn't mention it in its
commit message. The final two commits refactor the code and don't affect
its functionality.
This commit does what b168fff attempted to do (prevent the prefix from
being added), but does it properly, instead of passing in an empty buffer
(which is what b168fff actually did).
Fixes: b168fff721 ("hfsplus: use xattr handlers for removexattr")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 579d69bc1f ]
The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Torsten Luettgert <ml-lkml@enda.eu>
Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de>
Cc: stable@vger.kernel.org
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 118c855b56 ]
The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9cd9554615 ]
The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 280227a75b ]
fallocate() checks that the file is extent-based and returns
EOPNOTSUPP in case is not. Other tasks can convert from and to
indirect and extent so it's safe to check only after grabbing
the inode mutex.
Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d2dc317d56 ]
Currently it is possible to lose whole file system block worth of data
when we hit the specific interaction with unwritten and delayed extents
in status extent tree.
The problem is that when we insert delayed extent into extent status
tree the only way to get rid of it is when we write out delayed buffer.
However there is a limitation in the extent status tree implementation
so that when inserting unwritten extent should there be even a single
delayed block the whole unwritten extent would be marked as delayed.
At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when a we write
into said unwritten extent we will convert it to written, but it still
remains delayed.
When we try to write into that block later ext4_da_map_blocks() will set
the buffer new and delayed and map it to invalid block which causes
the rest of the block to be zeroed loosing already written data.
For now we can fix this by simply not allowing to set delayed status on
written extent in the extent status tree. Also add WARN_ON() to make
sure that we notice if this happens in the future.
This problem can be easily reproduced by running the following xfs_io.
xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
-c "falloc 0 131072" \
-c "pwrite -S 0xbb 65536 2048" \
-c "fsync" /mnt/test/fff
echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
This can be theoretically also reproduced by at random by running fsx,
but it's not very reliable, though on machines with bigger page size
(like ppc) this can be seen more often (especially xfstest generic/127)
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ee136af4a0 ]
The usb-storage driver sets max_sectors = 240 in its scsi-host template,
for uas we do not want to do that for all devices, but testing has shown
that some devices need it.
This commit adds a US_FL_MAX_SECTORS_240 flag for such devices, and
implements support for it in uas.c, while at it it also adds support
for US_FL_MAX_SECTORS_64 to uas.c.
Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a5011d44f0 ]
uas_use_uas_driver may set some US_FL_foo flags during detection, currently
these are stored in a local variable and then throw away, but these may be
of interest to the caller, so add an extra parameter to (optionally) return
the detected flags, and use this in the uas driver.
Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 082a75dad8 ]
When we end I/O struct request with error, we need to pass
obj_request->length as @nr_bytes so that the entire obj_request worth
of bytes is completed. Otherwise block layer ends up confused and we
trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
in rbd_img_obj_callback() due to more being true no matter what. We
already do it in most cases but we are missing some, in particular
those where we don't even get a chance to submit any obj_requests, due
to an early -ENOMEM for example.
A number of obj_request->xferred assignments seem to be redundant but
I haven't touched any of obj_request->xferred stuff to keep this small
and isolated.
Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 61f8ff6939 ]
Commit 9faf6136ff (ACPI / SBS: Disable smart battery manager on
Apple) introduced a regression disabling the SBS battery manager.
The battery manager should be marked as present when
acpi_manager_get_info() returns 0.
Fixes: 9faf6136ff (ACPI / SBS: Disable smart battery manager on Apple)
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 909e26dce3 ]
Whenever the check for a send in progress introduced in commit
521e0546c9 (btrfs: protect snapshots from deleting during send) is
hit, we return without unlocking inode->i_mutex. This is easy to see
with lockdep enabled:
[ +0.000059] ================================================
[ +0.000028] [ BUG: lock held when returning to user space! ]
[ +0.000029] 4.0.0-rc5-00096-g3c435c1 #93 Not tainted
[ +0.000026] ------------------------------------------------
[ +0.000029] btrfs/211 is leaving the kernel with locks still held!
[ +0.000029] 1 lock held by btrfs/211:
[ +0.000023] #0: (&type->i_mutex_dir_key){+.+.+.}, at: [<ffffffff8135b8df>] btrfs_ioctl_snap_destroy+0x2df/0x7a0
Make sure we unlock it in the error path.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a5a356cee8 ]
Wrongly release mutex lock during otg_statemachine may result in re-enter
otg_statemachine, which is not allowed, we should do next state transtition
after previous one completed.
Fixes: 826cfe751f ("usb: chipidea: add OTG fsm operation functions implementation")
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Li Jun <jun.li@freescale.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6829e274a6 ]
Buffers allocated by dma_alloc_coherent() are always zeroed on Alpha,
ARM (32bit), MIPS, PowerPC, x86/x86_64 and probably other architectures.
It turned out that some drivers rely on this 'feature'. Allocated buffer
might be also exposed to userspace with dma_mmap() call, so clearing it
is desired from security point of view to avoid exposing random memory
to userspace. This patch unifies dma_alloc_coherent() behavior on ARM64
architecture with other implementations by unconditionally zeroing
allocated buffer.
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5c90c07b98 ]
For systems with CONFIG_SERIAL_OF_PLATFORM=y and device_type =
"serial"; property in DT of_serial.c driver maps and unmaps IRQ (because
driver probe fails). Then a driver is called but irq mapping is not
created that's why driver is failing again in again on request_irq().
Based on this use platform_get_irq() instead of platform_get_resource()
which is doing irq_desc allocation and driver itself can request IRQ.
Fix both xilinx serial drivers in the tree.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6befa9d883 ]
Do not probe all serial drivers by of_serial.c which are using
device_type = "serial"; property. Only drivers which have valid
compatible strings listed in the driver should be probed.
When PORT_UNKNOWN is setup probe will fail anyway.
Arnd quotation about driver historical background:
"when I wrote that driver initially, the idea was that it would
get used as a stub to hook up all other serial drivers but after
that, the common code learned to create platform devices from DT"
This patch fix the problem with on the system with xilinx_uartps and
16550a where of_serial failed to register for xilinx_uartps and because
of irq_dispose_mapping() removed irq_desc. Then when xilinx_uartps was asking
for irq with request_irq() EINVAL is returned.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
CC: <stable@vger.kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0d3bba0287 ]
Phil and I found out a problem with commit:
7e860a6e7a ("cdc-acm: add sanity checks")
It added some sanity checks to ignore potential garbage in CDC headers but
also introduced a potential infinite loop. This can happen at the first
loop iteration (elength = 0 in that case) if the description isn't a
DT_CS_INTERFACE or later if 'buffer[0]' is zero.
It should also be noted that the wrong length was being added to 'buffer'
in case 'buffer[1]' was not a DT_CS_INTERFACE descriptor, since elength was
assigned after that check in the loop.
A specially crafted USB device could be used to trigger this infinite loop.
Fixes: 7e860a6e7a ("cdc-acm: add sanity checks")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
CC: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
CC: Oliver Neukum <oneukum@suse.de>
CC: Adam Lee <adam8157@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7241ea558c ]
Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two
modes for DMA. Second mode is useful for 64 bit os with more then 2 GB
of ram (fixes problems with big soundfont loading)
1) 32MB from 2 GB address space using 8192 pages (used now as default)
2) 16MB from 4 GB address space using 4096 pages
Mode is set using HCFG_EXPANDED_MEM flag in HCFG register.
Also format of emu10k2 page table is then different.
Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d02260824e ]
Some models provide too long string for the shortname that has 32bytes
including the terminator, and it results in a non-terminated string
exposed to the user-space. This isn't too critical, though, as the
string is stopped at the succeeding longname string.
This patch fixes such entries by dropping "SB" prefix (it's enough to
fit within 32 bytes, so far). Meanwhile, it also changes strcpy()
with strlcpy() to make sure that this kind of problem won't happen in
future, too.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c94e65c66 ]
The OSS emulation in synth-emux helper has a potential AB/BA deadlock
at the simultaneous closing and opening:
close ->
snd_seq_release() ->
sne_seq_free_client() ->
snd_seq_delete_all_ports(): takes client->ports_mutex ->
port_delete() ->
snd_emux_unuse(): takes emux->register_mutex
open ->
snd_seq_oss_open() ->
snd_emux_open_seq_oss(): takes emux->register_mutex ->
snd_seq_event_port_attach() ->
snd_seq_create_port(): takes client->ports_mutex
This patch addresses the deadlock by reducing the rance taking
emux->register_mutex in snd_emux_open_seq_oss(). The lock is needed
for the refcount handling, so move it locally. The calls in
emux_seq.c are already with the mutex, thus they are replaced with the
version without mutex lock/unlock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 07b0e5d49d ]
The emux-synth driver has a possible AB/BA mutex deadlock at unloading
the emu10k1 driver:
snd_emux_free() ->
snd_emux_detach_seq(): mutex_lock(&emu->register_mutex) ->
snd_seq_delete_kernel_client() ->
snd_seq_free_client(): mutex_lock(®ister_mutex)
snd_seq_release() ->
snd_seq_free_client(): mutex_lock(®ister_mutex) ->
snd_seq_delete_all_ports() ->
snd_emux_unuse(): mutex_lock(&emu->register_mutex)
Basically snd_emux_detach_seq() doesn't need a protection of
emu->register_mutex as it's already being unregistered. So, we can
get rid of this for avoiding the deadlock.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aebac99384 ]
Commit 6ebb496ffc7e("MIPS: kernel: entry.S: Add MIPS R6 related
definitions") added the MIPSR6 definition but it did not update the
ISA level of the actual assembly code so a pre-MIPSR6 jr.hb instruction
was generated instead. Fix this by using the MISP_ISA_LEVEL_RAW macro.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Fixes: 6ebb496ffc7e("MIPS: kernel: entry.S: Add MIPS R6 related definitions")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9386/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Commit c67881fc89 is a backport of
0b67c43ce3 upstream commit, fixing a
bug on clk rate change propagation.
But in 4.0 ->determine_rate() prototype has changed, thus introducing
a prototype mismatch when applying it on 3.19.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 876a7ae65b ]
ALU64_DIV instruction should be dividing 64-bit by 64-bit,
whereas do_div() does 64-bit by 32-bit divide.
x64 and arm64 JITs correctly implement 64 by 64 unsigned divide.
llvm BPF backend emits code assuming that ALU64_DIV does 64 by 64.
Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8b01fc86b9 ]
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
This patch was mostly written by Linus Torvalds.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d0311314d0 ]
The hardware and firmware for the RTL8192EE utilize a FIFO list of
descriptors. There were some problems with the initial implementation.
The worst of these failed to detect that the FIFO was becoming full,
which led to the device needing to be power cycled. As this condition
is not relevant to most of the devices supported by rtlwifi, a callback
routine was added to detect this situation. This patch implements the
necessary changes in the pci handler, and the linkage into the appropriate
rtl8192ee routine.
Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [V3.18]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e66f17ff71 ]
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.
This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.
This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned. So the caller must be changed to
properly handle the returned tail pages.
We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.
Here is the reproducer:
$ cat movepages.c
#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
#define PS 0x1000
int main(int argc, char *argv[]) {
int i;
int nr_hp = strtol(argv[1], NULL, 0);
int nr_p = nr_hp * HPS / PS;
int ret;
void **addrs;
int *status;
int *nodes;
pid_t pid;
pid = strtol(argv[2], NULL, 0);
addrs = malloc(sizeof(char *) * nr_p + 1);
status = malloc(sizeof(char *) * nr_p + 1);
nodes = malloc(sizeof(char *) * nr_p + 1);
while (1) {
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 1;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
for (i = 0; i < nr_p; i++) {
addrs[i] = (void *)ADDR_INPUT + i * PS;
nodes[i] = 0;
status[i] = 0;
}
ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
MPOL_MF_MOVE_ALL);
if (ret == -1)
err("move_pages");
}
return 0;
}
$ cat hugepage.c
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
int main(int argc, char *argv[]) {
int nr_hp = strtol(argv[1], NULL, 0);
char *p;
while (1) {
p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (p != (void *)ADDR_INPUT) {
perror("mmap");
break;
}
memset(p, 0, nr_hp * HPS);
munmap(p, nr_hp * HPS);
}
}
$ sysctl vm.nr_hugepages=40
$ ./hugepage 10 &
$ ./movepages 10 $(pgrep -f hugepage)
Fixes: e632a938d9 ("mm: migrate: add hugepage migration code to move_pages()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit abe46b8932 ]
Reading of analog input channels by the `INSN_READ` comedi instruction
is broken for all except channel 0. `pci171x_ai_insn_read()` calls
`pci171x_ai_read_sample()` with the wrong value for the third parameter.
It is supposed to be the current index in a channel list (which is
always of length 1 in this case, so the index should be 0), but instead
it is passing the actual channel number. `pci171x_ai_read_sample()`
checks the channel number encoded in the raw sample value read from the
hardware matches the channel number stored in the specified index of the
previously set up channel list and returns `-ENODATA` if it doesn't
match. Since the index should always be 0 in this case, the match will
fail unless the channel number is also 0. Fix it by passing 0 as the
channel index.
Note that when the bug first appeared, it was `pci171x_ai_dropout()`
that was called with the wrong parameter value. `pci171x_ai_dropout()`
got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae250
("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()").
Fixes: 16c7eb6047 ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code")
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: stable <stable@vger.kernel.org> # 3.16+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0790ec172d ]
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0. L1 triple faulted when running L2 guest that required emulation.
Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.
Fixes: 78051e3b7e ("KVM: nVMX: Disable unrestricted mode if ept=0")
Cc: stable@vger.kernel.org
Tested-By: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0b67c43ce3 ]
We also need to save/store in forward, else br_parse_ip_options call
will zero frag_max_size as well.
Fixes: 93fdd47e5 ('bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING')
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c34203a14 ]
It is not necessary to call device_remove_groups() when device_add_groups()
fails.
The group added by device_add_groups() should be removed if sysfs_create_link()
fails.
Fixes: fa6fdb33b4 ("driver core: bus_type: add dev_groups")
Signed-off-by: Junjie Mao <junjie_mao@yeah.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7085a7401b ]
This fixes a regression from the net subsystem:
After commit d52fdbb735
"smc91x: retrieve IRQ and trigger flags in a modern way"
a regression would appear on some legacy platforms such
as the ARM PXA Zylonite that specify IRQ resources like
this:
static struct resource r = {
.start = X,
.end = X,
.flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHEDGE,
};
The previous code would retrieve the resource and parse
the high edge setting in the SMC91x driver, a use pattern
that means every driver specifying an IRQ flag from a
static resource need to parse resource flags and apply
them at runtime.
As we switched the code to use IRQ descriptors to retrieve
the the trigger type like this:
irqd_get_trigger_type(irq_get_irq_data(...));
the code would work for new platforms using e.g. device
tree as the backing irq descriptor would have its flags
properly set, whereas this kind of oldstyle static
resources at no point assign the trigger flags to the
corresponding IRQ descriptor.
To make the behaviour identical on modern device tree
and legacy static platform data platforms, modify
platform_get_irq() to assign the trigger flags to the
irq descriptor when a client looks up an IRQ from static
resources.
Fixes: d52fdbb735 ("smc91x: retrieve IRQ and trigger flags in a modern way")
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 13f6b191aa ]
Using the indenting we can see the curly braces were obviously intended.
This is a static checker fix, but my guess is that we don't read enough
bytes, because we don't calculate "t_len" correctly.
Fixes: f1d8269802 ('memstick: use fully asynchronous request processing')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f4831605f2 ]
time_init invokes timer64_init (which is __init annotation)
since all of these are invoked at init time, lets maintain
consistency by ensuring time_init is marked appropriately
as well.
This fixes the following warning with CONFIG_DEBUG_SECTION_MISMATCH=y
WARNING: vmlinux.o(.text+0x3bfc): Section mismatch in reference from the function time_init() to the function .init.text:timer64_init()
The function time_init() references
the function __init timer64_init().
This is often because time_init lacks a __init
annotation or the annotation of timer64_init is wrong.
Fixes: 546a39546c ("C6X: time management")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6d7e7e02a0 ]
For cases where total length of an input SGs is not same as
length of the input data for encryption, omap-aes driver
crashes. This happens in the case when IPsec is trying to use
omap-aes driver.
To avoid this, we copy all the pages from the input SG list
into a contiguous buffer and prepare a single element SG list
for this buffer with length as the total bytes to crypt, which is
similar thing that is done in case of unaligned lengths.
Fixes: 6242332ff2 ("crypto: omap-aes - Add support for cases of unaligned lengths")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a3fa71c40f ]
In struct wl18xx_acx_rx_rate_stat, rx_frames_per_rates field is an
array, not a number. This means WL18XX_DEBUGFS_FWSTATS_FILE can't be
used to display this field in debugfs (it would display a pointer, not
the actual data). Use WL18XX_DEBUGFS_FWSTATS_FILE_ARRAY instead.
This bug has been found by adding a __printf attribute to
wl1271_format_buffer. gcc complained about "format '%u' expects
argument of type 'unsigned int', but argument 5 has type 'u32 *'".
Fixes: c5d94169e8 ("wl18xx: use new fw stats structures")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0b053c9518 ]
OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient to
ensure protection from dead store optimization.
For the random driver and crypto drivers, calls are emitted ...
$ gdb vmlinux
(gdb) disassemble memzero_explicit
Dump of assembler code for function memzero_explicit:
0xffffffff813a18b0 <+0>: push %rbp
0xffffffff813a18b1 <+1>: mov %rsi,%rdx
0xffffffff813a18b4 <+4>: xor %esi,%esi
0xffffffff813a18b6 <+6>: mov %rsp,%rbp
0xffffffff813a18b9 <+9>: callq 0xffffffff813a7120 <memset>
0xffffffff813a18be <+14>: pop %rbp
0xffffffff813a18bf <+15>: retq
End of assembler dump.
(gdb) disassemble extract_entropy
[...]
0xffffffff814a5009 <+313>: mov %r12,%rdi
0xffffffff814a500c <+316>: mov $0xa,%esi
0xffffffff814a5011 <+321>: callq 0xffffffff813a18b0 <memzero_explicit>
0xffffffff814a5016 <+326>: mov -0x48(%rbp),%rax
[...]
... but in case in future we might use facilities such as LTO, then
OPTIMIZER_HIDE_VAR() is not sufficient to protect gcc from a possible
eviction of the memset(). We have to use a compiler barrier instead.
Minimal test example when we assume memzero_explicit() would *not* be
a call, but would have been *inlined* instead:
static inline void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
<foo>
}
int main(void)
{
char buff[20];
snprintf(buff, sizeof(buff) - 1, "test");
printf("%s", buff);
memzero_explicit(buff, sizeof(buff));
return 0;
}
With <foo> := OPTIMIZER_HIDE_VAR():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: xor %eax,%eax
0x000000000040046b <+43>: add $0x28,%rsp
0x000000000040046f <+47>: retq
End of assembler dump.
With <foo> := barrier():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: movq $0x0,(%rsp)
0x0000000000400471 <+49>: movq $0x0,0x8(%rsp)
0x000000000040047a <+58>: movl $0x0,0x10(%rsp)
0x0000000000400482 <+66>: xor %eax,%eax
0x0000000000400484 <+68>: add $0x28,%rsp
0x0000000000400488 <+72>: retq
End of assembler dump.
As can be seen, movq, movq, movl are being emitted inlined
via memset().
Reference: http://thread.gmane.org/gmane.linux.kernel.cryptoapi/13764/
Fixes: d4c5efdb97 ("random: add and use memzero_explicit() for clearing data")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: mancha security <mancha1@zoho.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 80f1d68ccb ]
I noticed that a helper function with argument type ARG_ANYTHING does
not need to have an initialized value (register).
This can worst case lead to unintented stack memory leakage in future
helper functions if they are not carefully designed, or unintended
application behaviour in case the application developer was not careful
enough to match a correct helper function signature in the API.
The underlying issue is that ARG_ANYTHING should actually be split
into two different semantics:
1) ARG_DONTCARE for function arguments that the helper function
does not care about (in other words: the default for unused
function arguments), and
2) ARG_ANYTHING that is an argument actually being used by a
helper function and *guaranteed* to be an initialized register.
The current risk is low: ARG_ANYTHING is only used for the 'flags'
argument (r4) in bpf_map_update_elem() that internally does strict
checking.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 08e8331654 ]
There is a race condition between e1000_change_mtu's cleanups and
netpoll, when we change the MTU across jumbo size:
Changing MTU frees all the rx buffers:
e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings ->
e1000_clean_rx_ring
Then, close to the end of e1000_change_mtu:
pr_info -> ... -> netpoll_poll_dev -> e1000_clean ->
e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag
And when we come back to do the rest of the MTU change:
e1000_up -> e1000_configure -> e1000_configure_rx ->
e1000_alloc_jumbo_rx_buffers
alloc_jumbo finds the buffers already != NULL, since data (shared with
page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage,
or at least not what is expected when in jumbo state.
This results in an unusable adapter (packets don't get through), and a
NULL pointer dereference on the next call to e1000_clean_rx_ring
(other mtu change, link down, shutdown):
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330
[...]
Call Trace:
[<ffffffff81195445>] put_page+0x55/0x60
[<ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200
[<ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60
[<ffffffff815df5e0>] e1000_down+0x1c0/0x1d0
[<ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840
[<ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170
[<ffffffff81647050>] dev_set_mtu+0xa0/0x140
[<ffffffff81664218>] do_setlink+0x218/0xac0
[<ffffffff814459e9>] ? nla_parse+0xb9/0x120
[<ffffffff816652d0>] rtnl_newlink+0x6d0/0x890
[<ffffffff8104f000>] ? kvm_clock_read+0x20/0x40
[<ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260
By setting the allocator to a dummy version, netpoll can't mess up our
rx buffers. The allocator is set back to a sane value in
e1000_configure_rx.
Fixes: edbbb3ca10 ("e1000: implement jumbo receive with partial descriptors")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7c61f0d389 ]
d4b18c3e (pnfs: remove GETDEVICELIST implementation) removed the
GETDEVICELIST operation from the NFS client, but left a "hole" in the
nfs4_procedures array. This caused /proc/self/mountstats to report an
operation named "51" where GETDEVICELIST used to be. This patch adds a
stub to fix mountstats.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: d4b18c3e (pnfs: remove GETDEVICELIST implementation)
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6e4891dc28 ]
In the case we already have a struct file (derived from a stateid), we
still need to do permission-checking; otherwise an unauthorized user
could gain access to a file by sniffing or guessing somebody else's
stateid.
Cc: stable@vger.kernel.org
Fixes: dc97618ddd "nfsd4: separate splice and readv cases"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3cab989afd ]
Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something. And in such cases we end up
with excessive mntput().
Cc: stable@vger.kernel.org # since 2.6.39
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9535c4757b ]
The hardware, according to the specs, is limited to 256 byte transfers,
and current driver has no protections in case users attempt to do larger
transfers. The code will just stomp over status register and mayhem
ensues.
Let's split larger transfers into digestable chunks. Doing this allows
Atmel MXT driver on Pixel 1 function properly (it hasn't since commit
9d8dc3e529 "Input: atmel_mxt_ts -
implement T44 message handling" which tries to consume multiple
touchscreen/touchpad reports in a single transaction).
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b5f1c97f94 ]
Due this typo we don't save/restore the GFX_MAX_REQ_COUNT register across
suspend/resume, so fix this.
This was introduced in
commit ddeea5b0c3
Author: Imre Deak <imre.deak@intel.com>
Date: Mon May 5 15:19:56 2014 +0300
drm/i915: vlv: add runtime PM support
I noticed this only by reading the code. To my knowledge it shouldn't
cause any real problems at the moment, since the power well backing this
register remains on across a runtime s/r. This may change once
system-wide s0ix functionality is enabled in the kernel.
v2:
- resend after a missing git add -u :/
Cc: stable@vger.kernel.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c1c21f4e60 ]
Current -next fails to link an ARM allmodconfig because drivers that use
the core recovery functions can be built as modules but those functions
are not exported:
ERROR: "i2c_generic_gpio_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_generic_scl_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_recover_bus" [drivers/i2c/busses/i2c-davinci.ko] undefined!
Add exports to fix this.
Fixes: 5f9296ba21 (i2c: Add bus recovery infrastructure)
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c6cbfb91b8 ]
master_xfer() method should return number of i2c messages transferred,
but on Rockchip we were usually returning just 1, which caused trouble
with users that actually check number of transferred messages vs.
checking for negative error codes.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a065fe6aa2 ]
This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.
Fixes: d77e65350f ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ca9b590caa ]
The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.
It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.
The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.
An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.
Fixes: b832be1e40 ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 66578b0b2f ]
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).
This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().
There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org> # 8494057ab5 ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic")
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 80ccf4ad06 ]
img_ir_remove() passes a pointer to the ISR function as the 2nd
parameter to irq_free() instead of a pointer to the device data
structure.
This issue causes unloading img-ir module to fail with the below
warning after building and loading img-ir as a module.
WARNING: CPU: 2 PID: 155 at ../kernel/irq/manage.c:1278
__free_irq+0xb4/0x214() Trying to free already-free IRQ 58
Modules linked in: img_ir(-)
CPU: 2 PID: 155 Comm: rmmod Not tainted 3.14.0 #55 ...
Call Trace:
...
[<8048d420>] __free_irq+0xb4/0x214
[<8048d6b4>] free_irq+0xac/0xf4
[<c009b130>] img_ir_remove+0x54/0xd4 [img_ir] [<8073ded0>]
platform_drv_remove+0x30/0x54 ...
Fixes: 160a8f8aec ("[media] rc: img-ir: add base driver")
Signed-off-by: Sifan Naeem <sifan.naeem@imgtec.com>
Cc: <stable@vger.kernel.org> # 3.15+
Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 56cbd0ccc1 ]
mvsas is giving a General protection fault when it encounters an expander
attached ATA device. Analysis of mvs_task_prep_ata() shows that the driver is
assuming all ATA devices are locally attached and obtaining the phy mask by
indexing the local phy table (in the HBA structure) with the phy id. Since
expanders have many more phys than the HBA, this is causing the index into the
HBA phy table to overflow and returning rubbish as the pointer.
mvs_task_prep_ssp() instead does the phy mask using the port properties.
Mirror this in mvs_task_prep_ata() to fix the panic.
Reported-by: Adam Talbot <ajtalbot1@gmail.com>
Tested-by: Adam Talbot <ajtalbot1@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c0403ec0bb ]
This reverts Linux 4.1-rc1 commit 0618764cb2.
The problem which that commit attempts to fix actually lies in the
Freescale CAAM crypto driver not dm-crypt.
dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG. This means the the crypto
driver should internally backlog requests which arrive when the queue is
full and process them later. Until the crypto hw's queue becomes full,
the driver returns -EINPROGRESS. When the crypto hw's queue if full,
the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
expected to backlog the request and process it when the hardware has
queue space. At the point when the driver takes the request from the
backlog and starts processing it, it calls the completion function with
a status of -EINPROGRESS. The completion function is called (for a
second time, in the case of backlogged requests) with a status/err of 0
when a request is done.
Crypto drivers for hardware without hardware queueing use the helpers,
crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
and crypto_get_backlog() helpers to implement this behaviour correctly,
while others implement this behaviour without these helpers (ccp, for
example).
dm-crypt (before the patch that needs reverting) uses this API
correctly. It queues up as many requests as the hw queues will allow
(i.e. as long as it gets back -EINPROGRESS from the request function).
Then, when it sees at least one backlogged request (gets -EBUSY), it
waits till that backlogged request is handled (completion gets called
with -EINPROGRESS), and then continues. The references to
af_alg_wait_for_completion() and af_alg_complete() in that commit's
commit message are irrelevant because those functions only handle one
request at a time, unlink dm-crypt.
The problem is that the Freescale CAAM driver, which that commit
describes as having being tested with, fails to implement the
backlogging behaviour correctly. In cam_jr_enqueue(), if the hardware
queue is full, it simply returns -EBUSY without backlogging the request.
What the observed deadlock was is not described in the commit message
but it is obviously the wait_for_completion() in crypto_convert() where
dm-crypto would wait for the completion being called with -EINPROGRESS
in the case of backlogged requests. This completion will never be
completed due to the bug in the CAAM driver.
Commit 0618764cb2 incorrectly made dm-crypt wait for every request,
even when the driver/hardware queues are not full, which means that
dm-crypt will never see -EBUSY. This means that that commit will cause
a performance regression on all crypto drivers which implement the API
correctly.
Revert it. Correct backlog handling should be implemented in the CAAM
driver instead.
Cc'ing stable purely because commit 0618764cb2 did. If for some reason
a stable@ kernel did pick up commit 0618764cb2 it should get reverted.
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Horia Geanta <horia.geanta@freescale.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0b21503dbb ]
Currently, a RCG's M/N counter (used for fraction division) is
set to either 'bypass' (counter disabled) or 'dual edge' (counter
enabled) based on whether the corresponding rcg struct has a mnd
field specified and a non-zero N.
In the case where M and N are the same value, the M/N counter is
still enabled by code even though no division takes place.
Leaving the RCG in such a state can result in improper behavior.
This was observed with the DSI pixel clock RCG when M and N were
both set to 1.
Add an additional check (M != N) to enable the M/N counter only
when it's needed for fraction division.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Fixes: bcd61c0f53 (clk: qcom: Add support for root clock
generators (RCGs))
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e43e25917 ]
The number of resets controls is 32 times the number of peripheral
register banks rather than 32 times the number of clocks. This reduces
(drastically) the number of reset controls registered from 10080 (315
clocks * 32) to 224 (6 peripheral register banks * 32).
This also fixes a potential crash because trying to use any of the
excess reset controls (224-10079) would have caused accesses beyond
the array bounds of the peripheral register banks definition array.
Cc: Peter De Schrijver <pdeschrijver@nvidia.com>
Cc: Prashant Gaikwad <pgaikwad@nvidia.com>
Fixes: 6d5b988e7d ("clk: tegra: implement a reset driver")
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3a9e9cb65b ]
Commit 42773b28e7 ("clk: samsung: exynos4: Enable ARMCLK
down feature") enabled ARMCLK down feature on all Exynos4
SoCs. Unfortunately on Exynos4210 SoC ARMCLK down feature
causes a lockup when ondemand cpufreq governor is used.
Fix it by limiting ARMCLK down feature to Exynos4x12 SoCs.
This patch was tested on:
- Exynos4210 SoC based Trats board
- Exynos4210 SoC based Origen board
- Exynos4412 SoC based Trats2 board
- Exynos4412 SoC based Odroid-U3 board
Cc: Daniel Drake <drake@endlessm.com>
Cc: Tomasz Figa <t.figa@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Fixes: 42773b28e7 ("clk: samsung: exynos4: Enable ARMCLK down feature")
Cc: <stable@vger.kernel.org> # v3.17+
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 61819549f5 ]
Level IRQ handlers and edge IRQ handler are managed by tow different
sets of registers. But currently the driver uses the same mask for the
both registers. It lead to issues with the following scenario:
First, an IRQ is requested on a GPIO to be triggered on front. After,
this an other IRQ is requested for a GPIO of the same bank but
triggered on level. Then the first one will be also setup to be
triggered on level. It leads to an interrupt storm.
The different kind of handler are already associated with two
different irq chip type. With this patch the driver uses a private
mask for each one which solves this issue.
It has been tested on an Armada XP based board and on an Armada 375
board. For the both boards, with this patch is applied, there is no
such interrupt storm when running the previous scenario.
This bug was already fixed but in a different way in the legacy
version of this driver by Evgeniy Dushistov:
9ece8839b1 "ARM: orion: Fix for certain
sequence of request_irq can cause irq storm". The fact the new version
of the gpio drive could be affected had been discussed there:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/344670/focus=364012
Reported-by: Evgeniy A. Dushistov <dushistov@mail.ru>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.7 +
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 24e94454c8 ]
- don't lock lp->lock in the iss_net_timer for the call of iss_net_poll,
it will lock it itself;
- invert order of lp->lock and opened_lock acquisition in the
iss_net_open to make it consistent with iss_net_poll;
- replace spin_lock with spin_lock_bh when acquiring locks used in
iss_net_timer from non-atomic context;
- replace spin_lock_irqsave with spin_lock_bh in the iss_net_start_xmit
as the driver doesn't use lp->lock in the hard IRQ context;
- replace __SPIN_LOCK_UNLOCKED(lp.lock) with spin_lock_init, otherwise
lockdep is unhappy about using non-static key.
Cc: <stable@vger.kernel.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4949009eb8 ]
LCD driver is always built for the XTFPGA platform, but its base address
is not configurable, and is wrong for ML605/KC705. Its initialization
locks up KC705 board hardware.
Make the whole driver optional, and its base address and bus width
configurable. Implement 4-bit bus access method.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4c533c801d ]
acpi_scan_is_offline() may be called under the physical_node_lock
lock of the given device object's parent, so prevent lockdep from
complaining about that by annotating that instance with
SINGLE_DEPTH_NESTING.
Fixes: caa73ea158 (ACPI / hotplug / driver core: Handle containers in a special way)
Reported-and-tested-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 77ddc2fe08 ]
ACPICA commit c70434d4da13e65b6163c79a5aa16b40193631c7
ACPI_MTX_TABLES is acquired and released by the callers of
acpi_tb_install_standard_table() so releasing it in the function itself is
causing the following error in Linux kernel if the table is reloaded:
ACPI Error: Mutex [0x2] is not acquired, cannot release (20141107/utmutex-321)
Call Trace:
[<ffffffff81b0bd48>] dump_stack+0x4f/0x7b
[<ffffffff81546bf5>] acpi_ut_release_mutex+0x47/0x67
[<ffffffff81544357>] acpi_load_table+0x73/0xcb
Link: https://github.com/acpica/acpica/commit/c70434d4
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2b8760100e ]
ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451
It is reported that on a physically 64-bit addressed machine, 32-bit kernel
can trigger crashes in accessing the memory regions that are beyond the
32-bit boundary. The region field's start address should still be 32-bit
compliant, but after a calculation (adding some offsets), it may exceed the
32-bit boundary. This case is rare and buggy, but there are real BIOSes
leaked with such issues (see References below).
This patch fixes this gap by always defining IO addresses as 64-bit, and
allows OSPMs to optimize it for a real 32-bit machine to reduce the size of
the internal objects.
Internal acpi_physical_address usages in the structures that can be fixed
by this change include:
1. struct acpi_object_region:
acpi_physical_address address;
2. struct acpi_address_range:
acpi_physical_address start_address;
acpi_physical_address end_address;
3. struct acpi_mem_space_context;
acpi_physical_address address;
4. struct acpi_table_desc
acpi_physical_address address;
See known issues 1 for other usages.
Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer
from same problem, so this patch changes it accordingly.
For iasl, it will enforce acpi_physical_address as 32-bit to generate
32-bit OSPM compatible tables on 32-bit platforms, we need to define
ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h.
Known issues:
1. Cleanup of mapped virtual address
In struct acpi_mem_space_context, acpi_physical_address is used as a virtual
address:
acpi_physical_address mapped_physical_address;
It is better to introduce acpi_virtual_address or use acpi_size instead.
This patch doesn't make such a change. Because this should be done along
with a change to acpi_os_map_memory()/acpi_os_unmap_memory().
There should be no functional problem to leave this unchanged except
that only this structure is enlarged unexpectedly.
Link: https://github.com/acpica/acpica/commit/aacf863c
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501
Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Reported-and-tested-by: Sial Nije <sialnije@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a57069e33f ]
As davinci card gets registered using 'devm_' api
there is no need to unregister the card in 'remove'
function.
Hence drop the 'remove' function.
Fixes: ee2f615d6e (ASoC: davinci-evm: Add device tree binding)
Signed-off-by: Manish Badarkhe <manishvb@ti.com>
Signed-off-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8787041d9b ]
The WM8741 DAC supports the following typical audio sampling rates:
44.1kHz, 88.2kHz, 176.4kHz (eg: with a master clock of 22.5792MHz)
32kHz, 48kHz, 96kHz, 192kHz (eg: with a master clock of 24.576MHz)
For the rates lists, we should use 82000 instead of 88235, 176400
instead of 1764000 and 192000 instead of 19200 (seems to be a typo).
Signed-off-by: Sergej Sawazki <ce3a@gmx.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7261b956b2 ]
The patch to add it_page_shift incorrectly changed the increment of
uaddr to use it_page_shift, rather then (1 << it_page_shift).
This broke booting on at least some Cell blades, as the iommu was
basically non-functional.
Fixes: 3a553170d3 ("powerpc/iommu: Add it_page_shift field to determine iommu page size")
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b0dd00addc ]
The conversion from __get_cpu_var() to this_cpu_ptr() in iic_setup_cpu()
is wrong. It causes an oops at boot.
We need the per-cpu address of struct cpu_iic, not cpu_iic.regs->prio.
Sparse noticed this, because we pass a non-iomem pointer to out_be64(),
but we obviously don't check the sparse results often enough.
Fixes: 69111bac42 ("powerpc: Replace __get_cpu_var uses")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f7e9e35836 ]
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9 "Rewrite sysfs processor cache info code".
This caused lscpu to error out on at least e500v2 devices, eg:
error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory
Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.
Fixes: 93197a36a9 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 027fa02f84 ]
If M64 has been supported, the prefetchable 64-bits memory resources
shouldn't be mapped to the corresponding PE# via M32DT. Unfortunately,
we're doing that in pnv_ioda_setup_pe_seg() wrongly. The issue was
introduced by commit 262af55 ("powerpc/powernv: Enable M64 aperatus
for PHB3"). The patch fixes the issue by simply skipping M64 resources
when updating to M32DT.
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 364189f0ad ]
This hang was a result of a missing command put when
a DIF error occurred during a rdma read (and we sent
an CHECK_CONDITION error without passing it to the
backend).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c836777830 ]
In fd_do_prot_rw(), it allocates prot_buf which is used to copy from
se_cmd->t_prot_sg by sbc_dif_copy_prot(). The SG table for prot_buf
is also initialized by allocating 'se_cmd->t_prot_nents' entries of
scatterlist and setting the data length of each entry to PAGE_SIZE
at most.
However if se_cmd->t_prot_sg contains a clustered entry (i.e.
sg->length > PAGE_SIZE), the SG table for prot_buf can't be
initialized correctly and sbc_dif_copy_prot() can't copy to prot_buf.
(This actually happened with TCM loopback fabric module)
As prot_buf is allocated by kzalloc() and it's physically contiguous,
we only need a single scatterlist entry.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 64d240b721 ]
When UNMAP command is issued with DIF protection support enabled,
the protection info for the unmapped region is remain unchanged.
So READ command for the region causes data integrity failure.
This fixes it by invalidating protection info for the unmapped region
by filling with 0xff pattern. This change also adds helper function
fd_do_prot_fill() in order to reduce code duplication with existing
fd_format_prot().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 38da0f49e8 ]
When CONFIG_DEBUG_SG=y and DIF protection support enabled, kernel
BUG()s are triggered due to the following two issues:
1) prot_sg is not initialized by sg_init_table().
When CONFIG_DEBUG_SG=y, scatterlist helpers check sg entry has a
correct magic value.
2) vmalloc'ed buffer is passed to sg_set_buf().
sg_set_buf() uses virt_to_page() to convert virtual address to struct
page, but it doesn't work with vmalloc address. vmalloc_to_page()
should be used instead. As prot_buf isn't usually too large, so
fix it by allocating prot_buf by kmalloc instead of vmalloc.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c8e639852a ]
This patch fixes a bug for COMPARE_AND_WRITE handling with
fabrics using SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC.
It adds the missing allocation for cmd->t_bidi_data_sg within
transport_generic_new_cmd() that is used by COMPARE_AND_WRITE
for the initial READ payload, even if the fabric is already
providing a pre-allocated buffer for cmd->t_data_sg.
Also, fix zero-length COMPARE_AND_WRITE handling within the
compare_and_write_callback() and target_complete_ok_work()
to queue the response, skipping the initial READ.
This fixes COMPARE_AND_WRITE emulation with loopback, vhost,
and xen-backend fabric drivers using SG_TO_MEM_NOALLOC.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 299d0c5b27 ]
The comparison from the previous line seems to have been erroneously
(partially) copied-and-pasted onto the next. The second line should be
checking req.bytes, not req.lnum.
Coverity CID #139400
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[rw: Fixed comparison]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f16db8071c ]
In some of the 'out_not_moved' error paths, lnum may be used
uninitialized. Don't ignore the warning; let's fix it.
This uninitialized variable doesn't have much visible effect in the end,
since we just schedule the PEB for erasure, and its LEB number doesn't
really matter (it just gets printed in debug messages). But let's get it
straight anyway.
Coverity CID #113449
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8eef7d70f7 ]
We are completely discarding the earlier value of 'bitflips', which
could reflect a bitflip found in ubi_io_read_vid_hdr(). Let's use the
bitwise OR of header and data 'bitflip' statuses instead.
Coverity CID #1226856
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f82263c698 ]
Since commit ee0778a301
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using
[...]
BUILD_OUTPUT := $(PWD)
[...]
which obviously causes trouble when building "turbostat" with
make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat
because GNU make does not update nor guarantee that $PWD is set.
This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in your root of your
kernel source tree for more details).
Link: https://bugs.gentoo.org/show_bug.cgi?id=533918
Fixes: ee0778a301 ("tools/power: turbostat: make Makefile a bit more capable")
Signed-off-by: Thomas D. <whissi@whissi.de>
Cc: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8318e667f1 ]
Invoking mount propagation from __detach_mounts is inefficient and
wrong.
It is inefficient because __detach_mounts already walks the list of
mounts that where something needs to be done, and mount propagation
walks some subset of those mounts again.
It is actively wrong because if the dentry that is passed to
__detach_mounts is not part of the path to a mount that mount should
not be affected.
change_mnt_propagation(p,MS_PRIVATE) modifies the mount propagation
tree of a master mount so it's slaves are connected to another master
if possible. Which means even removing a mount from the middle of a
mount tree with __detach_mounts will not deprive any mount propagated
mount events.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e819f15210 ]
- Remove the unneeded declaration from pnode.h
- Mark umount_tree static as it has no callers outside of namespace.c
- Define an enumeration of umount_tree's flags.
- Pass umount_tree's flags in by name
This removes the magic numbers 0, 1 and 2 making the code a little
clearer and makes it possible for there to be lazy unmounts that don't
propagate. Which is what __detach_mounts actually wants for example.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e12fb97222 ]
Previously commit 14ece1028b added a
support for for syncing parent directory of newly created inodes to
make sure that the inode is not lost after a power failure in
no-journal mode.
However this does not work in majority of cases, namely:
- if the directory has inline data
- if the directory is already indexed
- if the directory already has at least one block and:
- the new entry fits into it
- or we've successfully converted it to indexed
So in those cases we might lose the inode entirely even after fsync in
the no-journal mode. This also includes ext2 default mode obviously.
I've noticed this while running xfstest generic/321 and even though the
test should fail (we need to run fsck after a crash in no-journal mode)
I could not find a newly created entries even when if it was fsynced
before.
Fix this by adjusting the ext4_add_entry() successful exit paths to set
the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
parent directory as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b72c186999 ]
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <pthread.h>
#include <assert.h>
int pid;
void *waiter(void *arg)
{
int stat;
for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;
assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}
int main(void)
{
pthread_t thread;
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}
assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);
return 0;
}
Note for stable: the bug is very old, but without 9899d11f65 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a87938b2e2 ]
With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
address allocation strategy, load_elf_binary() will attempt to map a PIE
binary into an address range immediately below mm->mmap_base.
Unfortunately, load_elf_ binary() does not take account of the need to
allocate sufficient space for the entire binary which means that, while
the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are
that is supposed to be the "gap" between the stack and the binary.
Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
means that binaries with large data segments > 128MB can end up mapping
part of their data segment over their stack resulting in corruption of the
stack (and the data segment once the binary starts to run).
Any PIE binary with a data segment > 128MB is vulnerable to this although
address randomization means that the actual gap between the stack and the
end of the binary is normally greater than 128MB. The larger the data
segment of the binary the higher the probability of failure.
Fix this by calculating the total size of the binary in the same way as
load_elf_interp().
Signed-off-by: Michael Davidson <md@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bd884149ac ]
On ASUS TP500LN and X750JN, the touchpad absolute mode is reset each
time set_rate is done.
In order to fix this, we will verify the firmware version, and if it
matches the one in those laptops, the set_rate function is overloaded
with a function elantech_set_rate_restore_reg_07 that performs the
set_rate with the original function, followed by a restore of reg_07
(the register that sets the absolute mode on elantech v4 hardware).
Also the ASUS TP500LN and X750JN firmware version, capabilities, and
button constellation is added to elantech.c
Cc: stable@vger.kernel.org
Reported-and-tested-by: George Moutsopoulos <gmoutso@yahoo.co.uk>
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d1b6e2932 ]
The ALC256 does not have a mixer nid at 0x0b, and there's no
loopback path (the output pins are directly connected to the DACs).
This commit fixes an "num_steps = 0 for NID=0xb (ctl = Beep Playback Volume)"
error (and as a result, problems with amixer/alsamixer).
If there's pcbeep functionality, it certainly isn't controlled by setting an
amp on 0x0b, so disable beep functionality (at least for now).
Cc: stable@vger.kernel.org
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1446517
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f2aa111041 ]
The Lenovo Thinkpad T450 requires the ALC292_FIXUP_TPT440_DOCK as well in
order to get working sound output on the docking stations headphone jack.
Patch tested on a Thinkpad T450 (20BVCTO1WW) using kernel 4.0-rc7 in
conjunction with a ThinkPad Ultradock.
Signed-off-by: Jo-Philipp Wich <jow@openwrt.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 91bf0c2dcb ]
The functions snd_emu10k1_proc_spdif_read and snd_emu1010_fpga_read
acquire the emu_lock before accessing the FPGA. The function used
to access the FPGA (snd_emu1010_fpga_read) also tries to take
the emu_lock which causes a deadlock.
Remove the outer locking in the proc-functions (guarding only the
already safe fpga read) to prevent this deadlock.
[removed superfluous flags variables too -- tiwai]
Signed-off-by: Michael Gernoth <michael@gernoth.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4591243102 ]
The at91sam9n12 and at91sam9x5 usb clocks do not propagate rate
modification requests to their parents.
This causes a bug when the PLLB is left uninitialized by the bootloader
(PLL multiplier set to 0, or in other words, PLL rate = 0 Hz).
Implement the determinate_rate method and propagate the change rate
request to the parent clk.
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Bo Shen <voice.shen@atmel.com>
Tested-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7a606ac297 ]
While this driver was already using a 50ms resume
timeout, let's make sure everybody uses the same
macro so it's easy to fix later should anything
go wrong.
It also gives a more "stable" expectation to Linux
users.
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 62f0342de1 ]
Every USB Host controller should use this new
macro to define for how long resume signalling
should be driven on the bus.
Currently, almost every single USB controller
is using a 20ms timeout for resume signalling.
That's problematic for two reasons:
a) sometimes that 20ms timer expires a little
before 20ms, which makes us fail certification
b) some (many) devices actually need more than
20ms resume signalling.
Sure, in case of (b) we can state that the device
is against the USB spec, but the fact is that
we have no control over which device the certification
lab will use. We also have no control over which host
they will use. Most likely they'll be using a Windows
PC which, again, we have no control over how that
USB stack is written and how long resume signalling
they are using.
At the end of the day, we must make sure Linux passes
electrical compliance when working as Host or as Device
and currently we don't pass compliance as host because
we're driving resume signallig for exactly 20ms and
that confuses certification test setup resulting in
Certification failure.
Cc: <stable@vger.kernel.org> # v3.10+
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 869aee0f31 ]
The res parameter passed to devm_usb_phy_match() is the location where the
pointer to the usb_phy is stored, hence it needs to be dereferenced before
comparing to the match data in order to find the correct match.
Fixes: 410219dcd2 ("usb: otg: utils: devres: Add API's to associate a device with the phy")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Cc: <stable@vger.kernel.org> # v3.6+
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e3c93e1a3f ]
As per Mentor Graphics' documentation, we should
always handle TX endpoints before RX endpoints.
This patch fixes that error while also updating
some hard-to-read comments which were scattered
around musb_interrupt().
This patch should be backported as far back as
possible since this error has been in the driver
since it's conception.
Cc: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4e330ae4ab ]
There are two PMICs on Cragganmore, currently one dynamically assign
its IRQ base and the other uses a fixed base. It is possible for the
statically assigned PMIC to fail if its IRQ is taken by the dynamically
assigned one. Fix this by statically assigning both the IRQ bases.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 548ae94c1c ]
On Armada 38x SoCs, under heavy I/O load, the system hangs when CPU
Idle is enabled. Waiting for a solution to this issue, this patch
disables the CPU Idle support for this SoC.
As CPU Hot plug support also uses some of the CPU Idle functions it is
also affected by the same issue. This patch disables it also for the
Armada 38x SoCs.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.17 +
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8defb3367f ]
Usually ELF_ET_DYN_BASE is 2/3 of TASK_SIZE. With 3G/1G user/kernel
split this is not so, because 2*TASK_SIZE overflows 32 bits,
so the actual value of ELF_ET_DYN_BASE is:
(2 * TASK_SIZE / 3) = 0x2a000000
When ASLR is disabled PIE binaries will load at ELF_ET_DYN_BASE address.
On 32bit platforms AddressSanitzer uses addresses [0x20000000 - 0x40000000]
for shadow memory [1]. So ASan doesn't work for PIE binaries when ASLR disabled
as it fails to map shadow memory.
Also after Kees's 'split ET_DYN ASLR from mmap ASLR' patchset PIE binaries
has a high chance of loading somewhere in between [0x2a000000 - 0x40000000]
even if ASLR enabled. This makes ASan with PIE absolutely incompatible.
Fix overflow by dividing TASK_SIZE prior to multiplying.
After this patch ELF_ET_DYN_BASE equals to (for CONFIG_VMSPLIT_3G=y):
(TASK_SIZE / 3 * 2) = 0x7f555554
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm#Mapping
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Maria Guseva <m.guseva@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 767bf7e7a1 ]
Normally, when a CPU wants to clear a cache line to zero in the external
L2 cache, it would generate bus cycles to write each word as it would do
with any other data access.
However, a Cortex A9 connected to a L2C-310 has a specific feature where
the CPU can detect this operation, and signal that it wants to zero an
entire cache line. This feature, known as Full Line of Zeros (FLZ),
involves a non-standard AXI signalling mechanism which only the L2C-310
can properly interpret.
There are separate enable bits in both the L2C-310 and the Cortex A9 -
the L2C-310 needs to be enabled and have the FLZ enable bit set in the
auxiliary control register before the Cortex A9 has this feature
enabled.
Unfortunately, the suspend code was not respecting this - it's not
obvious from the code:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image()
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
At this point, we end up with the L2C disabled, but the Cortex A9 with
FLZ enabled - which means any memset() or zeroing of a full cache line
will fail to take effect.
A similar issue exists in the resume path, but it's slightly more
complex:
swsusp_arch_suspend()
cpu_suspend() /* saves the Cortex A9 auxiliary control register */
arch_save_image() /* image with A9 auxcr saved */
...
swsusp_arch_resume()
call_with_stack()
arch_restore_image() /* restores image with A9 auxcr saved above */
soft_restart() /* turns off FLZ in Cortex A9, and disables L2C */
cpu_resume() /* restores the Cortex A9 registers, inc auxcr */
Again, here we end up with the L2C disabled, but Cortex A9 FLZ enabled.
There's no need to turn off the L2C in either of these two paths; there
are benefits from not doing so - for example, the page copies will be
faster with the L2C enabled.
Hence, fix this by providing a variant of soft_restart() which can be
used without turning the L2 cache controller off, and use it in both
of these paths to keep the L2C enabled across the respective resume
transitions.
Fixes: 8ef418c717 ("ARM: l2c: trial at enabling some Cortex-A9 optimisations")
Reported-by: Sean Cross <xobs@kosagi.com>
Tested-by: Sean Cross <xobs@kosagi.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c1b8940b42 ]
We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
occurs during an rsync into a filesystem that is exported via NFS.
1.) fs/attr.c:notify_change() modifies the caller's version of attr.
2.) 6de0ec00ba ("VFS: make notify_change pass ATTR_KILL_S*ID to
setattr operations") introduced a BUG() restriction such that "no
function will ever call notify_change() with both ATTR_MODE and
ATTR_KILL_S*ID set". Under some circumstances though, it will have
assisted in setting the caller's version of attr to this very
combination.
3.) 27ac0ffeac ("locks: break delegations on any attribute
modification") introduced code to handle breaking
delegations. This can result in notify_change() being re-called. attr
_must_ be explicitly reset to avoid triggering the BUG() established
in #2.
4.) The path that that triggers this is via fs/open.c:chmod_common().
The combination of attr flags set here and in the first call to
notify_change() along with a later failed break_deleg_wait()
results in notify_change() being called again via retry_deleg
without resetting attr.
Solution is to move retry_deleg in chmod_common() a bit further up to
ensure attr is completely reset.
There are other places where this seemingly could occur, such as
fs/utimes.c:utimes_common(), but the attr flags are not initially
set in such a way to trigger this.
Fixes: 27ac0ffeac ("locks: break delegations on any attribute modification")
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a2c1d53185 ]
The return values of create_singlethread_workqueue() and
power_supply_register() calls were not checked and even on error probe()
function returned 0.
1. If allocation of workqueue failed (returning NULL) then further
accesses could lead to NULL pointer dereference. The
queue_delayed_work() expects workqueue to be non-NULL.
2. If registration of power supply failed then during unbind the driver
tried to unregister power supply which was not actually registered.
This could lead to memory corruption because
power_supply_unregister() unconditionally cleans up given power
supply.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 00a588f9d2 ("power: add driver for battery reading on iPaq h3xxx")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a7117f81e8 ]
Driver forgot to unregister charger power supply if registering of
battery supply failed in probe(). In such case the memory associated
with power supply leaked.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 98a2766493 ("power_supply: Add new lp8788 charger driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 68c3ed6fa7 ]
The return value of power_supply_register() call was not checked and
even on error probe() function returned 0. If registering failed then
during unbind the driver tried to unregister power supply which was not
actually registered.
This could lead to memory corruption because power_supply_unregister()
unconditionally cleans up given power supply.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: da0a00ebc2 ("power: Add twl4030_madc battery driver.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 80a9b64e2c ]
It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.
101.356868: preempt_count_add <-ring_buffer_lock_reserve
101.356870: preempt_count_sub <-ring_buffer_lock_reserve
The ring_buffer_lock_reserve has recursion protection that requires
accessing a per cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is suppose to prevent
recursion like this.
The generic version of this_cpu_read() and write() are:
#define this_cpu_generic_read(pcp) \
({ typeof(pcp) ret__; \
preempt_disable(); \
ret__ = *this_cpu_ptr(&(pcp)); \
preempt_enable(); \
ret__; \
})
#define this_cpu_generic_to_op(pcp, val, op) \
do { \
unsigned long flags; \
raw_local_irq_save(flags); \
*__this_cpu_ptr(&(pcp)) op val; \
raw_local_irq_restore(flags); \
} while (0)
Which is unacceptable for locations that know they are within preempt
disabled or interrupt disabled locations.
Paul McKenney stated that __this_cpu_() versions produce much better code on
other architectures than this_cpu_() does, if we know that the call is done in
a preempt disabled location.
I also changed the recursive_unlock() to use two local variables instead
of accessing the per_cpu variable twice.
Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1915a718b1 ]
The return value of power_supply_register() call was not checked and
even on error probe() function returned 0. If registering failed then
during unbind the driver tried to unregister power supply which was not
actually registered.
This could lead to memory corruption because power_supply_unregister()
unconditionally cleans up given power supply.
Fix this by checking return status of power_supply_register() call. In
case of failure, clean up sysfs entries and fail the probe.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 9be0fcb5ed ("compal-laptop: add JHL90, battery & hwmon interface")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ad774702f1 ]
The commit c2be45f09b ("compal-laptop: Use
devm_hwmon_device_register_with_groups") wanted to change the
registering of hwmon device to resource-managed version. It mostly did
it except the main thing - it forgot to use devm-like function so the
hwmon device leaked after device removal or probe failure.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: c2be45f09b ("compal-laptop: Use devm_hwmon_device_register_with_groups")
Cc: <stable@vger.kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f20fbaad76 ]
`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length. It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow. For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long. If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address. Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.
Thanks to Dan Carpenter for reporting the potential integer overflow.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f511ab09df ]
They are used to decide if the controller can do DMA on a buffer
of a specific length and thus are needed before any transfer is attempted.
This fixes a memory leak where the SPI core uses the drivers can_dma()
callback to determine if a buffer needs to be mapped. As the watermark
levels aren't correct at that point the driver falsely claims to be able to
DMA the buffer when it fact it isn't.
After the transfer has been done the core uses the same callback to
determine if it needs to unmap the buffers. As the driver now correctly
claims to not being able to DMA the buffer the core doesn't attempt to
unmap the buffer which leaves the SGT leaking.
Fixes: f62caccd12 (spi: spi-imx: add DMA support)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9e71c589e4 ]
The reset control for the sunxi mmc controller is optional. Some
newer platforms (sun6i, sun8i, sun9i) have it, while older ones
(sun4i, sun5i, sun7i) don't.
Use the properly stubbed _optional version so the driver does not
fail to compile when RESET_CONTROLLER=n.
This patch also adds a check for deferred probing on the reset
control.
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Cc: <stable@vger.kernel.org> # 3.16+
Acked-by: David Lanzendörfer <david.lanzendoerfer@o2s.ch>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 47d68979cc ]
Since commit 20d0189b10
in v3.14-rc1 RAID0 has performed incorrect calculations
when the chunksize is not a power of 2.
This happens because "sector_div()" modifies its first argument, but
this wasn't taken into account in the patch.
So restore that first arg before re-using the variable.
Reported-by: Joe Landman <joe.landman@gmail.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Fixes: 20d0189b10
Cc: stable@vger.kernel.org (3.14 and later).
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e43c9c75f ]
The android_fence_release() function checks for active sync points
by calling list_empty() on the list head embedded on the sync
point. However, it is only valid to use list_empty() on nodes that
have been initialized with INIT_LIST_HEAD() or list_del_init().
Because the list entry has likely been removed from the active list
by sync_timeline_signal(), there is a good chance that this
WARN_ON_ONCE() will be hit due to dangling pointers pointing at
freed memory (even though the sync drivers did nothing wrong)
and memory corruption will ensue as the list entry is removed for
a second time, corrupting the active list.
This problem can be reproduced quite easily with CONFIG_DEBUG_LIST=y
and fences with more than one sync point.
Signed-off-by: Alistair Strachan <alistair.strachan@imgtec.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Colin Cross <ccross@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2c20d92dad ]
the lcd type as defined in the Kconfig is not matching in the code.
as a result the rs, rw and en pins were getting interchanged.
Kconfig defines the value of PANEL_LCD to be 1 if we select custom
configuration but in the code LCD_TYPE_CUSTOM is defined as 5.
my hardware is LCD_TYPE_CUSTOM, but the pins were assigned to it
as pins of LCD_TYPE_OLD, and it was not working.
Now values are corrected with referenece to the values defined in
Kconfig and it is working.
checked on JHD204A lcd with LCD_TYPE_CUSTOM configuration.
Cc: <stable@vger.kernel.org> # 2.6.32+
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f7f8aea4b9 ]
memsize denotes the amount of RAM we can access from kseg{0,1} and
that should be up to 256M. In case the bootloader reports a value
higher than that (perhaps reporting all the available RAM) it's best
if we fix it ourselves and just warn the user about that. This is
usually a problem with the bootloader and/or its environment.
[ralf@linux-mips.org: Remove useless parens as suggested bei Sergei.
Reformat long pr_warn statement to fit into 80 column limit.]
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: <stable@vger.kernel.org> # v3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9362/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f8483988ca ]
The lose_fpu() function only disables the FPU in CP0_Status.CU1 if the
FPU is in use and MSA isn't enabled.
This isn't necessarily a problem because KSTK_STATUS(current), the
version of CP0_Status stored on the kernel stack on entry from user
mode, does always get updated and gets restored when returning to user
mode, but I don't think it was intended, and it is inconsistent with the
case of only the FPU being in use. Sometimes leaving the FPU enabled may
also mask kernel bugs where FPU operations are executed when the FPU
might not be enabled.
So lets disable the FPU in the MSA case too.
Fixes: 33c771ba5c ("MIPS: save/disable MSA in lose_fpu")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9323/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 98119ad533 ]
Guest user mode can generate a guest MSA Disabled exception on an MSA
capable core by simply trying to execute an MSA instruction. Since this
exception is unknown to KVM it will be passed on to the guest kernel.
However guest Linux kernels prior to v3.15 do not set up an exception
handler for the MSA Disabled exception as they don't support any MSA
capable cores. This results in a guest OS panic.
Since an older processor ID may be being emulated, and MSA support is
not advertised to the guest, the correct behaviour is to generate a
Reserved Instruction exception in the guest kernel so it can send the
guest process an illegal instruction signal (SIGILL), as would happen
with a non-MSA-capable core.
Fix this as minimally as reasonably possible by preventing
kvm_mips_check_privilege() from relaying MSA Disabled exceptions from
guest user mode to the guest kernel, and handling the MSA Disabled
exception by emulating a Reserved Instruction exception in the guest,
via a new handle_msa_disabled() KVM callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fd1d0ddf2a ]
When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when now a malicious or buggy userland injects a SPI in that
range, we spill over on our VGIC bitmaps and bytemaps memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
-----------------
....
DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
DEBUG: IRQ #114 still in the game, writing to bytemap now...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ffffffc07652e000
[00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
Internal error: Oops: 96000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
Hardware name: FVP Base (DT)
task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
PC is at kvm_vgic_inject_irq+0x234/0x310
LR is at kvm_vgic_inject_irq+0x30c/0x310
pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
.....
So this patch fixes this by checking the SPI number against the
actual limit. Also we remove the former legacy hard limit of
127 in the ioctl code.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ca3f087472 ]
kvm_write_guest_cached() does not mark all written pages as dirty and
code comments in kvm_gfn_to_hva_cache_init() talk about NULL memslot
with cross page accesses. Fix all the easy way.
The check is '<= 1' to have the same result for 'len = 0' cache anywhere
in the page. (nr_pages_needed is 0 on page boundary.)
Fixes: 8f964525a1 ("KVM: Allow cross page reads and writes from cached translations.")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Message-Id: <20150408121648.GA3519@potion.brq.redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d744194956 ]
Sebastian reported a crash caused by a jump label mismatch after resume.
This happens because we do not save the kernel text section during suspend
and therefore also do not restore it during resume, but use the kernel image
that restores the old system.
This means that after a suspend/resume cycle we lost all modifications done
to the kernel text section.
The reason for this is the pfn_is_nosave() function, which incorrectly
returns that read-only pages don't need to be saved. This is incorrect since
we mark the kernel text section read-only.
We still need to make sure to not save and restore pages contained within
NSS and DCSS segment.
To fix this add an extra case for the kernel text section and only save
those pages if they are not contained within an NSS segment.
Fixes the following crash (and the above bugs as well):
Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
Found: c0 04 00 00 00 00
Expected: c0 f4 00 00 00 11
New: c0 04 00 00 00 00
Kernel panic - not syncing: Corrupted kernel text
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4
Call Trace:
[<0000000000113972>] show_stack+0x72/0xf0
[<000000000081f15e>] dump_stack+0x6e/0x90
[<000000000081c4e8>] panic+0x108/0x2b0
[<000000000081be64>] jump_label_bug.isra.2+0x104/0x108
[<0000000000112176>] __jump_label_transform+0x9e/0xd0
[<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
[<00000000001d1136>] multi_cpu_stop+0x12e/0x170
[<00000000001d1472>] cpu_stopper_thread+0xb2/0x168
[<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
[<0000000000158baa>] kthread+0x10a/0x110
[<0000000000824a86>] kernel_thread_starter+0x6/0xc
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 69a8d45626 ]
The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 15462e37ca ]
The reinjection of an I/O interrupt can fail if the list is at the limit
and between the dequeue and the reinjection, another I/O interrupt is
injected (e.g. if user space floods kvm with I/O interrupts).
This patch avoids this memory leak and returns -EFAULT in this special
case. This error is not recoverable, so let's fail hard. This can later
be avoided by not dequeuing the interrupt but working directly on the
locked list.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 261520dcfc ]
If the I/O interrupt could not be written to the guest provided
area (e.g. access exception), a program exception was injected into the
guest but "inti" wasn't freed, therefore resulting in a memory leak.
In addition, the I/O interrupt wasn't reinjected. Therefore the dequeued
interrupt is lost.
This patch fixes the problem while cleaning up the function and making the
cc and rc logic easier to handle.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eb132ccbde ]
Function-specific setup requests should be handled in such a way, that
apart from filling in the data buffer, the requests are also actually
enqueued: if function-specific setup is called from composte_setup(),
the "usb_ep_queue()" block of code in composite_setup() is skipped.
The printer function lacks this part and it results in e.g. get device id
requests failing: the host expects some response, the device prepares it
but does not equeue it for sending to the host, so the host finally asserts
timeout.
This patch adds enqueueing the prepared responses.
Cc: <stable@vger.kernel.org> # v3.4+
Fixes: 2e87edf492: "usb: gadget: make g_printer use composite"
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
commit 113e828386 upstream.
If we pass a length of 0 to the extent_same ioctl, we end up locking an
extent range with a start offset greater then its end offset (if the
destination file's offset is greater than zero). This results in a warning
from extent_io.c:insert_state through the following call chain:
btrfs_extent_same()
btrfs_double_lock()
lock_extent_range()
lock_extent(inode->io_tree, offset, offset + len - 1)
lock_extent_bits()
__set_extent_bit()
insert_state()
--> WARN_ON(end < start)
This leads to an infinite loop when evicting the inode. This is the same
problem that my previous patch titled
"Btrfs: fix inode eviction infinite loop after cloning into it" addressed
but for the extent_same ioctl instead of the clone ioctl.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9dc106617d)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
commit ccccf3d672 upstream.
If we attempt to clone a 0 length region into a file we can end up
inserting a range in the inode's extent_io tree with a start offset
that is greater then the end offset, which triggers immediately the
following warning:
[ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3914.620886] BTRFS: end < start 4095 4096
(...)
[ 3914.638093] Call Trace:
[ 3914.638636] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3914.639620] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3914.640789] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3914.642041] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3914.643236] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3914.644441] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3914.645711] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3914.646914] [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
[ 3914.648058] [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
[ 3914.650105] [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
[ 3914.651361] [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
[ 3914.652761] [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
[ 3914.654128] [<ffffffff811226dd>] ? might_fault+0x58/0xb5
[ 3914.655320] [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
(...)
[ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---
This later makes the inode eviction handler enter an infinite loop that
keeps dumping the following warning over and over:
[ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3915.119913] BTRFS: end < start 4095 4096
(...)
[ 3915.137394] Call Trace:
[ 3915.137913] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3915.139154] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3915.140316] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3915.141505] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3915.142709] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3915.143849] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3915.145120] [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
[ 3915.146352] [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
[ 3915.147565] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3915.148785] [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
[ 3915.149931] [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
[ 3915.151154] [<ffffffff81168904>] evict+0xa0/0x148
[ 3915.152094] [<ffffffff811689e5>] dispose_list+0x39/0x43
[ 3915.153081] [<ffffffff81169564>] evict_inodes+0xdc/0xeb
[ 3915.154062] [<ffffffff81154418>] generic_shutdown_super+0x49/0xef
[ 3915.155193] [<ffffffff811546d1>] kill_anon_super+0x13/0x1e
[ 3915.156274] [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
(...)
[ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---
So just bail out of the clone ioctl if the length of the region to clone
is zero, without locking any extent range, in order to prevent this issue
(same behaviour as a pwrite with a 0 length for example).
This is trivial to reproduce. For example, the steps for the test I just
made for fstests:
mkfs.btrfs -f SCRATCH_DEV
mount SCRATCH_DEV $SCRATCH_MNT
touch $SCRATCH_MNT/foo
touch $SCRATCH_MNT/bar
$CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
umount $SCRATCH_MNT
A test case for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 449b46275c)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit HEAD ]
commit dcc82f4783 upstream.
While committing a transaction we free the log roots before we write the
new super block. Freeing the log roots implies marking the disk location
of every node/leaf (metadata extent) as pinned before the new super block
is written. This is to prevent the disk location of log metadata extents
from being reused before the new super block is written, otherwise we
would have a corrupted log tree if before the new super block is written
a crash/reboot happens and the location of any log tree metadata extent
ended up being reused and rewritten.
Even though we pinned the log tree's metadata extents, we were issuing a
discard against them if the fs was mounted with the -o discard option,
resulting in corruption of the log tree if a crash/reboot happened before
writing the new super block - the next time the fs was mounted, during
the log replay process we would find nodes/leafs of the log btree with
a content full of zeroes, causing the process to fail and require the
use of the tool btrfs-zero-log to wipeout the log tree (and all data
previously fsynced becoming lost forever).
Fix this by not doing a discard when pinning an extent. The discard will
be done later when it's safe (after the new super block is committed) at
extent-tree.c:btrfs_finish_extent_commit().
Fixes: e688b7252f (Btrfs: fix extent pinning bugs in the tree log)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3909e5a93e)
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b253149b84 ]
In Linux-3.9 we removed the mwait_idle() loop:
69fb3676df ("x86 idle: remove mwait_idle() and "idle=mwait" cmdline param")
The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT
loop, until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.
But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.
2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.
https://bugzilla.kernel.org/show_bug.cgi?id=60770
So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.
Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in
context For 3.11, 3.12, 3.13, this patch applies cleanly
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Mike Galbraith <bitbucket@online.de>
Cc: <stable@vger.kernel.org> # 3.9+
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ian Malone <ibmalone@gmail.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/345254a551eb5a6a866e048d7ab570fd2193aca4.1389763084.git.len.brown@intel.com
[ Ported to recent kernels. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 79930f5892 ]
build_skb() should look at the page pfmemalloc status.
If set, this means page allocator allocated this page in the
expectation it would help to free other pages. Networking
stack can do that only if skb->pfmemalloc is also set.
Also, we must refrain using high order pages from the pfmemalloc
reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
them. Under memory pressure, using order-0 pages is probably the best
strategy.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 845704a535 ]
Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.
Lets try a different strategy :
In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.
By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.
In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d83769a580 ]
Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.
To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.
In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.
This patch reverts d22e153718 ("tcp: fix tcp fin memory accounting")
Fixes: d22e153718 ("tcp: fix tcp fin memory accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3dfb05340e ]
Call checksum_complete_unset in PPP receive to discard checksum-complete
value. PPP does not pull checksum for headers and also modifies packet
as in VJ compression.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4e18b9adf2 ]
This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE
is set. This is called to discard checksum-complete when packet
is being modified and checksum is not pulled for headers in a layer.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2ab957492d ]
Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets
Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.
This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.
v2:
Remove useless comment
Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 07841f9d94 ]
When system is out of memory, refilling of RX buffers fails while
the driver continue to pass the received packets to the kernel stack.
At some point, when all RX buffers deplete, driver may fall into a
sleep, and not recover when memory for new RX buffers is once again
availible. This is because hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by schedule the napi poll function from
stats_task delayed workqueue, as long as the allocations fail.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 42eab005a5 ]
By default, the number of tx queues is limited by the number of online cpus
in mlx4_en_get_profile(). However, this limit no longer holds after the
ethtool .set_channels method has been called. In that situation, the driver
may access invalid bits of certain cpumask variables when queue_index >=
nr_cpu_ids.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Ido Shamay <idos@mellanox.com>
Fixes: d03a68f ("net/mlx4_en: Configure the XPS queue mapping on driver load")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit ae705930fc upstream.
There is an interesting bug in the vgic code, which manifests itself
when the KVM run loop has a signal pending or needs a vmid generation
rollover after having disabled interrupts but before actually switching
to the guest.
In this case, we flush the vgic as usual, but we sync back the vgic
state and exit to userspace before entering the guest. The consequence
is that we will be syncing the list registers back to the software model
using the GICH_ELRSR and GICH_EISR from the last execution of the guest,
potentially overwriting a list register containing an interrupt.
This showed up during migration testing where we would capture a state
where the VM has masked the arch timer but there were no interrupts,
resulting in a hung test.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Alex Bennee <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 04b8dc85bf upstream.
The kernel's pgd_index macro is designed to index a normal, page
sized array. KVM is a bit diffferent, as we can use concatenated
pages to have a bigger address space (for example 40bit IPA with
4kB pages gives us an 8kB PGD.
In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whist the host happilly corrupts the lower pgd.
The obvious fix is to get our own kvm_pgd_index that does the right
thing(tm).
Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
high address.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit a987370f8e upstream.
We're using __get_free_pages with to allocate the guest's stage-2
PGD. The standard behaviour of this function is to return a set of
pages where only the head page has a valid refcount.
This behaviour gets us into trouble when we're trying to increment
the refcount on a non-head page:
page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x4000000000000000()
page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
BUG: failure at include/linux/mm.h:548/get_page()!
Kernel panic - not syncing: BUG!
CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
Hardware name: APM X-Gene Mustang board (DT)
Call trace:
[<ffff80000008a09c>] dump_backtrace+0x0/0x13c
[<ffff80000008a1e8>] show_stack+0x10/0x1c
[<ffff800000691da8>] dump_stack+0x74/0x94
[<ffff800000690d78>] panic+0x100/0x240
[<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
[<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
[<ffff8000000a420c>] handle_exit+0x58/0x180
[<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
[<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
[<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
[<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
CPU0: stopping
A possible approach for this is to split the compound page using
split_page() at allocation time, and change the teardown path to
free one page at a time. It turns out that alloc_pages_exact() and
free_pages_exact() does exactly that.
While we're at it, the PGD allocation code is reworked to reduce
duplication.
This has been tested on an X-Gene platform with a 4kB/48bit-VA host
kernel, and kvmtool hacked to place memory in the second page of
the hardware PGD (PUD for the host kernel). Also regression-tested
on a Cubietruck (Cortex-A7).
[ Reworked to use alloc_pages_exact() and free_pages_exact() and to
return pointers directly instead of by reference as arguments
- Christoffer ]
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 0d3e4d4fad upstream.
When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.
That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.
At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.
Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 363ef89f8e upstream.
Let's assume a guest has created an uncached mapping, and written
to that page. Let's also assume that the host uses a cache-coherent
IO subsystem. Let's finally assume that the host is under memory
pressure and starts to swap things out.
Before this "uncached" page is evicted, we need to make sure
we invalidate potential speculated, clean cache lines that are
sitting there, or the IO subsystem is going to swap out the
cached view, loosing the data that has been written directly
into memory.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 801f6772ce upstream.
Commit b856a59141 (arm/arm64: KVM: Reset the HCR on each vcpu
when resetting the vcpu) moved the init of the HCR register to
happen later in the init of a vcpu, but left out the fixup
done in kvm_reset_vcpu when preparing for a 32bit guest.
As a result, the 32bit guest is run as a 64bit guest, but the
rest of the kernel still manages it as a 32bit. Fun follows.
Moving the fixup to vcpu_reset_hcr solves the problem for good.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 05971120fc upstream.
It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.
To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.
When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.
We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit ca7d9c829d upstream.
Userspace assumes that it can wire up IRQ injections after having
created all VCPUs and after having created the VGIC, but potentially
before starting the first VCPU. This can currently lead to lost IRQs
because the state of that IRQ injection is not stored anywhere and we
don't return an error to userspace.
We haven't seen this problem manifest itself yet, presumably because
guests reset the devices on boot, but this could cause issues with
migration and other non-standard startup configurations.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 716139df25 upstream.
When the vgic initializes its internal state it does so based on the
number of VCPUs available at the time. If we allow KVM to create more
VCPUs after the VGIC has been initialized, we are likely to error out in
unfortunate ways later, perform buffer overflows etc.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6d3cfbe21b upstream.
VGIC initialization currently happens in three phases:
(1) kvm_vgic_create() (triggered by userspace GIC creation)
(2) vgic_init_maps() (triggered by userspace GIC register read/write
requests, or from kvm_vgic_init() if not already run)
(3) kvm_vgic_init() (triggered by first VM run)
We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.
This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).
Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
kvm_vgic_map_resources.
[ This patch is originally written by Peter Maydell, but I have
modified it somewhat heavily, renaming various bits and moving code
around. If something is broken, I am to be blamed. - Christoffer ]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 957db105c9 upstream.
Introduce a new function to unmap user RAM regions in the stage2 page
tables. This is needed on reboot (or when the guest turns off the MMU)
to ensure we fault in pages again and make the dcache, RAM, and icache
coherent.
Using unmap_stage2_range for the whole guest physical range does not
work, because that unmaps IO regions (such as the GIC) which will not be
recreated or in the best case faulted in on a page-by-page basis.
Call this function on secondary and subsequent calls to the
KVM_ARM_VCPU_INIT ioctl so that a reset VCPU will detect the guest
Stage-1 MMU is off when faulting in pages and make the caches coherent.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit cf5d318865 upstream.
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.
Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit b856a59141 upstream.
When userspace resets the vcpu using KVM_ARM_VCPU_INIT, we should also
reset the HCR, because we now modify the HCR dynamically to
enable/disable trapping of guest accesses to the VM registers.
This is crucial for reboot of VMs working since otherwise we will not be
doing the necessary cache maintenance operations when faulting in pages
with the guest MMU off.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 3ad8b3de52 upstream.
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.
Implement the expected functionality and clarify the ABI.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 03f1d4c17e upstream.
If a VCPU was originally started with power off (typically to be brought
up by PSCI in SMP configurations), there is no need to clear the
POWER_OFF flag in the kernel, as this flag is only tested during the
init ioctl itself.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 849260c72c upstream.
Readonly memslots are often used to implement emulation of ROMs and
NOR flashes, in which case the guest may legally map these regions as
uncached.
To deal with the incoherency associated with uncached guest mappings,
treat all readonly memslots as incoherent, and ensure that pages that
belong to regions tagged as such are flushed to DRAM before being passed
to the guest.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 840f4bfbe0 upstream.
To allow handling of incoherent memslots in a subsequent patch, this
patch adds a paramater 'ipa_uncached' to cache_coherent_guest_page()
so that we can instruct it to flush the page's contents to DRAM even
if the guest has caching globally enabled.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 1050dcda30 upstream.
Memory regions may be incoherent with the caches, typically when the
guest has mapped a host system RAM backed memory region as uncached.
Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots
and handle them appropriately when mapping them.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9ab3b598d2 ]
A race condition starts to be visible in recent mmotm, where a PG_hwpoison
flag is set on a migration source page *before* it's back in buddy page
poo= l.
This is problematic because no page flag is supposed to be set when
freeing (see __free_one_page().) So the user-visible effect of this race
is that it could trigger the BUG_ON() when soft-offlining is called.
The root cause is that we call lru_add_drain_all() to make sure that the
page is in buddy, but that doesn't work because this function just
schedule= s a work item and doesn't wait its completion.
drain_all_pages() does drainin= g directly, so simply dropping
lru_add_drain_all() solves this problem.
Fixes: f15bdfa802 ("mm/memory-failure.c: fix memory leak in successful soft offlining")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aa8e22128b ]
If a USB serial device is unplugged while there is an active program
using the device it may spam the logs with -EPROTO (71) messages as it
attempts to retry.
Most serial usb drivers (metro-usb, pl2303, mos7840, ...) only output
these messages for debugging. The generic driver treats these as
errors.
Change the default output for the generic serial driver from error to
debug to silence these non-critical errors.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Register MIDR_EL1 is masked to get variant and revision fields, then
compared against midr_range_min and midr_range_max when checking
whether CPU is affected by any particular erratum. However, variant
and revision fields in MIDR_EL1 are separated by 16 bits, so the min
and max of midr range should be constructed accordingly, otherwise
the patch will not be applied when variant field is non-0.
Acked-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Bo Yan <byan@nvidia.com>
[will: use MIDR_VARIANT_SHIFT to construct upper bound]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit 6d1966dfd6)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
When running a compat (AArch32) userspace on Cortex-A53, a load at EL0
from a virtual address that matches the bottom 32 bits of the virtual
address used by a recent load at (AArch64) EL1 might return incorrect
data.
This patch works around the issue by writing to the contextidr_el1
register on the exception return path when returning to a 32-bit task.
This workaround is patched in at runtime based on the MIDR value of the
processor.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit 905e8c5dca)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Not all of the errata we have workarounds for apply necessarily to all
SoCs, so people compiling a kernel for one very specific SoC may not
need to patch the kernel.
Introduce a new submenu in the "Platform selection" menu to allow
people to turn off certain bugs if they are not affected. By default
all of them are enabled.
Normal users or distribution kernels shouldn't bother to deselect any
bugs here, since the alternatives framework will take care of
patching them in only if needed.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[will: moved kconfig menu under `Kernel Features']
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit c0a01b84b1)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The ARM erratum 832075 applies to certain revisions of Cortex-A57,
one of the workarounds is to change device loads into using
load-aquire semantics.
This is achieved using the alternatives framework.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit 5afaa1fc1b)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
The ARM errata 819472, 826319, 827319 and 824069 define the same
workaround for these hardware issues in certain Cortex-A53 parts.
Use the new alternatives framework and the CPU MIDR detection to
patch "cache clean" into "cache clean and invalidate" instructions if
an affected CPU is detected at runtime.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[will: add __maybe_unused to squash gcc warning]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit 301bcfac42)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
After each CPU has been started, we iterate through a list of
CPU features or bugs to detect CPUs which need (or could benefit
from) kernel code patches.
For each feature/bug there is a function which checks if that
particular CPU is affected. We will later provide some more generic
functions for common things like testing for certain MIDR ranges.
We do this for every CPU to cover big.LITTLE systems properly as
well.
If a certain feature/bug has been detected, the capability bit will
be set, so that later the call to apply_alternatives() will trigger
the actual code patching.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit e116a37542)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
With a blatant copy of some x86 bits we introduce the alternative
runtime patching "framework" to arm64.
This is quite basic for now and we only provide the functions we need
at this time.
This is connected to the newly introduced feature bits.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit e039ee4ee3)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
For taking note if at least one CPU in the system needs a bug
workaround or would benefit from a code optimization, we create a new
bitmap to hold (artificial) feature bits.
Since elf_hwcap is part of the userland ABI, we keep it alone and
introduce a new data structure for that (along with some accessors).
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org> # v3.18.y
(cherry picked from commit 930da09f5e)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d0af71a357 ]
tg3_init_one() calls tg3_halt() without tp->lock despite its assumption
and causes deadlock.
If lockdep is enabled, a warning like this shows up before the stall:
[ BUG: bad unlock balance detected! ]
3.19.0test #3 Tainted: G E
-------------------------------------
insmod/369 is trying to release lock (&(&tp->lock)->rlock) at:
[<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3]
but there are no more locks to release!
tg3_init_one() doesn't call tg3_halt() under normal situation but
during kexec kdump I hit this problem.
Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7a1e890e21 ]
cdc_ncm disagrees with usbnet about how much framing overhead should
be counted in the tx_bytes statistics, and tries 'fix' this by
decrementing tx_bytes on the transmit path. But statistics must never
be decremented except due to roll-over; this will thoroughly confuse
user-space. Also, tx_bytes is only incremented by usbnet in the
completion path.
Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a
tx_bytes delta along with the tx_packets count.
Fixes: beeecd42c3 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1e9e39f4a2 ]
Currently the usbnet core does not update the tx_packets statistic for
drivers with FLAG_MULTI_PACKET and there is no hook in the TX
completion path where they could do this.
cdc_ncm and dependent drivers are bumping tx_packets stat on the
transmit path while asix and sr9800 aren't updating it at all.
Add a packet count in struct skb_data so these drivers can fill it
in, initialise it to 1 for other drivers, and add the packet count
to the tx_packets statistic on completion.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b736a623bd ]
handle_offloads() calls skb_reset_inner_headers() to store
the layer pointers to the encapsulated packet. However, we
currently push the vlag tag (if there is one) onto the packet
afterwards. This changes the MAC header for the encapsulated
packet but it is not reflected in skb->inner_mac_header, which
breaks GSO and drivers which attempt to use this for encapsulation
offloads.
Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.")
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 74f47278cb ]
In case of error vxlan_xmit_one() can free already freed skb.
Also fixes memory leak of dst-entry.
Fixes: acbf74a763 ("vxlan: Refactor vxlan driver to make use
of the common UDP tunnel functions").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b4bef1b575 ]
Since both tx and rx paths work with skb->vlan_tci, there's no need for
this function anymore. Switch users directly to __vlan_hwaccel_put_tag.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 213dd74aee ]
On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e8e ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e8e ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels. While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.
Sure.
PS I used the wrong email for James the first time around. So
let me repeat the question here. Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?
---8<---
The commit ea23192e8e ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels. While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.
This patch rearranges skb_scrub_packet to preserve the mark field.
Fixes: ea23192e8e ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4c0ee414e8 ]
This patch reverts commit b8fb4e0648
because the secmark must be preserved even when a packet crosses
namespace boundaries. The reason is that security labels apply to
the system as a whole and is not per-namespace.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c3de6317d7 ]
Due to missing bounds check the DAG pass of the BPF verifier can corrupt
the memory which can cause random crashes during program loading:
[8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
[8.451293] IP: [<ffffffff811de33d>] kmem_cache_alloc_trace+0x8d/0x2f0
[8.452329] Oops: 0000 [#1] SMP
[8.452329] Call Trace:
[8.452329] [<ffffffff8116cc82>] bpf_check+0x852/0x2000
[8.452329] [<ffffffff8116b7e4>] bpf_prog_load+0x1e4/0x310
[8.452329] [<ffffffff811b190f>] ? might_fault+0x5f/0xb0
[8.452329] [<ffffffff8116c206>] SyS_bpf+0x806/0xa30
Fixes: f1bca824da ("bpf: add search pruning optimization to verifier")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 074975d037 ]
Commit 9a2620c877 ("bnx2x: prevent WARN during driver unload")
switched the napi/busy_lock locking mechanism from spin_lock() into
spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
disables interrupts prior to calling our napi mechanism.
This switches the driver into using atomic assignments instead of the
spinlock mechanisms previously employed.
Based on initial patch from Yuval Mintz & Ariel Elior
I basically added softirq starvation avoidance, and mixture
of atomic operations, plain writes and barriers.
Note this slightly reduces the overhead for this driver when no
busy_poll sockets are in use.
Fixes: 9a2620c877 ("bnx2x: prevent WARN during driver unload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b50edd7812 ]
I noticed tcpdump was giving funky timestamps for locally
generated SYNACK messages on loopback interface.
11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
3160535375:3160535375(0) ack 945476043 win 43690 <mss
65495,nop,nop,sackOK,nop,wscale 7>
This is because we need to clear skb->tstamp before
entering lower stack, otherwise net_timestamp_check()
does not set skb->tstamp.
Fixes: 7faee5c0d5 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fde913e254 ]
Commit 1daa4303b4 ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.
Fixes: 1daa4303b4 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f60e5990d9 ]
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 666b805150 ]
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0c36820e2a ]
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
Fixes: 9ecd1a75d9 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f5e2dc5d7f ]
Before commit 3900f29021 ("bonding: slight
optimizztion for bond_slave_override()") the override logic was to send packets
with non-zero queue_id through the slave with corresponding queue_id, under two
conditions only - if the slave can transmit and it's up.
The above mentioned commit changed this logic by introducing an additional
condition - whether the bond is active (indirectly, using the slave_can_tx and
later - bond_is_active_slave), that prevents the user from implementing more
complex policies according to the Documentation/networking/bonding.txt.
Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: Alexey Bogoslavsky <alexey@swortex.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4ad19de877 ]
tcp_v6_fill_cb() will be called twice if socket's state changes from
TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
corruption because in the second tcp_v6_fill_cb() call it's not copying
IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
unpredictable results. Performance loss of up to 1200% has been observed
in LTP/vxlan03 test.
This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
called again.
Fixes: 2dc49d1680 ("tcp6: don't move IP6CB before xfrm6_policy_check()")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6fd99094de ]
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e5eda89d97 ]
Netdevice registration should be performed a the end of the driver
initialization flow. If we don't do that, after calling register_netdevice,
device callbacks may be issued by higher layers of the stack before
final configuration of the device is done.
For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
after the register_netdev command. System network scripts may configure
the interface (UP) right after the registration, which also attach
unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called,
causing the firmware to fail the rule attachment.
Fixes: 837052d0cc ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d0c294c53a ]
On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
struct dst_entry *dst = sk->sk_rx_dst;
if (dst)
dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
to code reading sk->sk_rx_dst twice, once for the test and once for
the argument of ip6_dst_check() (dst_check() is inline). This allows
ip6_dst_check() to be called with null first argument, causing a crash.
Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
TCP early demux code.
Fixes: 41063e9dd1 ("ipv4: Early TCP socket demux.")
Fixes: c7109986db ("ipv6: Early TCP socket demux")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 230fa253df ]
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
scalar types as suggested by Linus Torvalds. Accesses larger than
the machines word size cannot be guaranteed to be atomic. These
macros will use memcpy and emit a build warning.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 744961341d ]
KVM guest can fail to startup with following trace on host:
qemu-system-x86: page allocation failure: order:4, mode:0x40d0
Call Trace:
dump_stack+0x47/0x67
warn_alloc_failed+0xee/0x150
__alloc_pages_direct_compact+0x14a/0x150
__alloc_pages_nodemask+0x776/0xb80
alloc_kmem_pages+0x3a/0x110
kmalloc_order+0x13/0x50
kmemdup+0x1b/0x40
__kvm_set_memory_region+0x24a/0x9f0 [kvm]
kvm_set_ioapic+0x130/0x130 [kvm]
kvm_set_memory_region+0x21/0x40 [kvm]
kvm_vm_ioctl+0x43f/0x750 [kvm]
Failure happens when attempting to allocate pages for
'struct kvm_memslots', however it doesn't have to be
present in physically contiguous (kmalloc-ed) address
space, change allocation to kvm_kvzalloc() so that
it will be vmalloc-ed when its size is more then a page.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5885ebda87 ]
A new fsync vs power fail test in xfstests indicated that XFS can
have unreliable data consistency when doing extending truncates that
require block zeroing. The blocks beyond EOF get zeroed in memory,
but we never force those changes to disk before we run the
transaction that extends the file size and exposes those blocks to
userspace. This can result in the blocks not being correctly zeroed
after a crash.
Because in-memory behaviour is correct, tools like fsx don't pick up
any coherency problems - it's not until the filesystem is shutdown
or the system crashes after writing the truncate transaction to the
journal but before the zeroed data in the page cache is flushed that
the issue is exposed.
Fix this by also flushing the dirty data in memory region between
the old size and new size when we've found blocks that need zeroing
in the truncate process.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6f30b7e37a ]
Commit 4f579ae7de (ext4: fix punch hole on files with indirect
mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
mapping. However, there are bugs in several corner cases. This fixes 5
distinct bugs:
1. When there is at least one entire level of indirection between the
start and end of the punch range and the end of the punch range is the
first block of its level, we can't return early; we have to free the
intervening levels.
2. When the end is at a higher level of indirection than the start and
ext4_find_shared returns a top branch for the end, we still need to free
the rest of the shared branch it returns; we can't decrement partial2.
3. When a punch happens within one level of indirection, we need to
converge on an indirect block that contains the start and end. However,
because the branches returned from ext4_find_shared do not necessarily
start at the same level (e.g., the partial2 chain will be shallower if
the last block occurs at the beginning of an indirect group), the walk
of the two chains can end up "missing" each other and freeing a bunch of
extra blocks in the process. This mismatch can be handled by first
making sure that the chains are at the same level, then walking them
together until they converge.
4. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the start, we must free it,
but only if the end does not occur within that branch.
5. When the punch happens within one level of indirection and
ext4_find_shared returns a top branch for the end, then we shouldn't
free the block referenced by the end of the returned chain (this mirrors
the different levels case).
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a127d2bcf1 ]
The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wakeup cpus in deep idle states. The associated
call graph is :
cpuidle_idle_call()
|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
|_____tick_broadcast_set_event()
|____clockevents_program_event()
|____bc_set_next()
The hrtimer_{start/cancel} functions call into tracing which uses RCU.
But it is not legal to call into RCU in cpuidle because it is one of the
quiescent states. Hence protect this region with RCU_NONIDLE which informs
RCU that the cpu is momentarily non-idle.
As an aside it is helpful to point out that the clock event device that is
programmed here is not a per-cpu clock device; it is a
pseudo clock device, used by the broadcast framework alone.
The per-cpu clock device programming never goes through bc_set_next().
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 61a3855bb7 ]
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes: c37791349c ('IB/mlx4: Support PMA counters for IBoE')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit da321133b5 ]
The rate provided at the output of a clk-divider is calculated as:
DIV_ROUND_UP(parent_rate, div)
since commit b11d282dbe (clk: divider: fix rate calculation for
fractional rates). So to yield a rate not bigger than r parent_rate
must be <= r * div.
The effect of choosing a parent rate that is too big as was done before
this patch results in wrongly ruling out good dividers.
Note that this is not a complete fix as __clk_round_rate might return a
value >= its 2nd parameter. Also for dividers with
CLK_DIVIDER_ROUND_CLOSEST set the calculation is not accurate. But this
fixes the test case by Sascha Hauer that uses a chain of three dividers
under a fixed clock.
Fixes: b11d282dbe (clk: divider: fix rate calculation for fractional rates)
Suggested-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 26bac95aa8 ]
It's an invalid approach to assume that among two divider values
the one nearer the exact divider is the better one.
Assume a parent rate of 1000 Hz, a divider with CLK_DIVIDER_POWER_OF_TWO
and a target rate of 89 Hz. The exact divider is ~ 11.236 so 8 and 16
are the candidates to choose from yielding rates 125 Hz and 62.5 Hz
respectivly. While 8 is nearer to 11.236 than 16 is, the latter is still
the better divider as 62.5 is nearer to 89 than 125 is.
Fixes: 774b514390 (clk: divider: Add round to closest divider)
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Maxime Coquelin <maxime.coquelin@st.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0e66100637 ]
Stopping the vb2 thread (as used by several DVB devices) can result
in an 'UNBALANCED' warning such as this:
vb2: counters for queue ffff880407ee9828: UNBALANCED!
vb2: setup: 1 start_streaming: 1 stop_streaming: 1
vb2: wait_prepare: 249333 wait_finish: 249334
This is due to a race condition between stopping the thread and
calling vb2_internal_streamoff(). While I have not been able to deduce
the exact mechanism how this race condition can produce this warning,
I can see that the way the stream is stopped is likely to lead to a
race somewhere.
This patch simplifies how this is done by first ensuring that the
thread is completely stopped before cleaning up the vb2 queue. It
does that by setting threadio->stop to true, followed by a call to
vb2_queue_error() which will wake up the thread. The thread sees that
'stop' is true and it will exit.
The call to kthread_stop() waits until the thread has exited, and only
then is the queue cleaned up by calling __vb2_cleanup_fileio().
This is a much cleaner sequence and the warning has now disappeared.
Reported-by: Jurgen Kramer <gtmkramer@xs4all.nl>
Tested-by: Jurgen Kramer <gtmkramer@xs4all.nl>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org> # for v3.18 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 05b676ab42 ]
TASK_SIZE is depends on the systems architecture (32 or 64 bits) and it
should not be used for defining offset boundary for mmaping buffers for
CAPTURE and OUTPUT queues. This patch fixes support for MMAP calls on
the CAPTURE queue on 64bit architectures (like ARM64).
Cc: stable@vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kamil Debski <k.debski@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b815fc12d4 ]
This fixes a oops due to a double list add when adding a reject PDU for
iscsit_allocate_iovecs allocation failures. The cmd has already been
added to the conn_cmd_list in iscsit_setup_scsi_cmd, so this has us call
iscsit_reject_cmd.
Note that for ERL0 the reject PDU is not actually sent, so this patch
is not completely tested. Just verified we do not oops. The problem is the
add reject functions return -1 which is returned all the way up to
iscsi_target_rx_thread which for ERL0 will drop the connection.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit deeb8525f9 ]
If we fail past the aio_setup_ring(), we need to destroy the
mapping. We don't need to care about anybody having found ctx,
or added requests to it, since the last failure exit is exactly
the failure to make ctx visible to lookups.
Reproducer (based on one by Joe Mario <jmario@redhat.com>):
void count(char *p)
{
char s[80];
printf("%s: ", p);
fflush(stdout);
sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
system(s);
}
int main()
{
io_context_t *ctx;
int created, limit, i, destroyed;
FILE *f;
count("before");
if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
perror("opening aio-max-nr");
else if (fscanf(f, "%d", &limit) != 1)
fprintf(stderr, "can't parse aio-max-nr\n");
else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
perror("allocating aio_context_t array");
else {
for (i = 0, created = 0; i < limit; i++) {
if (io_setup(1000, ctx + created) == 0)
created++;
}
for (i = 0, destroyed = 0; i < created; i++)
if (io_destroy(ctx[i]) == 0)
destroyed++;
printf("created %d, failed %d, destroyed %d\n",
created, limit - created, destroyed);
count("after");
}
}
Found-by: Joe Mario <jmario@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 64b4e2526d ]
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit be0b5e6358 ]
Transmission of an AP beacon does not call the TX interrupt service routine,
which usually does the cleanup. Instead, cleanup is handled in a tasklet
completion routine. Unfortunately, this routine has a serious bug in that it does
not release the DMA mapping before it frees the skb, thus one IOMMU mapping is
leaked for each beacon. The test system failed with no free IOMMU mapping slots
approximately one hour after hostapd was used to start an AP.
This issue was reported and tested at https://github.com/lwfinger/rtlwifi_new/issues/30.
Reported-and-tested-by: Kevin Mullican <kevin@mullican.com>
Cc: Kevin Mullican <kevin@mullican.com>
Signed-off-by: Shao Fu <shaofu@realtek.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [3.18+]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7168440690 ]
Device domains never span IOMMU hardware units, which allows the
domain ID space for each IOMMU to be an independent address space.
Therefore we can have multiple, independent domains, each with the
same domain->id, but attached to different hardware units. This is
also why we need to do a heavy-weight search for VM domains since
they can span multiple IOMMUs hardware units and we don't require a
single global ID to use for all hardware units.
Therefore, if we call iommu_detach_domain() across all active IOMMU
hardware units for a non-VM domain, the result is that we clear domain
IDs that are not associated with our domain, allowing them to be
re-allocated and causing apparent coherency issues when the device
cannot access IOVAs for the intended domain.
This bug was introduced in commit fb170fb4c5 ("iommu/vt-d: Introduce
helper functions to make code symmetric for readability"), but is
significantly exacerbated by the more recent commit 62c22167dd
("iommu/vt-d: Fix dmar_domain leak in iommu_attach_device") which calls
domain_exit() more frequently to resolve a domain leak.
Fixes: fb170fb4c5 ("iommu/vt-d: Introduce helper functions to make code symmetric for readability")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e1e9bda22d ]
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:
Thread 1 Thread 2
======== ========
inv_file = NULL
refind = 0
spin_lock(&cifs_file_list_lock)
// invalidHandle found on openFileList
inv_file = open_file
// inv_file->count currently 1
cifsFileInfo_get(inv_file)
// inv_file->count = 2
spin_unlock(&cifs_file_list_lock);
cifs_reopen_file() cifs_close()
// fails (rc != 0) ->cifsFileInfo_put()
spin_lock(&cifs_file_list_lock)
// inv_file->count = 1
spin_unlock(&cifs_file_list_lock)
spin_lock(&cifs_file_list_lock);
list_move_tail(&inv_file->flist,
&cifs_inode->openFileList);
spin_unlock(&cifs_file_list_lock);
cifsFileInfo_put(inv_file);
->spin_lock(&cifs_file_list_lock)
// inv_file->count = 0
list_del(&cifs_file->flist);
// cleanup!!
kfree(cifs_file);
spin_unlock(&cifs_file_list_lock);
spin_lock(&cifs_file_list_lock);
++refind;
// refind = 1
goto refind_writable;
At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@samba.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2477bc58d4 ]
While attempting to clone a file on a samba server, we receive a
STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
isn't handled in smb2_clone_range(). We end up looping in the while loop
making same call to the samba server over and over again.
The proposed fix is to exit and return the error value when encountered
with an unhandled error.
Cc: <stable@vger.kernel.org>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e4934c6d6 ]
When the receiver was enabled during startup, a character could
have been in the FIFO when the UART get initially used. The
driver configures the (receive) watermark level, and flushes the
FIFO. However, the receive flag (RDRF) could still be set at that
stage (as mentioned in the register description of UARTx_RWFIFO).
This leads to an interrupt which won't be handled properly in
interrupt mode: The receive interrupt function lpuart_rxint checks
the FIFO count, which is 0 at that point (due to the flush
during initialization). The problem does not manifest when using
DMA to receive characters.
Fix this situation by explicitly read the status register, which
leads to clearing of the RDRF flag. Due to the flush just after
the status flag read, a explicit data read is not to required.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4e8f245937 ]
Specify transmit FIFO size which might be different depending on
LPUART instance. This makes sure uart_wait_until_sent in serial
core getting called, which in turn waits and checks if the FIFO
is really empty on shutdown by using the tx_empty callback.
Without the call of this callback, the last several characters
might not yet be transmitted when closing the serial port. This
can be reproduced by simply using echo and redirect the output to
a ttyLP device.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 227a4fd801 ]
When a device with an isochronous endpoint is plugged into the Intel
xHCI host controller, and the driver submits multiple frames per URB,
the xHCI driver will set the Block Event Interrupt (BEI) flag on all
but the last TD for the URB. This causes the host controller to place
an event on the event ring, but not send an interrupt. When the last
TD for the URB completes, BEI is cleared, and we get an interrupt for
the whole URB.
However, under Intel xHCI host controllers, if the event ring is full
of events from transfers with BEI set, an "Event Ring is Full" event
will be posted to the last entry of the event ring, but no interrupt
is generated. Host will cease all transfer and command executions and
wait until software completes handling the pending events in the event
ring. That means xHC stops, but event of "event ring is full" is not
notified. As the result, the xHC looks like dead to user.
This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
it should be backported to kernels as old as 3.0, that contains the
commit 69e848c209 ("Intel xhci: Support EHCI/xHCI port switching.").
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alistair Grant <akgrant0710@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9425183d17 ]
Linux xHCI driver doesn't report and handle port cofig error change.
If Port Configure Error for root hub port occurs, CEC bit in PORTSC
would be set by xHC and remains 1. This happends when the root port
fails to configure its link partner, e.g. the port fails to exchange
port capabilities information using Port Capability LMPs.
Then the Port Status Change Events will be blocked until all status
change bits(CEC is one of the change bits) are cleared('0') (refer to
xHCI spec 4.19.2). Otherwise, the port status change event for this
root port will not be generated anymore, then root port would look
like dead for user and can't be recovered until a Host Controller
Reset(HCRST).
This patch is to check CEC bit in PORTSC in xhci_get_port_status()
and set a Config Error in the return status if CEC is set. This will
cause a ClearPortFeature request, where CEC bit is cleared in
xhci_clear_port_change_bit().
[The commit log is based on initial Marvell patch posted at
http://marc.info/?l=linux-kernel&m=142323612321434&w=2]
Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: stable <stable@vger.kernel.org> # v3.2+
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c7e8bdf587 ]
Fix a bug that leads to showing the name and description of C-state C0
as "<null>" in sysfs after the ACPI C-states changed (e.g. after AC->DC
or DC->AC
transition).
The function poll_idle_init() in drivers/cpuidle/driver.c initializes the
state 0 during cpuidle_register_driver(), so we better do not overwrite it
again with '\0' during acpi_processor_cst_has_changed().
Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d75e4af14e ]
Thomas Schlichter reports the following issue on his Samsung NC20:
"The C-states C1 and C2 to the OS when connected to AC, and additionally
provides the C3 C-state when disconnected from AC. However, the number
of C-states shown in sysfs is fixed to the number of C-states present
at boot.
If I boot with AC connected, I always only see the C-states up to C2
even if I disconnect AC.
The reason is commit 130a5f6924 (ACPI / cpuidle: remove dev->state_count
setting). It removes the update of dev->state_count, but sysfs uses
exactly this variable to show the C-states.
The fix is to use drv->state_count in sysfs. As this is currently the
last user of dev->state_count, this variable can be completely removed."
Remove dev->state_count as per the above.
Reported-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 555828ef45 ]
Return EPROBE_DEFER if Regulator returns EPROBE_DEFER
If the Flexcan driver is built into kernel and a regulator is used to
enable the CAN transceiver, the Flexcan driver may not use the regulator.
When initializing the Flexcan device with a regulator defined in the device
tree, but not initialized, the regulator subsystem returns EPROBE_DEFER, hence
the Flexcan init fails.
The solution for this is to return EPROBE_DEFER if regulator is not initialized
and wait until the regulator is initialized.
Signed-off-by: Andreas Werner <kernel@andy89.org>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 80313b3078 ]
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.
The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.
Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.
( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: <stable@vger.kernel.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1cf48f22c9 ]
sc->nbcnvifs tracks assigned beacon slots, not enabled beacons.
Therefore, it cannot be used to decide if cur_conf->enable_beacon (bool)
should be updated, or if beacons have been enabled already.
With the current code (depending on the order of calls), beacons often
do not get enabled in an AP+STA setup.
To fix tracking of enabled beacons, convert cur_conf->enable_beacon to a
bitmask of enabled beacon slots.
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 02d88b735f ]
In omap_dma_start_desc the vdesc->node is removed from the virt-dma
framework managed lists (to be precise from the desc_issued list).
If a terminate_all comes before the transfer finishes the omap_desc will
not be freed up because it is not in any of the lists and we stopped the
DMA channel so the transfer will not going to complete.
There is no special sequence for leaking memory when using cyclic (audio)
transfer: with every start and stop of a cyclic transfer the driver leaks
struct omap_desc worth of memory.
Free up the allocated memory directly in omap_dma_terminate_all() since the
framework will not going to do that for us.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: <stable@vger.kernel.org>
CC: <linux-omap@vger.kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5ca9e7ce6e ]
If edma_terminate_all() was called while a transfer was running (i.e. after
edma_execute() but before edma_callback()) the echan->edesc was not freed.
This was due to the fact that a running transfer is on none of the
vchan lists: desc_submitted, desc_issued, desc_completed (edma_execute()
removes it from the desc_issued list), so the vchan_dma_desc_free_list()
called at the end of edma_terminate_all() didn't find it and didn't free it.
This bug was found on an AM1808 based hardware (very similar to da850evm,
however using the second MMC/SD controller), where intense operations on the SD
card wasted the device 128MB RAM within a couple of days.
Peter Ujfalusi:
The issue is even more severe since it affects cyclic (audio) transfers as
well. In this case starting/stopping audio will results memory leak.
Signed-off-by: Petr Kulhavy <petr@barix.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
CC: <stable@vger.kernel.org>
CC: <linux-omap@vger.kernel.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f54e9f2be3 ]
Depending on conversion mode used, the ADC clock (ADCK) needs
to be below a maximum frequency. According to Vybrid's data
sheet this is 20MHz for the low power conversion mode.
The ADC clock is depending on input clock, which is the bus
clock by default. Vybrid SoC are typically clocked at at 400MHz
or 500MHz, which leads to 66MHz or 83MHz bus clock respectively.
Hence, a divider of 8 is required to stay below the specified
maximum clock of 20MHz.
Due to the different bus clock speeds, the resulting sampling
frequency is not static. Hence use the ADC clock and calculate
the actual available sampling frequency dynamically.
This fixes bogous values observed on some 500MHz clocked Vybrid
SoC. The resulting value usually showed Bit 9 being stuck at 1,
or 0, which lead to a value of +/-512.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Fugang Duan <B38611@freescale.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c1b03ab5e8 ]
When an error occurred during event registration memory was freed twice
resulting in kernel memory corruption and a crash in unrelated code.
The problem was caused by
iio_device_unregister_eventset()
iio_device_unregister_sysfs()
being called twice, once on the error path and then
again via iio_dev_release().
Fix this by making these two functions idempotent so they
may be called multiple times.
The problem was observed before applying
78b33216 iio:core: Handle error when mask type is not separate
Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bba0bdd7ad ]
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
BUG: unable to handle kernel paging request at ffffc9001a6bc084
IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
Call Trace:
[<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
[<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
[<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
[<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
[<ffffffff81223b37>] __blk_run_queue+0x27/0x30
[<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
[<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
[<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
[<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
[<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
[<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
[<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
[<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
[<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
[<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
[<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
[<ffffffff811589de>] vfs_write+0xce/0x140
[<ffffffff81158b53>] sys_write+0x53/0xa0
[<ffffffff81464592>] system_call_fastpath+0x16/0x1b
[<00007f611c9d9300>] 0x7f611c9d92ff
Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b229a0f840 ]
This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
to avoid attaching a TTY to the JTAG port as this board is based on the
CALAO Systems reference design and needs the same fix up.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
CC: stable <stable@vger.kernel.org>
[johan: clean up probe logic ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5e71fc8629 ]
Add USB VID/PID for Xircom PGMFHUB USB/serial component. (The hub and SCSI
bridge on that hardware are recognized out of the box by existing drivers.)
Tested VID/PID using new_id and loopback connection and was met with
success, but that's all the testing done.
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c72efb658f ]
From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 23 Mar 2015 00:08:19 -0400
2f800fbd77 ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.
bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation. While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.
Fix it by subtracing min of the old and new values when calculating
delta. AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported. The
risk of the fix is very low, so tagged for -stable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Fixes: 2f800fbd77 ("writeback: fix dirtied pages accounting on redirty")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d70e15480 ]
global_update_bandwidth() uses static variable update_time as the
timestamp for the last update but forgets to initialize it to
INITIALIZE_JIFFIES.
This means that global_dirty_limit will be 5 mins into the future on
32bit and some large amount jiffies into the past on 64bit. This
isn't critical as the only effect is that global_dirty_limit won't be
updated for the first 5 mins after booting on 32bit machines,
especially given the auxiliary nature of global_dirty_limit's role -
protecting against global dirty threshold's sudden dips; however, it
does lead to unintended suboptimal behavior. Fix it.
Fixes: c42843f2f0 ("writeback: introduce smoothed global dirty limit")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c75de0ac07 ]
All CPUs leaving the first-online CPU are hotplugged out on suspend and
and cpufreq core stops managing them.
On resume, we need to call cpufreq_update_policy() for this CPU's policy
to make sure its frequency is in sync with cpufreq's cached value, as it
might have got updated by hardware during suspend/resume.
The policies are always added to the top of the policy-list. So, in
normal circumstances, CPU 0's policy will be the last one in the list.
And so the code checks for the last policy.
But there are cases where it will fail. Consider quad-core system, with
policy-per core. If CPU0 is hotplugged out and added back again, the
last policy will be on CPU1 :(
To fix this in a proper way, always look for the policy of the first
online CPU. That way we will be sure that we are calling
cpufreq_update_policy() for the only CPU that wasn't hotplugged out.
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Fixes: 2f0aea9363 ("cpufreq: suspend governors on system suspend/hibernate")
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 746db9443e ]
When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.
I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.
Signed-off-by: Brian Silverman <brian@peloton-tech.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: austin@peloton-tech.com
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cfa8694382 ]
Commit 3c605096d3 ("mm/page_alloc: restrict max order of merging on
isolated pageblock") changed the logic of unset_migratetype_isolate to
check the buddy allocator and explicitly call __free_pages to merge.
The page that is being freed in this path never had prep_new_page called
so set_page_refcounted is called explicitly but there is no call to
kernel_map_pages. With the default kernel_map_pages this is mostly
harmless but if kernel_map_pages does any manipulation of the page
tables (unmapping or setting pages to read only) this may trigger a
fault:
alloc_contig_range test_pages_isolated(ceb00, ced00) failed
Unable to handle kernel paging request at virtual address ffffffc0cec00000
pgd = ffffffc045fc4000
[ffffffc0cec00000] *pgd=0000000000000000
Internal error: Oops: 9600004f [#1] PREEMPT SMP
Modules linked in: exfatfs
CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
PC is at memset+0xc8/0x1c0
LR is at kernel_map_pages+0x1ec/0x244
Fix this by calling kernel_map_pages to ensure the page is set in the
page table properly
Fixes: 3c605096d3 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3fe89b3e2a ]
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
upgrading to 3.19 and had no luck with 4.0-rc1 neither.
So, after looking into new logic introduced by commit 7a3ef208e6 ("mm:
prevent endless growth of anon_vma hierarchy"), I found chances are that
unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
in anon_vma_clone() due to allocation failure. If dst->anon_vma is not
NULL in error path, its degree will be incorrectly decremented in
unlink_anon_vmas() and eventually underflow when exiting as a result of
another call to unlink_anon_vmas(). That's how "kernel BUG at
mm/rmap.c:399!" is triggered for me.
This patch fixes the underflow by dropping dst->anon_vma when allocation
fails. It's safe to do so regardless of original value of dst->anon_vma
because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
fails. Besides, callers don't care dst->anon_vma in such case neither.
Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
anon_vma_clone() now does the work.
[akpm@linux-foundation.org: tweak comment]
Fixes: 7a3ef208e6 ("mm: prevent endless growth of anon_vma hierarchy")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b800c91a05 ]
Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel
BUG at mm/rmap.c:399!") caused by commit 7a3ef208e6 ("mm: prevent
endless growth of anon_vma hierarchy")
Anon_vma_clone() is usually called for a copy of source vma in
destination argument. If source vma has anon_vma it should be already
in dst->anon_vma. NULL in dst->anon_vma is used as a sign that it's
called from anon_vma_fork(). In this case anon_vma_clone() finds
anon_vma for reusing.
Vma_adjust() calls it differently and this breaks anon_vma reusing
logic: anon_vma_clone() links vma to old anon_vma and updates degree
counters but vma_adjust() overrides vma->anon_vma right after that. As
a result final unlink_anon_vmas() decrements degree for wrong anon_vma.
This patch assigns ->anon_vma before calling anon_vma_clone().
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com>
Reported-and-tested-by: Oded Gabbay <oded.gabbay@amd.com>
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # to match back-porting of 7a3ef208e6
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7a3ef208e6 ]
Constantly forking task causes unlimited grow of anon_vma chain. Each
next child allocates new level of anon_vmas and links vma to all
previous levels because pages might be inherited from any level.
This patch adds heuristic which decides to reuse existing anon_vma
instead of forking new one. It adds counter anon_vma->degree which
counts linked vmas and directly descending anon_vmas and reuses anon_vma
if counter is lower than two. As a result each anon_vma has either vma
or at least two descending anon_vmas. In such trees half of nodes are
leafs with alive vmas, thus count of anon_vmas is no more than two times
bigger than count of vmas.
This heuristic reuses anon_vmas as few as possible because each reuse
adds false aliasing among vmas and rmap walker ought to scan more ptes
when it searches where page is might be mapped.
Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
Fixes: 5beb493052 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@linux-foundation.org: fix typo, per Rik]
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Tested-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [2.6.34+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 788211d81b ]
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:
* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free
The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.
Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fea559f303 ]
Implement arch_irq_work_has_interrupt() for powerpc
Commit 9b01f5bf3 introduced a dependency on "IRQ work self-IPIs" for
full dynamic ticks to be enabled, by expecting architectures to
implement a suitable arch_irq_work_has_interrupt() routine.
Several arches have implemented this routine, including x86 (3010279f)
and arm (09f6edd4), but powerpc was omitted.
This patch implements this routine for powerpc.
The symptom, at boot (on powerpc systems) with "nohz_full=<CPU list>"
is displayed:
NO_HZ: Can't run full dynticks because arch doesn't support irq work self-IPIs
after this patch:
NO_HZ: Full dynticks CPUs: <CPU list>.
Tested against 3.19.
powerpc implements "IRQ work self-IPIs" by setting the decrementer to 1 in
arch_irq_work_raise(), which causes a decrementer exception on the next
timebase tick. We then handle the work in __timer_interrupt().
CC: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul A. Clarke <pc@us.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[mpe: Flesh out change log, fix ws & include guards, remove include of processor.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d52356e7f4 ]
Space allocated for paca is based off nr_cpu_ids,
but pnv_alloc_idle_core_states() iterates paca with
cpu_nr_cores()*threads_per_core, which is using NR_CPUS.
This causes pnv_alloc_idle_core_states() to write over memory,
which is outside of paca array and may later lead to various panics.
Fixes: 7cba160ad7 (powernv/cpuidle: Redesign idle states management)
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 340f0ba1c6 ]
alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
Noticed by inspection after Jeff Layton fixed the same bug for open
owners. Depending on client behavior, this one may be trickier to
trigger in practice.
Fixes: c58c6610ec "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c5952338bf ]
alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.
In commit 7ffb588086, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).
This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner. That
has changed now with v4.0 kernels, and this race can now easily occur.
Fixes: 7ffb588086
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3c56b3a12c ]
Commit 25b884a83d ("x86/xen: set
regions above the end of RAM as 1:1") introduced a regression.
To be able to add memory pages which were added via memory hotplug to
a pv domain, the pages must be "invalid" instead of "identity" in the
p2m list before they can be added.
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9c8928f517 ]
The assumption before this patch was that we don't need to
run again the INIT firmware after the system booted. The
INIT firmware runs calibrations which impact the physical
layer's behavior.
Users reported that it may be helpful to run these
calibrations again every time the interface is brought up.
The penatly is minimal, since the calibrations run fast.
This fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=94341
CC: <stable@vger.kernel.org>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8494057ab5 ]
Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.
Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.
This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.
Addresses: CVE-2014-8159
Cc: <stable@vger.kernel.org>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9c4f61f01d ]
We can search and add the orphan item in one go,
btrfs_insert_orphan_item will find out if the item already exists.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8218c3f4df ]
Originally it was impossible to be dropping the last refcount in this
function since there was always one around still from the idr. But in
commit 83f45fc360
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Aug 6 09:10:18 2014 +0200
drm: Don't grab an fb reference for the idr
we've switched to weak references, broke that assumption but forgot to
fix it up.
Since we still force-disable planes it's only possible to hit this
when racing multiple rmfb with fbdev restoring or similar evil things.
As long as userspace is nice it's impossible to hit the BUG_ON.
But the BUG_ON would most likely be hit from fbdev code, which usually
invovles the console_lock besides all modeset locks. So very likely
we'd never get the bug reports if this was hit in the wild, hence
better be safe than sorry and backport.
Spotted by Matt Roper while reviewing other patches.
[airlied: pull this back into 4.0 - the oops happens there]
Cc: stable@vger.kernel.org
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fdc0074c5f ]
As the sunxi usb clocks all contain a reset controller, it is not
possible to build the sunxi clock driver without RESET_CONTROLLER
enabled. Doing so results in an undefined symbol error:
drivers/built-in.o: In function `sunxi_gates_clk_setup':
linux/drivers/clk/sunxi/clk-sunxi.c:1071: undefined reference to
`reset_controller_register'
This is possible if building a minimal kernel without PHY_SUN4I_USB.
The dependency issue is made visible at compile time instead of
link time by the new A80 mmc clocks, which also use a reset control
itself.
This patch makes ARCH_SUNXI select ARCH_HAS_RESET_CONTROLLER and
RESET_CONTROLLER.
Fixes: 559482d1f9 ARM: sunxi: Split the various SoCs support in Kconfig
Cc: <stable@vger.kernel.org> # 3.16+
Reported-by: Lourens Rozema <ik@lourensrozema.nl>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e4140819da ]
A malicious signal handler / restorer can DOS the system by fudging the
user regs saved on stack, causing weird things such as sigreturn returning
to user mode PC but cpu state still being kernel mode....
Ensure that in sigreturn path status32 always has U bit; any other bogosity
(gargbage PC etc) will be taken care of by normal user mode exceptions mechanisms.
Reproducer signal handler:
void handle_sig(int signo, siginfo_t *info, void *context)
{
ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);
regs->scratch.status32 = 0;
}
Before the fix, kernel would go off to weeds like below:
--------->8-----------
[ARCLinux]$ ./signal-test
Path: /signal-test
CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000
[ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
[EFA ]: 0x00000010
[BLINK ]: 0x2007c1ee
[ERET ]: 0x10698
[STAT32]: 0x00000000 : <--------
BTA: 0x00010680 SP: 0x5ffe7e48 FP: 0x00000000
LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000
...
--------->8-----------
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6914e1e3f6 ]
The regfile provided to SA_SIGINFO signal handler as ucontext was off by
one due to pt_regs gutter cleanups in 2013.
Before handling signal, user pt_regs are copied onto user_regs_struct and copied
back later. Both structs are binary compatible. This was all fine until
commit 2fa919045b (ARC: pt_regs update #2) which removed the empty stack slot
at top of pt_regs (corresponding to first pad) and made the corresponding
fixup in struct user_regs_struct (the pad in there was moved out of
@scratch - not removed altogether as it is part of ptrace ABI)
struct user_regs_struct {
+ long pad;
struct {
- long pad;
long bta, lp_start, lp_end,....
} scratch;
...
}
This meant that now user_regs_struct was off by 1 reg w.r.t pt_regs and
signal code needs to user_regs_struct.scratch to reflect it as pt_regs,
which is what this commit does.
This problem was hidden for 2 years, because both save/restore, despite
using wrong location, were using the same location. Only an interim
inspection (reproducer below) exposed the issue.
void handle_segv(int signo, siginfo_t *info, void *context)
{
ucontext_t *uc = context;
struct user_regs_struct *regs = &(uc->uc_mcontext.regs);
printf("regs %x %x\n", <=== prints 7 8 (vs. 8 9)
regs->scratch.r8, regs->scratch.r9);
}
int main()
{
struct sigaction sa;
sa.sa_sigaction = handle_segv;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sigaction(SIGSEGV, &sa, NULL);
asm volatile(
"mov r7, 7 \n"
"mov r8, 8 \n"
"mov r9, 9 \n"
"mov r10, 10 \n"
:::"r7","r8","r9","r10");
*((unsigned int*)0x10) = 0;
}
Fixes: 2fa919045b "ARC: pt_regs update #2: Remove unused gutter at start of pt_regs"
CC: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a43f32d647 ]
Struct spear13xx_pcie_driver was in initdata, but we passed a pointer to it
to platform_driver_register(), which can use the pointer at arbitrary times
in the future, even after the initdata is freed. That leads to crashes.
Move spear13xx_pcie_driver and things referenced by it
(spear13xx_pcie_probe() and dw_pcie_host_init()) out of initdata.
[bhelgaas: changelog]
Fixes: 6675ef212d ("PCI: spear: Fix Section mismatch compilation warning for probe()")
Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
CC: stable@vger.kernel.org # v3.17+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8647ca9ad5 ]
Booting a v3.18 or newer Xen domU kernel with PCI devices passed through
results in an oops (this is a 32-bit 3.13.11 dom0 with a 64-bit 4.4.0
hypervisor and 32-bit domU):
BUG: unable to handle kernel paging request at 0030303e
IP: [<c06ed0e6>] acpi_ns_validate_handle+0x12/0x1a
Call Trace:
[<c06eda4d>] ? acpi_evaluate_object+0x31/0x1fc
[<c06b78e1>] ? pci_get_hp_params+0x111/0x4e0
[<c0407bc7>] ? xen_force_evtchn_callback+0x17/0x30
[<c04085fb>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<c0699d34>] ? pci_device_add+0x24/0x450
Don't look for ACPI configuration information if ACPI has been disabled.
I don't think this is the best fix, because we can boot plain Linux (no
Xen) with "acpi=off", and we don't need this check in pci_get_hp_params().
There should be a better fix that would make Xen domU work the same way.
The domU kernel has ACPI support but it has no AML. There should be a way
to initialize the ACPI data structures so things fail gracefully rather
than oopsing. This is an interim fix to address the regression.
Fixes: 6cd33649fa ("PCI: Add pci_configure_device() during enumeration")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96301
Reported-by: Michael D Labriola <mlabriol@gdeb.com>
Tested-by: Michael D Labriola <mlabriol@gdeb.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.18+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a1b7f2f636 ]
Commit fab4c256a5 ("PCI/AER: Add a TLP header print helper") introduced
the helper function __print_tlp_header(), but contrary to the intention,
the behaviour did change: Since we're taking the address of the parameter
t, the first 4 or 8 bytes printed will be the value of the pointer t
itself, and the remaining 12 or 8 bytes will be who-knows-what (something
from the stack).
We want to show the values of the four members of the struct
aer_header_log_regs; that can be done without ugly and error-prone casts.
On little-endian this should produce the same output as originally
intended, and since no-one has complained about getting garbage output so
far, I think big-endian should be ok too.
Fixes: fab4c256a5 ("PCI/AER: Add a TLP header print helper")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
CC: stable@vger.kernel.org # v3.14+
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cc7016ab1a ]
Some BIOS version of Fujitsu Lifebook T731 seems to set up the
headphone pin (0x21) without the assoc number 0x0f while it's set only
to the output on the docking port (0x1a). With the recent commit
[03ad6a8c93: ALSA: hda - Fix "PCM" name being used on one DAC when
there are two DACs], this resulted in the weird mixer element
mapping where the headphone on the laptop is assigned as a shared
volume with the speaker and the docking port is assigned as an
individual headphone.
This patch improves the situation by correcting the headphone pin
config to the more appropriate value.
Reported-and-tested-by: Taylor Smock <smocktaylor@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a59d7199f6 ]
Pin sense will active when power pin is wake up.
Power pin will not wake up immediately during resume state.
Add some delay to wait for power pin activated.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a053fc318b ]
Some M-Audio devices require to receive bootup command just after
powering on, while codes in BeBoB driver doesn't work properly in
big-endian machine because the command should be aligned by
little-endian.
This commit fixes this bug. This fix should go to stable kernel.
Cc: Takayuki Shiroma <t.shiroma.oki@gmail.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3dc8523fa7 ]
Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"
Signed-off-by: Dmitry M. Fedin <dmitry.fedin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fb5ef9e7da ]
BugLink: http://bugs.launchpad.net/bugs/1381005
In canon mode, the read buffer head will advance over the buffer tail
if the input > 4095 bytes without receiving a line termination char.
Discard additional input until a line termination is received.
Before evaluating for overflow, the 'room' value is normalized for
I_PARMRK and 1 byte is reserved for line termination (even in !icanon
mode, in case the mode is switched). The following table shows the
transform:
actual buffer | 'room' value before overflow calc
space avail | !I_PARMRK | I_PARMRK
--------------------------------------------------
0 | -1 | -1
1 | 0 | 0
2 | 1 | 0
3 | 2 | 0
4+ | 3 | 1
When !icanon or when icanon and the read buffer contains newlines,
normalized 'room' values of -1 and 0 are clamped to 0, and
'overflow' is 0, so read_head is not adjusted and the input i/o loop
exits (setting no_room if called from flush_to_ldisc()). No input
is discarded since the reader does have input available to read
which ensures forward progress.
When icanon and the read buffer does not contain newlines and the
normalized 'room' value is 0, then overflow and room are reset to 1,
so that the i/o loop will process the next input char normally
(except for parity errors which are ignored). Thus, erasures, signalling
chars, 7-bit mode, etc. will continue to be handled properly.
If the input char processed was not a line termination char, then
the canon_head index will not have advanced, so the normalized 'room'
value will now be -1 and 'overflow' will be set, which indicates the
read_head can safely be reset, effectively erasing the last char
processed.
If the input char processed was a line termination, then the
canon_head index will have advanced, so 'overflow' is cleared to 0,
the read_head is not reset, and 'room' is cleared to 0, which exits
the i/o loop (because the reader now have input available to read
which ensures forward progress).
Note that it is possible for a line termination to be received, and
for the reader to copy the line to the user buffer before the
input i/o loop is ready to process the next input char. This is
why the i/o loop recomputes the room/overflow state with every
input char while handling overflow.
Finally, if the input data was processed without receiving
a line termination (so that overflow is still set), the pty
driver must receive a write wakeup. A pty writer may be waiting
to write more data in n_tty_write() but without unthrottling
here that wakeup will not arrive, and forward progress will halt.
(Normally, the pty writer is woken when the reader reads data out
of the buffer and more space become available).
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit fb5ef9e7da)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
[ Upstream commit 87f966d97b ]
On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.
Cc: <netdev@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Don Fry <pcnet32@frontier.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bb344ca5b9 ]
Commit 746c9e9f92 "of/base: Fix PowerPC address parsing hack" limited
the applicability of the workaround whereby a missing ranges is treated
as an empty ranges. This workaround was hiding a bug in the etsec2
device tree nodes, which have children with reg, but did not have
ranges.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f6ff041496 ]
We currently use the device tree update code in the kernel after resuming
from a suspend operation to re-sync the kernels view of the device tree with
that of the hypervisor. The code as it stands is not endian safe as it relies
on parsing buffers returned by RTAS calls that thusly contains data in big
endian format.
This patch annotates variables and structure members with __be types as well
as performing necessary byte swaps to cpu endian for data that needs to be
parsed.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: Cyril Bur <cyrilbur@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e53f21bce4 ]
The idle_task_exit() function may call switch_mm() with next ==
&init_mm. On arm64, init_mm.pgd cannot be used for user mappings, so
this patch simply sets the reserved TTBR0.
Cc: <stable@vger.kernel.org>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Tested-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e03826d504 ]
The register offset for REGEN2_CTRL in different for TPS659038 chip as when
compared with other Palmas family PMICs. In the case of TPS659038 the wrong
offset pointed to PLLEN_CTRL and was causing a hang. Correcting the same.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 44d5f6f590 ]
commit id 2ba9f0d has changed CONFIG_KVM_BOOK3S_64_HV to tristate to allow
HV/PR bits to be built as modules. But the MCE code still depends on
CONFIG_KVM_BOOK3S_64_HV which is wrong. When user selects
CONFIG_KVM_BOOK3S_64_HV=m to build HV/PR bits as a separate module the
relevant MCE code gets excluded.
This patch fixes the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER. This
makes sure that the relevant MCE code is included when HV/PR bits
are built as a separate modules.
Fixes: 2ba9f0d887 ("kvm: powerpc: book3s: Support building HV and PR KVM as module")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f7cf2a22a2 ]
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset. There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).
This resulted in a bogus value for the top of low memory:
EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)
which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523eac86
sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)
With the fix we see the correct TOLM value:
DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)
and we decode address 2000000 correctly:
sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523e1086
sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6d7fdb0ab3 ]
This reverts commit 89baaa570a.
Dirty page throttling should be sufficient for us in the general case
so there is no need to use __GFP_MEMALLOC - it would be needed only in
the swap-over-rbd case, which we currently don't support. (It would
probably take approximately the commit that is being reverted to add
that support, but we would also need the "swap" option to distinguish
from the general case and make sure swap ceph_client-s aren't shared
with anything else.) See ceph-devel threads [1] and [2] for the
details of why enabling pfmemalloc reserves for all cases is a bad
thing.
On top of potential system lockups related to drained emergency
reserves, this turned out to cause ceph lockups in case peers are on
the same host and communicating via loopback due to sk_filter()
dropping pfmemalloc skbs on the receiving side because the receiving
loopback socket is not tagged with SOCK_MEMALLOC.
[1] "SOCK_MEMALLOC vs loopback"
http://www.spinics.net/lists/ceph-devel/msg22998.html
[2] "[PATCH] libceph: don't set memalloc flags in loopback case"
http://www.spinics.net/lists/ceph-devel/msg23392.html
Conflicts:
net/ceph/messenger.c [ context: tcp_nodelay option ]
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.18+, needs backporting
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mel Gorman <mgorman@suse.de>
[idryomov@gmail.com: backport to 3.18, 3.19: context]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 98cf21c61a ]
Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated. This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.
The resulting first leaf node is correct:
----------------------------------------------------
| key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
----------------------------------------------------
But the parent index key0 still contains the previous key CNID=123:
-----------------------
| key0.CNID=123 | ... |
-----------------------
A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().
Along the way, I removed duplicate code with unification of the if
condition. The resulting code is equivalent to the original code
because node is never 0.
Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value. However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch. brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 391949b6f0 ]
With spidev the mesg->complete callback points to spidev_complete.
Calling this unblocks spidev_sync and so spidev_sync_write finishes. As
the struct spi_message just read is a local variable in
spidev_sync_write and recording the trace event accesses this message
the recording is better done first. The same can happen for
spidev_sync_read.
This fixes an oops observed on a 3.14-rt system with spidev activity
after
echo 1 > /sys/kernel/debug/tracing/events/spi/enable
.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 09ee96b214 ]
The "dm snapshot: suspend origin when doing exception handover" commit
fixed a exception store handover bug associated with pending exceptions
to the "snapshot-origin" target.
However, a similar problem exists in snapshot merging. When snapshot
merging is in progress, we use the target "snapshot-merge" instead of
"snapshot-origin". Consequently, during exception store handover, we
must find the snapshot-merge target and suspend its associated
mapped_device.
To avoid lockdep warnings, the target must be suspended and resumed
without holding _origins_lock.
Introduce a dm_hold() function that grabs a reference on a
mapped_device, but unlike dm_get(), it doesn't crash if the device has
the DMF_FREEING flag set, it returns an error in this case.
In snapshot_resume() we grab the reference to the origin device using
dm_hold() while holding _origins_lock (_origins_lock guarantees that the
device won't disappear). Then we release _origins_lock, suspend the
device and grab _origins_lock again.
NOTE to stable@ people:
When backporting to kernels 3.18 and older, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b735fede8d ]
In the function snapshot_resume we perform exception store handover. If
there is another active snapshot target, the exception store is moved
from this target to the target that is being resumed.
The problem is that if there is some pending exception, it will point to
an incorrect exception store after that handover, causing a crash due to
dm-snap-persistent.c:get_exception()'s BUG_ON.
This bug can be triggered by repeatedly changing snapshot permissions
with "lvchange -p r" and "lvchange -p rw" while there are writes on the
associated origin device.
To fix this bug, we must suspend the origin device when doing the
exception store handover to make sure that there are no pending
exceptions:
- introduce _origin_hash that keeps track of dm_origin structures.
- introduce functions __lookup_dm_origin, __insert_dm_origin and
__remove_dm_origin that manipulate the origin hash.
- modify snapshot_resume so that it calls dm_internal_suspend_fast() and
dm_internal_resume_fast() on the origin device.
NOTE to stable@ people:
When backporting to kernels 3.12-3.18, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.
When backporting to kernels older than 3.12, you need to pick functions
dm_internal_suspend and dm_internal_resume from the commit
fd2ed4d252.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5f027a3bf1 ]
It was always intended that a read to an unprovisioned block will return
zeroes regardless of whether the pool is in read-only or read-write
mode. thin_bio_map() was inconsistent with its handling of such reads
when the pool is in read-only mode, it now properly zero-fills the bios
it returns in response to unprovisioned block reads.
Eliminate thin_bio_map()'s special read-only mode handling of -ENODATA
and just allow the IO to be deferred to the worker which will result in
pool->process_bio() handling the IO (which already properly zero-fills
reads to unprovisioned blocks).
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e5db29806b ]
Since it's possible for the discard and write same queue limits to
change while the upper level command is being sliced and diced, fix up
both of them (a) to reject IO if the special command is unsupported at
the start of the function and (b) read the limits once and let the
commands error out on their own if the status happens to change.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a104a45ba7 ]
The commit 9cade1a46c (dma: dw: split driver to library part and platform
code) introduced a separate platform driver but missed to add a
MODULE_ALIAS("platform:dw_dmac"); to that module.
The patch adds this to get driver loaded automatically if platform device is
registered.
Reported-by: "Blin, Jerome" <jerome.blin@intel.com>
Fixes: 9cade1a46c (dma: dw: split driver to library part and platform code)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d525211f9d ]
Vince reported a watchdog lockup like:
[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
[<ffffffff810b7f10>] lock_release+0x130/0x260
[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
[<ffffffff811f71ce>] send_sigio+0xae/0x100
[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
[<ffffffff8115d103>] perf_pending_event+0x33/0x60
[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
[<ffffffff8114e448>] irq_work_run+0x18/0x40
[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80
Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.
This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.
Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d7c146053d ]
The error code paths that require cleanup use a goto to jump to the
cleanup code and return an error code. However, the error code variable
res, which is initialized to -EINVAL when declared, is then overwritten
with the return value of of_parse_phandle_with_args(), and reused as the
return code from of_irq_parse_one(). This leads to an undetermined error
being returned instead of the expected -EINVAL value. Fix it.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 43b68879de ]
As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
that cpu_pm_enter is not called twice on the same CPU before
cpu_pm_exit is called.". In the current code in case of failure when
calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
called whereas cpu_pm_enter() was called just before.
This patch moves the cpu_pm_exit() in order to balance the
cpu_pm_enter() calls.
Cc: stable@vger.kernel.org
Reported-by: Fulvio Benini <fbf@libero.it>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c8f0345586 ]
Routine rtl_is_special_data() is supposed to identify packets that need to
use a low bit rate so that the probability of successful transmission is
high. The current version has a bug that causes all IPv6 packets to be
labelled as special, with a corresponding low rate of transmission. A
complete fix will be quite intrusive, but until that is available, all
IPv6 packets are identified as regular.
This patch also removes a magic number.
Reported-and-tested-by: Alan Fisher <acf@unixcube.org>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [3.18+]
Cc: Alan Fisher <acf@unixcube.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f1bce487c ]
devm_phy_create() stores the pointer to the new PHY at the address
returned by devres_alloc(). The res parameter passed to devm_phy_match()
is therefore the location where the pointer to the PHY is stored, hence
it needs to be dereferenced before comparing to the match data in order
to find the correct match.
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a886bd9226 ]
We should signal connect (pull up dp) after we have already
at peripheral mode, otherwise, the dp may be toggled due to
we reset controller or do disconnect during the initialization
for peripheral, then, the host may be confused during the
enumeration, eg, it finds the reset can't succeed, but the
device is still there, see below error message.
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: cannot reset port 1 (err = -32)
hub 1-0:1.0: Cannot enable port 1. Maybe the USB cable is bad?
hub 1-0:1.0: unable to enumerate USB device on port 1
Fixes: the issue existed when the otg fsm code was added.
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d20f780799 ]
This patch adds response to a_alt_hnp_support set feature request from legacy
A device, that is, B-device can provide a message to the user indicating that
the user needs to connect the B-device to an alternate port on the A-device.
A device sets this feature indicates to the B-device that it is connected
to an A-device port that is not capable of HNP, but that the A-device does have
an alternate port that is capable of HNP.
[Peter]
Without this patch, the OTG B device can't be enumerated on
non-HNP port at A device, see below log:
[ 2.287464] usb 1-1: Dual-Role OTG device on non-HNP port
[ 2.293105] usb 1-1: can't set HNP mode: -32
[ 2.417422] usb 1-1: new high-speed USB device number 4 using ci_hdrc
[ 2.460635] usb 1-1: Dual-Role OTG device on non-HNP port
[ 2.466424] usb 1-1: can't set HNP mode: -32
[ 2.587464] usb 1-1: new high-speed USB device number 5 using ci_hdrc
[ 2.630649] usb 1-1: Dual-Role OTG device on non-HNP port
[ 2.636436] usb 1-1: can't set HNP mode: -32
[ 2.641003] usb usb1-port1: unable to enumerate USB device
Cc: stable <stable@vger.kernel.org>
Acked-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Li Jun <b47624@freescale.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0f611d28fc ]
Since moving the interface combination checks to mac80211, it's
broken because it now only considers interfaces with an assigned
channel context, so for example any interface that isn't active
can still be up, which is clearly an issue; also, in particular
P2P-Device wdevs are an issue since they never have a chanctx.
Fix this by counting running interfaces instead the ones with a
channel context assigned.
Cc: stable@vger.kernel.org [3.16+]
Fixes: 73de86a389 ("cfg80211/mac80211: move interface counting for combination check to mac80211")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[rewrite commit message, dig out the commit it fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit aa75ebc275 ]
Some APs experience problems when working with
U-APSD. Decreasing the probability of that
happening by using legacy mode for all ACs but VO
isn't enough.
Cisco 4410N originally forced us to enable VO by
default only because it treated non-VO ACs as
legacy.
However some APs (notably Netgear R7000) silently
reclassify packets to different ACs. Since u-APSD
ACs require trigger frames for frame retrieval
clients would never see some frames (e.g. ARP
responses) or would fetch them accidentally after
a long time.
It makes little sense to enable u-APSD queues by
default because it needs userspace applications to
be aware of it to actually take advantage of the
possible additional powersavings. Implicitly
depending on driver autotrigger frame support
doesn't make much sense.
Cc: stable@vger.kernel.org
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 496fcc294d ]
As HT/VHT depend heavily on QoS/WMM, it's not a good idea to
let userspace add clients that have HT/VHT but not QoS/WMM.
Since it does so in certain cases we've observed (client is
using HT IEs but not QoS/WMM) just ignore the HT/VHT info at
this point and don't pass it down to the drivers which might
unconditionally use it.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dc5465dc8a ]
On the X1 Carbon 3rd gen (with a 2015 broadwell cpu), the physical middle
button of the trackstick (attached to the touchpad serio device, of course)
seems to get lost.
Actually, the touchpads reports 3 extra buttons, which falls in the switch
below to the '2' case. Let's handle the case of odd numbers also, so that
the middle button finds its way back.
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 02e07492cd ]
Post-2013 Lenovo laptops provide correct min/max dimensions, which are
different with the ones currently quirked. According to
https://bugzilla.kernel.org/show_bug.cgi?id=91541 the following board ids
are assigned in the post-2013 touchpads:
t440p/t440s: LEN0036 -> 2964/2962
t540p: LEN0034 -> 2964
Using 2961 as the common minimum makes these 3 laptops OK. We may need
to update those values later if other pnp_ids has a lower board_id.
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8b04baba10 ]
Split the function synaptics_resolution() into synaptics_resolution() and
synaptics_quirks(). synaptics_resolution() will be called before
synaptics_quirks() to query dimensions and resolutions before overwriting
them with quirks.
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Martin <consume.noise@gmail.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 75c3d0bf9c ]
This patch fixes the incorrect use of __transport_register_session()
in tcm_qla2xxx_check_initiator_node_acl() code, that does not perform
explicit se_tpg->session_lock when accessing se_tpg->tpg_sess_list
to add new se_sess nodes.
Given that tcm_qla2xxx_check_initiator_node_acl() is not called with
qla_hw->hardware_lock held for all accesses of ->tpg_sess_list, the
code should be using transport_register_session() instead.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: <stable@vger.kernel.org> # 3.5+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7b8f10da3b ]
The initialisation of the efm32 clocksource first sets up the irq and only
after that initialises the data needed for irq handling. In case this
initialisation is delayed the irq handler would dereference a NULL pointer.
I'm not aware of anything that could delay the process in such a way, but it's
better to be safe than sorry, so setup the irq only when the clock event device
is ready.
Cc: stable@vger.kernel.org
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Yongbae Park <yongbae2@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 328f494d95 ]
When inserting a new register into a block at the lower end the present
bitmap is currently shifted into the wrong direction. The effect of this is
that the bitmap becomes corrupted and registers which are present might be
reported as not present and vice versa.
Fix this by shifting left rather than right.
Fixes: 472fdec7380c("regmap: rbtree: Reduce number of nodes, take 2")
Reported-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1096be084a ]
The interrupt is enabled before the handler is set. Even this bug
did not appear, it is potentially dangerous as it can lead to a
NULL pointer dereference.
Fix the error by enabling the interrupt after
clockevents_config_and_register() is called.
Cc: stable@vger.kernel.org
Signed-off-by: Yongbae Park <yongbae2@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 88660f7fb9 ]
virtio spec requires that all drivers set DRIVER_OK
before using devices. While balloon isn't yet
included in the virtio 1 spec, previous spec versions
also required this.
virtio balloon might violate this rule: probe calls
kthread_run before setting DRIVER_OK, which might run
immediately and cause balloon to inflate/deflate.
To fix, call virtio_device_ready before running the kthread.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2bf4c1d483 ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 08641d9b7b ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4c523ef611 ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d223b0e7fc ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e8371aa0fe ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Paul Handrigan <Paul.Handrigan@cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d7f58db49d ]
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cdd3d2a93f ]
Routes without a control must use NULL for the control name. The sn95031
driver uses "NULL" instead in a few places. Previous to commit 5fe5b767dc
("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
the DAPM core silently ignored non-NULL controls on non-mixer and non-mux
routes. But starting with that commit it will complain and not add the
route breaking the sn95031 driver in the process.
This patch replaces the incorrect "NULL" control name with NULL to fix the
issue.
Fixes: 5fe5b767dc ("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ce9594c6b3 ]
Routes without a control must use NULL for the control name. The ak4671
driver uses "NULL" instead in a few places. Previous to commit 5fe5b767dc
("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
the DAPM core silently ignored non-NULL controls on non-mixer and non-mux
routes. But starting with that commit it will complain and not add the
route breaking the ak4671 driver in the process.
This patch replaces the incorrect "NULL" control name with NULL to fix the
issue.
Fixes: 5fe5b767dc ("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8e6a75c102 ]
Routes without a control must use NULL for the control name. The da732x
driver uses "NULL" instead in a few places. Previous to commit 5fe5b767dc
("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
the DAPM core silently ignored non-NULL controls on non-mixer and non-mux
routes. But starting with that commit it will complain and not add the
route breaking the da732x driver in the process.
This patch replaces the incorrect "NULL" control name with NULL to fix the
issue.
Fixes: 5fe5b767dc ("ASoC: dapm: Do not pretend to support controls for non mixer/mux widgets")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 215a8fe419 ]
This patch fixes a NULL pointer dereference OOPs with pSCSI backends
within target_core_stat.c code. The bug is caused by a configfs attr
read if no pscsi_dev_virt->pdv_sd has been configured.
Reported-by: Olaf Hering <olaf@aepfle.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f068fbc82e ]
This patch fixes a iser specific logout bug where early complete()
of conn->conn_logout_comp in iscsit_close_connection() was causing
isert_wait4logout() to complete too soon, triggering a use after
free NULL pointer dereference of iscsi_conn memory.
The complete() was originally added for traditional iscsi-target
when a ISCSI_LOGOUT_OP failed in iscsi_target_rx_opcode(), but given
iser-target does not wait in logout failure, this special case needs
to be avoided.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Slava Shwartsman <valyushash@gmail.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5f7da044f8 ]
This patch fixes a NULL pointer dereference triggered by a late
target_configure_device() -> alloc_workqueue() failure that results
in target_free_device() being called with DF_CONFIGURED already set,
which subsequently OOPses in destroy_workqueue() code.
Currently this only happens at modprobe target_core_mod time when
core_dev_setup_virtual_lun0() -> target_configure_device() fails,
and the explicit target_free_device() gets called.
To address this bug originally introduced by commit 0fd97ccf45, go
ahead and move DF_CONFIGURED to end of target_configure_device()
code to handle this special failure case.
Reported-by: Claudio Fleiner <cmf@daterainc.com>
Cc: Claudio Fleiner <cmf@daterainc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v3.7+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7544e59734 ]
This patch fixes a se_cmd->cmd_kref leak buf when se_sess->sess_tearing_down
is true within target_get_sess_cmd() submission path code.
This se_cmd reference leak can occur during active session shutdown when
ack_kref=1 is passed by target_submit_cmd_[map_sgls,tmr]() callers.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d53d25578 ]
ehrpwm tbclk is wrongly modelled as deriving from dpll_per_m2_ck.
The TRM says tbclk is derived from SYSCLKOUT. SYSCLKOUT nothing but the
functional clock of pwmss (l4ls_gclk).
Fix this by changing source of ehrpwmx_tbclk to l4ls_gclk.
Fixes: 4da1c67719 ("add tbclk data for ehrpwm")
Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 6e22616eba ]
ehrpwm tbclk is wrongly modelled as deriving from dpll_per_m2_ck.
The TRM says tbclk is derived from SYSCLKOUT. SYSCLKOUT nothing but the
functional clock of pwmss (l4ls_gclk).
Fix this by changing source of ehrpwmx_tbclk to l4ls_gclk.
Fixes: 9e100ebafb: ("Fix ehrpwm tbclk data")
Signed-off-by: Vignesh R <vigneshr@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d2192ea098 ]
Fixes: ee6c750761 (ARM: dts: dra7 clock data)
On DRA7x, For DPLL_IVA, the ref clock(CLKINP) is connected to sys_clk1 and
the bypass input(CLKINPULOW) is connected to iva_dpll_hs_clk_div clock.
But the bypass input is not directly routed to bypass clkout instead
both CLKINP and CLKINPULOW are connected to bypass clkout via a mux.
This mux is controlled by the bit - CM_CLKSEL_DPLL_IVA[23]:DPLL_BYP_CLKSEL
and it's POR value is zero which selects the CLKINP as bypass clkout.
which means iva_dpll_hs_clk_div is not the bypass clock for dpll_iva_ck
Fix this by adding another mux clock as parent in bypass mode.
This design is common to most of the PLLs and the rest have only one bypass
clock. Below is a list of the DPLLs that need this fix:
DPLL_IVA, DPLL_DDR,
DPLL_DSP, DPLL_EVE,
DPLL_GMAC, DPLL_PER,
DPLL_USB and DPLL_CORE
Signed-off-by: Ravikumar Kattekola <rk@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 84e871660b ]
at91rm9200 standby and suspend to ram has been broken since
00482a4078. It is wrongly using AT91_BASE_SYS which is a physical address
and actually doesn't correspond to any register on at91rm9200.
Use the correct at91_ramc_base[0] instead.
Fixes: 00482a4078 (ARM: at91: implement the standby function for pm/cpuidle)
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit e893286918 ]
On gcc5 the kernel does not link:
ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.
Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.
.LSTARTFDEDLSI1 says:
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
But commit 69d0627a7f ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.
Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".
So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.
Kudos for reporting and diagnosing should go to Richard.
Reported-by: Richard Biener <rguenther@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit dc9be0fac7 ]
POWER supports irqfds but forgot to advertise them. Some userspace does
not check for the capability, but others check it---thus they work on
x86 and s390 but not POWER.
To avoid that other architectures in the future make the same mistake, let
common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE.
Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 297e21053a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a7c80ebcac ]
math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().
This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.
This triggers:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
dump_stack
___might_sleep
? _raw_spin_unlock_irqrestore
__might_sleep
down_read
? _raw_spin_unlock_irqrestore
print_vma_addr
signal_fault
sys32_rt_sigreturn
Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.
[ Note: this is only the first step. math_state_restore() should
not check used_math(), it should set this flag. While
init_fpu() should simply die. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ccfe8c3f7e ]
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.
The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.
In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.
Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.
Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.
[1] http://www.chronox.de/libkcapi.html
CC: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 001eabfd54 ]
This updates the bit sliced AES module to the latest version in the
upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a
bug in the XTS decryption path, where data chunked in a certain way
could trigger the ciphertext stealing code, which is not supposed to
be active in the kernel build (The kernel implementation of XTS only
supports round multiples of the AES block size of 16 bytes, whereas
the conformant OpenSSL implementation of XTS supports inputs of
arbitrary size by applying ciphertext stealing). This is fixed in
the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK
around the offending instructions.
The upstream code also contains the change applied by Russell to
build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7,
but implemented slightly differently.
Cc: stable@vger.kernel.org
Fixes: e4e7f10bfc ("ARM: add support for bit sliced AES using NEON instructions")
Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5724be8464 ]
On the Cortex-A9-based Armada SoCs, the MPIC is not the primary interrupt
controller. Yet, it still has to handle some per-cpu interrupt.
To do so, it is chained with the GIC using a per-cpu interrupt. However, the
current code only call irq_set_chained_handler, which is called and enable that
interrupt only on the boot CPU, which means that the parent per-CPU interrupt
is never unmasked on the secondary CPUs, preventing the per-CPU interrupt to
actually work as expected.
This was not seen until now since the only MPIC PPI users were the Marvell
timers that were not working, but not used either since the system use the ARM
TWD by default, and the ethernet controllers, that are faking there interrupts
as SPI, and don't really expect to have interrupts on the secondary cores
anyway.
Add a CPU notifier that will enable the PPI on the secondary cores when they
are brought up.
Cc: <stable@vger.kernel.org> # 3.15+
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1425378443-28822-1-git-send-email-maxime.ripard@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f01d35a15f ]
AIO_PREAD requests call ->aio_read() with iovec on caller's stack, so if
we are going to access it asynchronously, we'd better get ourselves
a copy - the one on kernel stack of aio_run_iocb() won't be there
anymore. function/f_fs.c take care of doing that, legacy/inode.c
doesn't...
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit af6fc858a3 ]
Otherwise the guest can abuse that control to cause e.g. PCIe
Unsupported Request responses by disabling memory and/or I/O decoding
and subsequently causing (CPU side) accesses to the respective address
ranges, which (depending on system configuration) may be fatal to the
host.
Note that to alter any of the bits collected together as
PCI_COMMAND_GUEST permissive mode is now required to be enabled
globally or on the specific device.
This is CVE-2015-2150 / XSA-120.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 85e40b0539 ]
Using the pvops kernel a NULL pointer dereference was detected on a
large machine (144 processors) when booting as dom0 in
evtchn_fifo_unmask() during assignment of a pirq.
The event channel in question was the first to need a new entry in
event_array[] in events_fifo.c. Unfortunately xen_irq_info_pirq_setup()
is called with evtchn being 0 for a new pirq and the real event channel
number is assigned to the pirq only during __startup_pirq().
It is mandatory to call xen_evtchn_port_setup() after assigning the
event channel number to the pirq to make sure all memory needed for the
event channel is allocated.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8792f7772f ]
Commit df9e26d093 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
added an "rtc_src" DT property to specify the clock used as a source to
the S3C real-time clock.
Not all SoCs needs this so commit eaf3a65908 ("drivers/rtc/rtc-s3c.c:
fix initialization failure without rtc source clock") changed to check
the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock.
But that commit didn't update the data for each IP version so the RTC
broke on the boards that needs a source clock. This is the case of at
least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block.
This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420
Peach Pit and Pi Chromebooks.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Chanwoo Choi <cw00.choi@samsung.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Olof Johansson <olof@lixom.net>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Tyler Baker <tyler.baker@linaro.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9a6f513014 ]
The internal framebuffers we create to remap legacy cursor ioctls to
plane operations for the universal plane support shouldn't be linke to
the file like normal userspace framebuffers. This bug goes back to the
original universal cursor plane support introduced in
commit 161d0dc1dc
Author: Matt Roper <matthew.d.roper@intel.com>
Date: Tue Jun 10 08:28:10 2014 -0700
drm: Support legacy cursor ioctls via universal planes when possible (v4)
The isn't too disastrous since fbs are small, we only create one when the
cursor bo gets changed and ultimately they'll be reaped when the window
server restarts.
Conceptually we'd want to just pass NULL for file_priv when creating it,
but the driver needs the file to lookup the underlying buffer object for
cursor id. Instead let's move the file_priv linking out of
add_framebuffer_internal() into the addfb ioctl implementation, which is
the only place it is needed. And also rename the function for a more
accurate since it only creates the fb, but doesn't add it anywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> (fix & commit msg)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (provider of lipstick)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5151adb37a ]
Experimental lockdep annotation added to the TTM lock has unveiled a
couple of lock dependency violations in the vmwgfx driver. In both
cases it turns out that the device_private::reservation_sem is not
needed so the offending code is moved out of that lock.
Cc: <stable@vger.kernel.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3458390b9f ]
To take down the MOB and GMR memory types, the driver may have to issue
fence objects and thus make sure that the fence manager is taken down
after those memory types.
Reorder device init accordingly.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a494457270 ]
This reverts commit e4df3a0b62
("i2c: core: Dispose OF IRQ mapping at client removal time")
Calling irq_dispose_mapping() will destroy the mapping and disassociate
the IRQ from the IRQ chip to which it belongs. Keeping it is OK, because
existent mappings are reused properly.
Also, this commit breaks drivers using devm* for IRQ management on
OF-based systems because devm* cleanup happens in device code, after
bus's remove() method returns.
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Reported-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[wsa: updated the commit message with findings fromt the other bug report]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Fixes: e4df3a0b62
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 283ee1482f ]
According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time. The code path that caused the deadlock was
as follows:
nilfs_fill_super()
load_nilfs()
nilfs_salvage_orphan_logs()
* Do roll-forwarding, attach segment constructor for recovery,
and kick it.
nilfs_segctor_thread()
nilfs_segctor_thread_construct()
* A lock is held with nilfs_transaction_lock()
nilfs_segctor_do_construct()
nilfs_segctor_drop_written_files()
iput()
iput_final()
write_inode_now()
writeback_single_inode()
__writeback_single_inode()
do_writepages()
nilfs_writepage()
nilfs_construct_dsync_segment()
nilfs_transaction_lock() --> deadlock
This can happen if commit 7ef3ff2fea ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time. The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that. For
instance, we can reproduce the issue with the following steps:
< nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
# dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
# dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
count=1 && reboot -nfh
< the system will immediately reboot >
# mount -t nilfs2 /dev/sdb1 /nilfs
The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.
This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 29d62ec5f8 ]
Normally _regulator_do_enable() isn't called on an already-enabled
rdev. That's because the main caller, _regulator_enable() always
calls _regulator_is_enabled() and only calls _regulator_do_enable() if
the rdev was not already enabled.
However, there is one caller of _regulator_do_enable() that doesn't
check: regulator_suspend_finish(). While we might want to make
regulator_suspend_finish() behave more like _regulator_enable(), it's
probably also a good idea to make _regulator_do_enable() robust if it
is called on an already enabled rdev.
At the moment, _regulator_do_enable() is _not_ robust for already
enabled rdevs if we're using an ena_pin. Each time
_regulator_do_enable() is called for an rdev using an ena_pin the
reference count of the ena_pin is incremented even if the rdev was
already enabled. This is not as intended because the ena_pin is for
something else: for keeping track of how many active rdevs there are
sharing the same ena_pin.
Here's how the reference counting works here:
* Each time _regulator_enable() is called we increment
rdev->use_count, so _regulator_enable() calls need to be balanced
with _regulator_disable() calls.
* There is no explicit reference counting in _regulator_do_enable()
which is normally just a warapper around rdev->desc->ops->enable()
with code for supporting delays. It's not expected that the
"ops->enable()" call do reference counting.
* Since regulator_ena_gpio_ctrl() does have reference counting
(handling the sharing of the pin amongst multiple rdevs), we
shouldn't call it if the current rdev is already enabled.
Note that as part of this we cleanup (remove) the initting of
ena_gpio_state in regulator_register(). In _regulator_do_enable(),
_regulator_do_disable() and _regulator_is_enabled() is is clear that
ena_gpio_state should be the state of whether this particular rdev has
requested the GPIO be enabled. regulator_register() was initting it
as the actual state of the pin.
Fixes: 967cfb18c0 ("regulator: core: manage enable GPIO list")
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 0548bf4f5a ]
The _regulator_do_enable() call ought to be a no-op when called on an
already-enabled regulator. However, as an optimization
_regulator_enable() doesn't call _regulator_do_enable() on an already
enabled regulator. That means we never test the case of calling
_regulator_do_enable() during normal usage and there may be hidden
bugs or warnings. We have seen warnings issued by the tps65090 driver
and bugs when using the GPIO enable pin.
Let's match the same optimization that _regulator_enable() in
regulator_suspend_finish(). That may speed up suspend/resume and also
avoids exposing hidden bugs.
[Use much clearer commit message from Doug Anderson]
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 28249b0c2f ]
The LDOs are documented in the rk808 datasheet to have a soft start
time of 400us. Add that to the driver. If this time takes longer on
a certain board the device tree should be able to override with
"regulator-enable-ramp-delay".
This fixes some dw_mmc probing problems (together with other patches
posted to the mmc maiing lists) on rk3288.
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit da29370056 ]
EEH recovery for bnx2x based adapters is not reliable on all Power
systems using the default hot reset, which can result in an
unrecoverable EEH error. Forcing the use of fundamental reset
during EEH recovery fixes this.
Cc: stable<stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8dad0386b9 ]
The NDDB register holds the data that are needed by the read and write
commands.
However, during a read PIO access, the datasheet specifies that after each 32
bytes read in that register, when BCH is enabled, we have to make sure that the
RDDREQ bit is set in the NDSR register.
This fixes an issue that was seen on the Armada 385, and presumably other mvebu
SoCs, when a read on a newly erased page would end up in the driver reporting a
timeout from the NAND.
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cc261738ad ]
The commit [ef403edb75: ALSA: hda - Don't access stereo amps for
mono channel widgets] fixed the handling of mono widgets in general,
but it still misses an exceptional case: namely, a mono mixer widget
taking a single stereo input. In this case, it has stereo volumes
although it's a mono widget, and thus we have to take care of both
left and right input channels, as stated in HD-audio spec ("7.1.3
Widget Interconnection Rules").
This patch covers this missing piece by adding proper checks of stereo
amps in both the generic parser and the proc output codes.
Reported-by: Raymond Yau <superquad.vortex2@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a1f3f1ca66 ]
The commit [63e51fd708: ALSA: hda - Don't take unresponsive D3
transition too serious] introduced a conditional fallback behavior to
the HD-audio controller depending on the flag set. However, it
introduced a silly bug, too, that the flag was evaluated in a reverse
way. This resulted in a regression of HD-audio controller driver
where it can't go to the fallback mode at communication errors.
Unfortunately (or fortunately?) this didn't come up until recently
because the affected code path is an error handling that happens only
on an unstable hardware chip. Most of recent chips work stably, thus
they didn't hit this problem. Now, we've got a regression report with
a VIA chip, and this seems indeed requiring the fallback to the
polling mode, and finally the bug was revealed.
The fix is a oneliner to remove the wrong logical NOT in the check.
(Lesson learned - be careful about double negation.)
The bug should be backported to stable, but the patch won't be
applicable to 3.13 or earlier because of the code splits. The stable
fix patches for earlier kernels will be posted later manually.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94021
Fixes: 63e51fd708 ('ALSA: hda - Don't take unresponsive D3 transition too serious')
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2ddee91abe ]
MacBook Air 5,2 has the same problem as MacBook Pro 8,1 where the
built-in mic records only the right channel. Apply the same
workaround as MBP8,1 to spread the mono channel via a Cirrus codec
vendor-specific COEF setup.
Reported-and-tested-by: Vasil Zlatanov <vasil.zlatanov@gmail.com>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit bad994f5b4 ]
CS420x codecs seem to deal only the single amps of ADC nodes even
though the nodes receive multiple inputs. This leads to the
inconsistent amp value after S3/S4 resume, for example.
The fix is just to set codec->single_adc_amp flag. Then the driver
handles these ADC amps as if single connections.
Reported-and-tested-by: Vasil Zlatanov <vasil.zlatanov@gmail.com>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ef403edb75 ]
The current HDA generic parser initializes / modifies the amp values
always in stereo, but this seems causing the problem on ALC3229 codec
that has a few mono channel widgets: namely, these mono widgets react
to actions for both channels equally.
In the driver code, we do care the mono channel and create a control
only for the left channel (as defined in HD-audio spec) for such a
node. When the control is updated, only the left channel value is
changed. However, in the resume, the right channel value is also
restored from the initial value we took as stereo, and this overwrites
the left channel value. This ends up being the silent output as the
right channel has been never touched and remains muted.
This patch covers the places where unconditional stereo amp accesses
are done and converts to the conditional accesses.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94581
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit be3bb8236d ]
There was no check about the id string of user control elements, so we
accepted even a control element with an empty string, which is
obviously bogus. This patch adds more sanity checks of id strings.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fcdcd1dec6 ]
The device complies to the UAC1 standard but hides that fact with
proprietary descriptors. The autodetect quirk for Roland devices
catches the audio interface but misses the MIDI part, so a specific
quirk is needed.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rafa Lafuente <rafalafuente@gmail.com>
Tested-by: Raphaël Doursenaud <raphael@doursenaud.fr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cd6fa8d2ca ]
Commit fd316941c ("spi/pl022: disable port when unused") introduced a race,
which leads to possible driver lock up (easily reproducible on SMP).
The problem happens in giveback() function where the completion of the transfer
is signalled to SPI subsystem and then the HW SPI controller is disabled. Another
transfer might be setup in between, which brings driver in locked-up state.
Exact event sequence on SMP:
core0 core1
=> pump_transfers()
/* message->state == STATE_DONE */
=> giveback()
=> spi_finalize_current_message()
=> pl022_unprepare_transfer_hardware()
=> pl022_transfer_one_message
=> flush()
=> do_interrupt_dma_transfer()
=> set_up_next_transfer()
/* Enable SSP, turn on interrupts */
writew((readw(SSP_CR1(pl022->virtbase)) |
SSP_CR1_MASK_SSE), SSP_CR1(pl022->virtbase));
...
=> pl022_interrupt_handler()
=> readwriter()
/* disable the SPI/SSP operation */
=> writew((readw(SSP_CR1(pl022->virtbase)) &
(~SSP_CR1_MASK_SSE)), SSP_CR1(pl022->virtbase));
Lockup! SPI controller is disabled and the data will never be received. Whole
SPI subsystem is waiting for transfer ACK and blocked.
So, only signal transfer completion after disabling the controller.
Fixes: fd316941c (spi/pl022: disable port when unused)
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 76e1d14b31 ]
Additionally to the current DMA transfer the PDC allows to set up a next DMA
transfer. This is useful for larger SPI transfers.
The driver currently waits for ENDRX as end of the transfer. But ENDRX is set
when the current DMA transfer is done (RCR = 0), i.e. it doesn't include the
next DMA transfer.
Thus a subsequent SPI transfer could be started although there is currently a
transfer in progress. This can cause invalid accesses to the SPI slave devices
and to SPI transfer errors.
This issue has been observed on a hardware with a M25P128 SPI NOR flash.
So instead of ENDRX we should wait for RXBUFF. This flag is set if there is
no more DMA transfer in progress (RCR = RNCR = 0).
Signed-off-by: Torsten Fleischer <torfl6749@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 62dfd912ab ]
Problem: When IMA and VTPM are both enabled in kernel config,
kernel hangs during bootup on LE OS.
Why?: IMA calls tpm_pcr_read() which results in tpm_ibmvtpm_send
and tpm_ibmtpm_recv getting called. A trace showed that
tpm_ibmtpm_recv was hanging.
Resolution: tpm_ibmtpm_recv was hanging because tpm_ibmvtpm_send
was sending CRQ message that probably did not make much sense
to phype because of Endianness. The fix below sends correctly
converted CRQ for LE. This was not caught before because it
seems IMA is not enabled by default in kernel config and
IMA exercises this particular code path in vtpm.
Tested with IMA and VTPM enabled in kernel config and VTPM
enabled on both a BE OS and a LE OS ppc64 lpar. This exercised
CRQ and TPM command code paths in vtpm.
Patch is against Peter's tpmdd tree on github which included
Vicky's previous vtpm le patches.
Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # eb71f8a5e3: "Added Little Endian support to vtpm module"
Cc: <stable@vger.kernel.org>
Reviewed-by: Ashley Lai <ashley@ahsleylai.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 283cb41f42 ]
The cpuset.sched_relax_domain_level can control how far we do
immediate load balancing on a system. However, it was found on recent
kernels that echo'ing a value into cpuset.sched_relax_domain_level
did not reduce any immediate load balancing.
The reason this occurred was because the update_domain_attr_tree() traversal
did not update for the "top_cpuset". This resulted in nothing being changed
when modifying the sched_relax_domain_level parameter.
This patch is able to address that problem by having update_domain_attr_tree()
allow updates for the root in the cpuset traversal.
Fixes: fc560a26ac ("cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre()")
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 790317e1b2 ]
If clone_children is enabled, effective masks won't be initialized
due to the bug:
# mount -t cgroup -o cpuset xxx /mnt
# echo 1 > cgroup.clone_children
# mkdir /mnt/tmp
# cat /mnt/tmp/
# cat cpuset.effective_cpus
# cat cpuset.cpus
0-15
And then this cpuset won't constrain the tasks in it.
Either the bug or the fix has no effect on unified hierarchy, as
there's no clone_chidren flag there any more.
Reported-by: Christian Brauner <christianvanbrauner@gmail.com>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8603e1b300 ]
cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.
try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation. In that case, try_to_grab_pending() returns -ENOENT. In
this case, __cancel_work_timer() currently invokes flush_work(). The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping
Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority. Let's say task A just got woken
up from flush_work() by the completion of the target work item. If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing. This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.
task A task B worker
executing work
__cancel_work_timer()
try_to_grab_pending()
set work CANCELING
flush_work()
block for work completion
completion, wakes up A
__cancel_work_timer()
while (forever) {
try_to_grab_pending()
-ENOENT as work is being canceled
flush_work()
false as work is no longer executing
}
This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.
Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
v3: bit_waitqueue() can't be used for work items defined in vmalloc
area. Switched to custom wake function which matches the target
work item and exclusive wait and wakeup.
v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
the target bit waitqueue has wait_bit_queue's on it. Use
DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu
Vizoso.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2fec5104f9 ]
The Kvaser firmware can only read and write messages that are
not crossing the USB endpoint's wMaxPacketSize boundary. While
receiving commands from the CAN device, if the next command in
the same URB buffer crossed that max packet size boundary, the
firmware puts a zero-length placeholder command in its place
then moves the real command to the next boundary mark.
The driver did not recognize such behavior, leading to missing
a good number of rx events during a heavy rx load session.
Moreover, a tx URB context only gets freed upon receiving its
respective tx ACK event. Over time, the free tx URB contexts
pool gets depleted due to the missing ACK events. Consequently,
the netif transmission queue gets __permanently__ stopped; no
frames could be sent again except after restarting the CAN
newtwork interface.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 969439016d ]
When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
this can lead to a skb_under_panic due to missing skb initialisations.
Add the missing initialisations at the CAN skbuff creation times on driver
level (rx path) and in the network layer (tx path).
Reported-by: Austin Schuh <austin@peloton-tech.com>
Reported-by: Daniel Steer <daniel.steer@mclaren.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 524a386825 ]
Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.
That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().
Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.
The problem could be seen by the following commands:
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# cat /sys/kernel/debug/tracing/trace
The trace will show that function tracing was not active.
Cc: stable@vger.kernel.org # 2.6.27+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1619dc3f8f ]
When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
Consider the following situation.
# echo 0 > /proc/sys/kernel/ftrace_enabled
After this ftrace_enabled = 0.
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.
# echo 1 > /proc/sys/kernel/ftrace_enabled
Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.
Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer
Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.
On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.
Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.
Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com
Cc: stable@vger.kernel.org # 2.6.31+
Signed-off-by: Pratyush Anand <panand@redhat.com>
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit b24d443b8f ]
When /proc/sys/kernel/ftrace_enabled is set to zero, all function
tracing is disabled. But the records that represent the functions
still hold information about the ftrace_ops that are hooked to them.
ftrace_ops may request "REGS" (have a full set of pt_regs passed to
the callback), or "TRAMP" (the ops has its own trampoline to use).
When the record is updated to represent the state of the ops hooked
to it, it sets "REGS_EN" and/or "TRAMP_EN" to state that the callback
points to the correct trampoline (REGS has its own trampoline).
When ftrace_enabled is set to zero, all ftrace locations are a nop,
so they do not point to any trampoline. But the _EN flags are still
set. This can cause the accounting to go wrong when ftrace_enabled
is cleared and an ops that has a trampoline is registered or unregistered.
For example, the following will cause ftrace to crash:
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo nop > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
As function_graph uses a trampoline, when ftrace_enabled is set to zero
the updates to the record are not done. When enabling function_graph
again, the record will still have the TRAMP_EN flag set, and it will
look for an op that has a trampoline other than the function_graph
ops, and fail to find one.
Cc: stable@vger.kernel.org # 3.17+
Reported-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit eeb8a7e8bb ]
when multiport is off, virtio console invokes config access from irq
context, config access is blocking on s390.
Fix this up by scheduling work from config irq - similar to what we do
for multiport configs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 30a22c215a upstream.
commit 6ae9200f2c ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2cf6258c28)
[ Upstream commit 7fd6f640f2 ]
Trying to write console output from within the serial console driver
while the port->lock is held causes recursive deadlock:
CPU 0
spin_lock_irqsave(&port->lock)
printk()
console_unlock()
call_console_drivers()
serial8250_console_write()
spin_lock_irqsave(&port->lock)
** DEADLOCK **
The 8250_dw i/o accessors try to write a console error message if the
LCR workaround was unsuccessful. When the port->lock is already held
(eg., when called from serial8250_set_termios()), this deadlocks.
Make the error message a FIXME until a general solution is devised.
Cc: Tim Kryger <tim.kryger@gmail.com>
Reported-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d5e7cafd69 ]
If the part of the compression data are corrupted, or the compression
data is totally fake, the memory access over the limit is possible.
This is the log from my system usning lz4 decompression.
[6502]data abort, halting
[6503]r0 0x00000000 r1 0x00000000 r2 0xdcea0ffc r3 0xdcea0ffc
[6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce80000
[6515]r8 0x00000000 r9 0x00000000 r10 0x00000000 r11 0xb9a98000
[6522]r12 0xdcea1000 usp 0x00000000 ulr 0x00000000 pc 0x820149bc
[6528]spsr 0x400001f3
and the memory addresses of some variables at the moment are
ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000
As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory
over @oend.
Signed-off-by: JeHyeon Yeon <tom.yeon@windriver.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a239118a24 ]
radeon_bo_create() calls radeon_ttm_placement_from_domain()
before ttm_bo_init() is called. radeon_ttm_placement_from_domain()
uses the ttm bo size to determine when to select top down
allocation but since the ttm bo is not initialized yet the
check is always false. It only took effect when buffers
were validated later. It also seemed to regress suspend
and resume on some systems possibly due to it not
taking effect in radeon_bo_create().
radeon_bo_create() and radeon_ttm_placement_from_domain()
need to be reworked substantially for this to be optimally
effective. Re-enable it at that point.
Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 285994a62c ]
The ARM architecture allows the caching of intermediate page table
levels and page table freeing requires a sequence like:
pmd_clear()
TLB invalidation
pte page freeing
With commit 5e5f6dc105 (arm64: mm: enable HAVE_RCU_TABLE_FREE logic),
the page table freeing batching was moved from tlb_remove_page() to
tlb_remove_table(). The former takes care of TLB invalidation as this is
also shared with pte clearing and page cache page freeing. The latter,
however, does not invalidate the TLBs for intermediate page table levels
as it probably relies on the architecture code to do it if required.
When the mm->mm_users < 2, tlb_remove_table() does not do any batching
and page table pages are freed before tlb_finish_mmu() which performs
the actual TLB invalidation.
This patch introduces __tlb_flush_pgtable() for arm64 and calls it from
the {pte,pmd,pud}_free_tlb() directly without relying on deferred page
table freeing.
Fixes: 5e5f6dc105 arm64: mm: enable HAVE_RCU_TABLE_FREE logic
Reported-by: Jon Masters <jcm@redhat.com>
Tested-by: Jon Masters <jcm@redhat.com>
Tested-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fb7332a9fe ]
On architectures with hardware broadcasting of TLB invalidation messages
, it makes sense to reduce the range of the mmu_gather structure when
unmapping page ranges based on the dirty address information passed to
tlb_remove_tlb_entry.
arm64 already does this by directly manipulating the start/end fields
of the gather structure, but this confuses the generic code which
does not expect these fields to change and can end up calculating
invalid, negative ranges when forcing a flush in zap_pte_range.
This patch moves the minimal range calculation out of the arm64 code
and into the generic implementation, simplifying zap_pte_range in the
process (which no longer needs to care about start/end, since they will
point to the appropriate ranges already). With the range being tracked
by core code, the need_flush flag is dropped in favour of checking that
the end of the range has actually been set.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 355a901e6c ]
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.
We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()
Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves in socket write queue clean fast clone skbs.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 91edd096e2 ]
Commit db31c55a6f (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b965 (net: socket: error on a negative msg_namelen).
In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae0 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd (fold verify_iovec() into
copy_msghdr_from_user()).
This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().
Fixes: db31c55a6f (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d22e153718 ]
tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:
ss example output (ss -amn | grep -B1 f4294):
tcp FIN-WAIT-1 0 1 192.168.0.1:45520 192.0.2.1:8080
skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0)
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 73ba57bfae ]
for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4
A simple testcase for verification is:
ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: Steven Barth <cyrus@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 8d006e0105 ]
This reverts commit 11ad714b98 because
it breaks cx82310_eth.
The custom USB_DEVICE_CLASS macro matches
bDeviceClass, bDeviceSubClass and bDeviceProtocol
but the common USB_DEVICE_AND_INTERFACE_INFO matches
bInterfaceClass, bInterfaceSubClass and bInterfaceProtocol instead, which are
not specified.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7d985ed1dc ]
[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]
MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there. It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.
It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested. If it is correct, it's
-stable fodder.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3eeff778e0 ]
It should be checking flags, not msg->msg_flags. It's ->sendmsg()
instances that need to look for that in ->msg_flags, ->recvmsg() ones
(including the other ->recvmsg() instance in that file, as well as
unix_dgram_recvmsg() this one claims to be imitating) check in flags.
Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC
in receive") back in 2010, so it goes quite a while back.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit c8e2c80d7e ]
inet_diag_dump_one_icsk() allocates too small skb.
Add inet_sk_attr_size() helper right before inet_sk_diag_fill()
so that it can be updated if/when new attributes are added.
iproute2/ss currently does not use this dump_one() interface,
this might explain nobody noticed this problem yet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ab3971b1e7 ]
We don't delete napi from hash list during module exit. This will
cause the following panic when doing module load and unload:
BUG: unable to handle kernel paging request at 0000004e00000075
IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
PGD 3c5d5067 PUD 0
Oops: 0000 [#1] SMP
...
Call Trace:
[<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
[<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
[<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
[<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
[<ffffffff814c81d3>] __driver_attach+0x93/0xa0
[<ffffffff814c8140>] ? __device_attach+0x40/0x40
[<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
[<ffffffff814c7a79>] driver_attach+0x19/0x20
[<ffffffff814c76f0>] bus_add_driver+0x170/0x220
[<ffffffffa0a60000>] ? 0xffffffffa0a60000
[<ffffffff814c894f>] driver_register+0x5f/0xf0
[<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
[<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]
This patch fixes this by doing this in virtnet_free_queues(). And also
don't delete napi in virtnet_freeze() since it will call
virtnet_free_queues() which has already did this.
Fixes 91815639d8 ("virtio-net: rx busy polling support")
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit f862e07cf9 ]
The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:
net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit af5cbc9822 ]
The current driver support receive VLAN CTAG HW acceleration feature
(NETIF_F_HW_VLAN_CTAG_RX) through software simulation. There calls the
api .skb_copy_to_linear_data_offset() to skip the VLAN tag, but there
have overlap between the two memory data point range. The patch just fix
the issue.
V2:
Michael Grzeschik suggest to use memmove() instead of skb_copy_to_linear_data_offset().
Reported-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Fixes: 1b7bde6d65 ("net: fec: implement rx_copybreak to improve rx performance")
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5778d39d07 ]
We dynamically allocate divisor+1 entries for ->ht[] in tc_u_hnode:
ht = kzalloc(sizeof(*ht) + divisor*sizeof(void *), GFP_KERNEL);
So ->ht is supposed to be the last field of this struct, however
this is broken, since an rcu head is appended after it.
Fixes: 1ce87720d4 ("net: sched: make cls_u32 lockless")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2077cef4d5 ]
Firstly, handle zero length calls properly. Believe it or not there
are a few of these happening during early boot.
Next, we can't just drop to a memcpy() call in the forward copy case
where dst <= src. The reason is that the cache initializing stores
used in the Niagara memcpy() implementations can end up clearing out
cache lines before we've sourced their original contents completely.
For example, considering NG4memcpy, the main unrolled loop begins like
this:
load src + 0x00
load src + 0x08
load src + 0x10
load src + 0x18
load src + 0x20
store dst + 0x00
Assume dst is 64 byte aligned and let's say that dst is src - 8 for
this memcpy() call. That store at the end there is the one to the
first line in the cache line, thus clearing the whole line, which thus
clobbers "src + 0x28" before it even gets loaded.
To avoid this, just fall through to a simple copy only mildly
optimized for the case where src and dst are 8 byte aligned and the
length is a multiple of 8 as well. We could get fancy and call
GENmemcpy() but this is good enough for how this thing is actually
used.
Reported-by: David Ahern <david.ahern@oracle.com>
Reported-by: Bob Picco <bpicco@meloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 31aaa98c24 ]
With the increase in number of CPUs calls to functions that dump
output to console (e.g., arch_trigger_all_cpu_backtrace) can take
a long time to complete. If IRQs are disabled eventually the NMI
watchdog kicks in and creates more havoc. Avoid by telling the NMI
watchdog everything is ok.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d51291cb8f ]
Currently perf-stat (aka, counting mode) does not work:
$ perf stat ls
...
Performance counter stats for 'ls':
1.585665 task-clock (msec) # 0.580 CPUs utilized
24 context-switches # 0.015 M/sec
0 cpu-migrations # 0.000 K/sec
86 page-faults # 0.054 M/sec
<not supported> cycles
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0.002735100 seconds time elapsed
The reason is that state is never reset (stays with PERF_HES_UPTODATE set).
Add a call to sparc_pmu_enable_event during the added_event handling.
Clean up the encoding since pmu_start calls sparc_pmu_enable_event which
does the same. Passing PERF_EF_RELOAD to sparc_pmu_start means the call
to sparc_perf_event_set_period can be removed as well.
With this patch:
$ perf stat ls
...
Performance counter stats for 'ls':
1.552890 task-clock (msec) # 0.552 CPUs utilized
24 context-switches # 0.015 M/sec
0 cpu-migrations # 0.000 K/sec
86 page-faults # 0.055 M/sec
5,748,997 cycles # 3.702 GHz
<not supported> stalled-cycles-frontend:HG
<not supported> stalled-cycles-backend:HG
1,684,362 instructions:HG # 0.29 insns per cycle
295,133 branches:HG # 190.054 M/sec
28,007 branch-misses:HG # 9.49% of all branches
0.002815665 seconds time elapsed
Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 5b0d4b5514 ]
perf_pmu_disable is called by core perf code before pmu->del and the
enable function is called by core perf code afterwards. No need to
call again within sparc_pmu_del.
Ditto for pmu->add and sparc_pmu_add.
Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 53eb251697 ]
A bug was reported that the semtimedop() system call was always
failing eith ENOSYS.
Since SEMCTL is defined as 3, and SEMTIMEDOP is defined as 4,
the comparison "call <= SEMCTL" will always prevent SEMTIMEDOP
from getting through to the semaphore ops switch statement.
This is corrected by changing the comparison to "call <= SEMTIMEDOP".
Orabug: 20633375
Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 66d0f7ec9f ]
Load balancing can be triggered in the critical sections protected by
srmmu_context_spinlock in destroy_context() and switch_mm() and can hang
the cpu waiting for the rq lock of another cpu that in turn has called
switch_mm hangning on srmmu_context_spinlock leading to deadlock.
So, disable interrupt while taking srmmu_context_spinlock in
destroy_context() and switch_mm() so we don't deadlock.
See also commit 77b838fa1e ("[SPARC64]: destroy_context() needs to disable
interrupts.")
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit a6130ed253 upstream.
We were missing a return statement in the PSL interrupt handler in the
case of an AFU error, which would trigger an "Unhandled CXL PSL IRQ"
warning. We do actually handle these type of errors (by notifying
userspace), so add the missing return IRQ_HANDLED so we don't throw
unecessary warnings.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f963ec2d6 upstream.
When unbinding and rebinding the driver on a system with a card in PHB0, this
error condition is reached after a few attempts:
ERROR: Bad of_node_put() on /pciex@3fffe40000000
CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
Call Trace:
[c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
[c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
[c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
[c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
[c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
[c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
[c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
[c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
[c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
[c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
[c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
[c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
[c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
[c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
[c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98
We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4beb5421ba upstream.
Select defaults such that a PERST causes flash image reload. Select which
image based on what the card is set up to load.
CXL_VSEC_PERST_LOADS_IMAGE selects whether PERST assertion causes flash image
load.
CXL_VSEC_PERST_SELECT_USER selects which image is loaded on the next PERST.
cxl_update_image_control writes these bits into the VSEC.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e9dcdae40 upstream.
In case CLK_GATE_HIWORD_MASK flag is passed to clk_register_gate(), the bit #
should be no higher than 15, however the corresponding check is obviously off-
by-one.
Fixes: 045779942c ("clk: gate: add CLK_GATE_HIWORD_MASK")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fe89e1b6d upstream.
Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.
In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.
Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.
Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.
Reported-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Fixes: 8323f26ce3 ("sched: Fix race in task_group()")
Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba4877b9ca upstream.
Vinayak Menon has reported that an excessive number of tasks was throttled
in the direct reclaim inside too_many_isolated() because NR_ISOLATED_FILE
was relatively high compared to NR_INACTIVE_FILE. However it turned out
that the real number of NR_ISOLATED_FILE was 0 and the per-cpu
vm_stat_diff wasn't transferred into the global counter.
vmstat_work which is responsible for the sync is defined as deferrable
delayed work which means that the defined timeout doesn't wake up an idle
CPU. A CPU might stay in an idle state for a long time and general effort
is to keep such a CPU in this state as long as possible which might lead
to all sorts of troubles for vmstat consumers as can be seen with the
excessive direct reclaim throttling.
This patch basically reverts 39bf6270f5 ("VM statistics: Make timer
deferrable") but it shouldn't cause any problems for idle CPUs because
only CPUs with an active per-cpu drift are woken up since 7cc36bbddd
("vmstat: on-demand vmstat workers v8") and CPUs which are idle for a
longer time shouldn't have per-cpu drift.
Fixes: 39bf6270f5 (VM statistics: Make timer deferrable)
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34027ca2bb upstream.
The pin id for a given tuple listed in a fsl,pins property is calculated
by dividing the first entry (which is also a register offset) by 4.
As the first available register is at offset 0x8 and configures the pad
MX25_PAD_A10 the right id for this pin is 2. All other pins are off by
one, too.
This patch drops the definition MX25_PAD_RESERVE1 (together with its
only use) and decrements all following values by 1.
Fixes: b4a87c9b96 ("pinctrl: pinctrl-imx: add imx25 pinctrl driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ff0f034e9 upstream.
The right check for conf_reg to be invalid it testing against -1 not 0
as is done in the rest of the driver.
This fixes an oops that can be triggered by:
cat /sys/kernel/debug/pinctrl/43fac000.iomuxc/*
Fixes: ae75ff8145 ("pinctrl: pinctrl-imx: add imx pinctrl core driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8bfae4f993 upstream.
Sometimes while CPU have some load and ath5k doing the wireless
interface reset the whole WiSoC completely freezes. Set of tests shows
that using atomic delay function while we wait interface reset helps to
avoid such freezes.
The easiest way to reproduce this issue: create a station interface,
start continous scan with wpa_supplicant and load CPU by something. Or
just create multiple station interfaces and put them all in continous
scan.
This patch partially reverts the commit 1846ac3dbe ("ath5k: Use
usleep_range where possible"), which replaces initial udelay()
by usleep_range().
I do not know actual source of this issue, but all looks like that HW
freeze is caused by transaction on internal SoC bus, while wireless
block is in reset state.
Also I should note that I do not know how many chips are affected, but I
did not see this issue with chips, other than AR5312.
CC: Jiri Slaby <jirislaby@gmail.com>
CC: Nick Kossifidis <mickflemm@gmail.com>
CC: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Fixes: 1846ac3dbe ("ath5k: Use usleep_range where possible")
Reported-by: Christophe Prevotaux <c.prevotaux@rural-networks.com>
Tested-by: Christophe Prevotaux <c.prevotaux@rural-networks.com>
Tested-by: Eric Bree <ebree@nltinc.com>
Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2be00c0fb upstream.
In the function of_pci_get_host_bridge_resources() if the parsing of ranges
fails, previously allocated resources inclusive of bus_range are not freed
and are not expected to be freed by the function caller on error return.
This patch fixes the issues by adding code that properly frees resources
and bus_range before exiting the function with an error return value.
Fixes: cbe4097f8a ("of/pci: Add support for parsing PCI host bridge resources from DT")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5de61e7aa1 upstream.
The current organization of Documentation/stable_kernel_rules.txt
doesn't clearly differentiate the mutually exclusive options for
submission to the -stable review process. As I understand it, patches
are not actually required to be mailed directly to
stable@vger.kernel.org, but the instructions do not make this clear.
Also, there are some established processes that are not listed --
specifically, what I call Option 2 below.
This patch updates and reorganizes a bit, to make things clearer.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d51199a83a upstream.
DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be
set to 32.
The 64 was introduced by commit (in 2009):
a152ff24b9 ASoC: OMAP: Make DMA 64 aligned
But the dma_mask and coherent_dma_mask can not be used to specify alignment.
Fixes: a152ff24b9 (ASoC: OMAP: Make DMA 64 aligned)
Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c441c254e upstream.
If we're traversing a directory which contains a submounted filesystem,
or one that has a referral, the NFS server that is processing the READDIR
request will often return information for the underlying (mounted-on)
directory. It may, or may not, also return filehandle information.
If this happens, and the lookup in nfs_prime_dcache() returns the
dentry for the submounted directory, the filehandle comparison will
fail, and we call d_invalidate(). Post-commit 8ed936b567
("vfs: Lazily remove mounts on unlinked files and directories."), this
means the entire subtree is unmounted.
The following minimal patch addresses this problem by punting on
the invalidation if there is a submount.
Kudos to Neil Brown <neilb@suse.de> for having tracked down this
issue (see link).
Reported-by: Nix <nix@esperi.org.uk>
Link: http://lkml.kernel.org/r/87iofju9ht.fsf@spindle.srvr.nix
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcf0789a96 upstream.
Commit 7d78cbefaa (serial: 8250_dw: add ability to handle
the peripheral clock) introduces handling for a second clk
to 8250_dw.c which is the driver also for LPSS UART. The
second clk forces us to provide identifier (con_id) for the
clkdev we create.
This fixes an issue where 8250_dw.c is getting the same
handler for both clocks.
Fixes: 7d78cbefaa (serial: 8250_dw: add ability to handle the peripheral clock)
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e17cb1288 upstream.
i915.ko depends upon the acpi/video.ko module and so refuses to load if
ACPI is disabled at runtime if for example the BIOS is broken beyond
repair. acpi/video provides an optional service for i915.ko and so we
should just allow the modules to load, but do no nothing in order to let
the machines boot correctly.
Reported-by: Bill Augur <bill-auger@programmer.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Aaron Lu <aaron.lu@intel.com>
[ rjw: Fixed up the new comment in acpi_video_init() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d65261a09 upstream.
eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.
One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.
This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
https://bugzilla.kernel.org/show_bug.cgi?id=93691https://launchpad.net/bugs/1305335
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ed620bb34 upstream.
While adding support loading kernel and initrd above 4G to grub2 in legacy
mode, I was referring to efi_high_alloc().
That will allocate buffer for kernel and then initrd, and initrd will
use kernel buffer start as limit.
During testing found two buffers will be overlapped when initrd size is
very big like 400M.
It turns out efi_high_alloc() boundary checking is not right.
end - size will be the new start, and should not compare new
start with max, we need to make sure end is smaller than max.
[ Basically, with the current efi_high_alloc() code it's possible to
allocate memory above 'max', because efi_high_alloc() doesn't check
that the tail of the allocation is below 'max'.
If you have an EFI memory map with a single entry that looks like so,
[0xc0000000-0xc0004000]
And want to allocate 0x3000 bytes below 0xc0003000 the current code
will allocate [0xc0001000-0xc0004000], not [0xc0000000-0xc0003000]
like you would expect. - Matt ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c876486be1 upstream.
commit 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is
protected by clp->cl_lock") removed the use of the reaplist to
clean out clp->cl_revoked. It failed to change list_entry() to
walk clp->cl_revoked.next instead of reaplist.next
Fixes: 2d4a532d38 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Tested-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dd2a883aa upstream.
Atm, it's possible that the interrupt handler is called when the device
is in D3 or some other low-power state. It can be due to another device
that is still in D0 state and shares the interrupt line with i915, or on
some platforms there could be spurious interrupts even without sharing
the interrupt line. The latter case was reported by Klaus Ethgen using a
Lenovo x61p machine (gen 4). He noticed this issue via a system
suspend/resume hang and bisected it to the following commit:
commit e11aa36230
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Wed Jun 18 09:52:55 2014 -0700
drm/i915: use runtime irq suspend/resume in freeze/thaw
This is a problem, since in low-power states IIR will always read
0xffffffff resulting in an endless IRQ servicing loop.
Fix this by handling interrupts only when the driver explicitly enables
them and so it's guaranteed that the interrupt registers return a valid
value.
Note that this issue existed even before the above commit, since during
runtime suspend/resume we never unregistered the handler.
v2:
- clarify the purpose of smp_mb() vs. synchronize_irq() in the
code comment (Chris)
v3:
- no need for an explicit smp_mb(), we can assume that synchronize_irq()
and the mmio read/writes in the install hooks provide for this (Daniel)
- remove code comment as the remaining synchronize_irq() is self
explanatory (Daniel)
v4:
- drm_irq_uninstall() implies synchronize_irq(), so no need to call it
explicitly (Daniel)
Reference: https://lkml.org/lkml/2015/2/11/205
Reported-and-bisected-by: Klaus Ethgen <Klaus@Ethgen.ch>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0dc6f20b98 upstream.
When reviewing patch that fixes VGA on BDW Halo Jani noticed that
we also had other ULT IDs that weren't listed there.
So this follow-up patch add these pci-ids as halo and fix comments
on i915_pciids.h
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbfb00c3e7 upstream.
The logic was reversed from what the hw actually exposed.
Fixes graphics corruption in certain harvest configurations.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a26f9ad1b upstream.
Commit b7bc596ebb ("drm/radeon: disable native
backlight control on pre-r6xx asics (v2)") accidently
broke backlight control on old mac laptops that use the
on-GPU backlight controller.
Signed-off-by: Nathan-J. Hirschauer <nathanhi@deepserve.info>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33e5df0e0e upstream.
It appears that the Cintiq Companion Hybrid does not send an ABS_MISC event to
userspace when any of its ExpressKeys are pressed. This is not strictly
necessary now that the pad exists on its own device, but should be fixed for
consistency's sake.
Traditionally both the stylus and pad shared the same device node, and
xf86-input-wacom would use ABS_MISC for disambiguation. Not sending this causes
the Hybrid to behave incorrectly with xf86-input-wacom beginning with its
8f44f3 commit.
Signed-off-by: Jason Gerecke <killertofu@gmail.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e7b341037 upstream.
The ignore check that got added in 6ce901eb61 ("HID: input: fix confusion
on conflicting mappings") needs to properly check for VARIABLE reports
as well (ARRAY reports should be ignored), otherwise legitimate keyboards
might break.
Fixes: 6ce901eb61 ("HID: input: fix confusion on conflicting mappings")
Reported-by: Fredrik Hallenberg <megahallon@gmail.com>
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ce901eb61 upstream.
On an PC-101/103/104 keyboard (American layout) the 'Enter' key and its
neighbours look like this:
+---+ +---+ +-------+
| 1 | | 2 | | 5 |
+---+ +---+ +-------+
+---+ +-----------+
| 3 | | 4 |
+---+ +-----------+
On a PC-102/105 keyboard (European layout) it looks like this:
+---+ +---+ +-------+
| 1 | | 2 | | |
+---+ +---+ +-+ 4 |
+---+ +---+ | |
| 3 | | 5 | | |
+---+ +---+ +-----+
(Note that the number of keys is the same, but key '5' is moved down and
the shape of key '4' is changed. Keys '1' to '3' are exactly the same.)
The keys 1-4 report the same scan-code in HID in both layouts, even though
the keysym they produce is usually different depending on the XKB-keymap
used by user-space.
However, key '5' (US 'backslash'/'pipe') reports 0x31 for the upper layout
and 0x32 for the lower layout, as defined by the HID spec. This is highly
confusing as the linux-input API uses a single keycode for both.
So far, this was never a problem as there never has been a keyboard with
both of those keys present at the same time. It would have to look
something like this:
+---+ +---+ +-------+
| 1 | | 2 | | x31 |
+---+ +---+ +-------+
+---+ +---+ +-----+
| 3 | |x32| | 4 |
+---+ +---+ +-----+
HID can represent such a keyboard, but the linux-input API cannot.
Furthermore, any user-space mapping would be confused by this and,
luckily, no-one ever produced such hardware.
Now, the HID input layer fixed this mess by mapping both 0x31 and 0x32 to
the same keycode (KEY_BACKSLASH==0x2b). As only one of both physical keys
is present on a hardware, this works just fine.
Lets introduce hardware-vendors into this:
------------------------------------------
Unfortunately, it seems way to expensive to produce a different device for
American and European layouts. Therefore, hardware-vendors put both keys,
(0x31 and 0x32) on the same keyboard, but only one of them is hooked up
to the physical button, the other one is 'dead'.
This means, they can use the same hardware, with a different button-layout
and automatically produce the correct HID events for American *and*
European layouts. This is unproblematic for normal keyboards, as the
'dead' key will never report any KEY-DOWN events. But RollOver keyboards
send the whole matrix on each key-event, allowing n-key roll-over mode.
This means, we get a 0x31 and 0x32 event on each key-press. One of them
will always be 0, the other reports the real state. As we map both to the
same keycode, we will get spurious key-events, even though the real
key-state never changed.
The easiest way would be to blacklist 'dead' keys and never handle those.
We could simply read the 'country' tag of USB devices and blacklist either
key according to the layout. But... hardware vendors... want the same
device for all countries and thus many of them set 'country' to 0 for all
devices. Meh..
So we have to deal with this properly. As we cannot know which of the keys
is 'dead', we either need a heuristic and track those keys, or we simply
make use of our value-tracking for HID fields. We simply ignore HID events
for absolute data if the data didn't change. As HID tracks events on the
HID level, we haven't done the keycode translation, yet. Therefore, the
'dead' key is tracked independently of the real key, therefore, any events
on it will be ignored.
This patch simply discards any HID events for absolute data if it didn't
change compared to the last report. We need to ignore relative and
buffered-byte reports for obvious reasons. But those cannot be affected by
this bug, so we're fine.
Preferably, we'd do this filtering on the HID-core level. But this might
break a lot of custom drivers, if they do not follow the HID specs.
Therefore, we do this late in hid-input just before we inject it into the
input layer (which does the exact same filtering, but on the keycode
level).
If this turns out to break some devices, we might have to limit filtering
to EV_KEY events. But lets try to do the Right Thing first, and properly
filter any absolute data that didn't change.
This patch is tagged for 'stable' as it fixes a lot of n-key RollOver
hardware. We might wanna wait with backporting for a while, before we know
it doesn't break anything else, though.
Reported-by: Adam Goode <adam@spicenitz.org>
Reported-by: Fredrik Hallenberg <megahallon@gmail.com>
Tested-by: Fredrik Hallenberg <megahallon@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be8e89087e upstream.
The hardware range code values and list of valid ranges for the AI
subdevice is incorrect for several supported boards. The hardware range
code values for all boards except PCI-DAS4020/12 is determined by
calling `ai_range_bits_6xxx()` based on the maximum voltage of the range
and whether it is bipolar or unipolar, however it only returns the
correct hardware range code for the PCI-DAS60xx boards. For
PCI-DAS6402/16 (and /12) it returns the wrong code for the unipolar
ranges. For PCI-DAS64/Mx/16 it returns the wrong code for all the
ranges and the comedi range table is incorrect.
Change `ai_range_bits_6xxx()` to use a look-up table pointed to by new
member `ai_range_codes` of `struct pcidas64_board` to map the comedi
range table indices to the hardware range codes. Use a new comedi range
table for the PCI-DAS64/Mx/16 boards (and the commented out variants).
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22aa66a3ee upstream.
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero. Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.
pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences s->origin. These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.
This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bec1f4a88 upstream.
The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.
dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.
There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.
To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37527b8692 upstream.
I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:
# echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 4K 1G 0
`-moo (dm-0) 0 4K 1G 0
sdb 0 0B 0B 0
`-moo (dm-0) 0 4K 1G 0
Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.
If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.
To fix this: before entering the loop, check for the combination of
REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
the mirror device.
This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; upstream is also affected.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2ed51ac64 upstream.
It may be possible that a device claims discard support but it rejects
discards with -EOPNOTSUPP. It happens when using loopback on ext2/ext3
filesystem driven by the ext4 driver. It may also happen if the
underlying devices are moved from one disk on another.
If discard error happens, we reject the bio with -EOPNOTSUPP, but we do
not degrade the array.
This patch fixes failed test shell/lvconvert-repair-transient.sh in the
lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
filesystem and it is being driven by the ext4 driver.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42b8ce6f55 upstream.
`do_cmd_ioctl()` in "comedi_fops.c" handles the `COMEDI_CMD` ioctl.
This returns `-EAGAIN` if it has copied a modified `struct comedi_cmd`
back to user-space. (This occurs when the low-level Comedi driver's
`do_cmdtest()` handler returns non-zero to indicate a problem with the
contents of the `struct comedi_cmd`, or when the `struct comedi_cmd` has
the `CMDF_BOGUS` flag set.)
`compat_cmd()` in "comedi_compat32.c" handles the 32-bit compatible
version of the `COMEDI_CMD` ioctl. Currently, it never copies a 32-bit
compatible version of `struct comedi_cmd` back to user-space, which is
at odds with the way the regular `COMEDI_CMD` ioctl is handled. To fix
it, change `compat_cmd()` to copy a 32-bit compatible version of the
`struct comedi_cmd` back to user-space when the main ioctl handler
returns `-EAGAIN`.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76820fcf7a upstream.
For all pll-s on sun6i n == 0 means use a multiplier of 1, rather then 0 as
it means on sun4i / sun5i / sun7i. n_start = 1 is already correctly set
for sun6i pll6, but was missing for pll1, this commit fixes this.
Cc: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52bba9809a upstream.
Some of the clks can be registered & unregistered before the clk related debugfs
entries are initialized at late_initcall. In the unregister path checking for only
dentry before clk_debug_init() would lead dangling pointers in the debug clk list,
because the list is already populated in register path and the clk pointer freed in
unregister path.
The side effect of not removing it from the list is either a null pointer
dereference or if lucky to boot the system, the number of clk entries in
debugfs disappear.
We could add more checks like if (inited && !clk->dentry) but just removing
the check for dentry made more sense as debugfs_remove_recursive() seems to be
safe with null pointers. This will ensure that the unregistering clk would be
removed from the debug list in all the code paths.
Without this patch kernel would crash with log:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0204000
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G B 3.19.0-rc3-00007-g412f9ba-dirty #840
Hardware name: Qualcomm (Flattened Device Tree)
task: ed948000 ti: ed944000 task.ti: ed944000
PC is at strlen+0xc/0x40
LR is at __create_file+0x64/0x1dc
pc : [<c04ee604>] lr : [<c049f1c4>] psr: 60000013
sp : ed945e40 ip : ed945e50 fp : ed945e4c
r10: 00000000 r9 : c1006094 r8 : 00000000
r7 : 000041ed r6 : 00000000 r5 : ed4af998 r4 : c11b5e28
r3 : 00000000 r2 : ed945e38 r1 : a0000013 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5787d Table: 8020406a DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0xed944248)
Stack: (0xed945e40 to 0xed946000)
5e40: ed945e7c ed945e50 c049f1c4 c04ee604 c0fc2fa4 00000000 ecb748c0 c11c2b80
5e60: c0beec04 0000011c c0fc2fa4 00000000 ed945e94 ed945e80 c049f3e0 c049f16c
5e80: 00000000 00000000 ed945eac ed945e98 c08cbc50 c049f3c0 ecb748c0 c11c2b80
5ea0: ed945ed4 ed945eb0 c0fc3080 c08cbc30 c0beec04 c107e1d8 ecdf0600 c107e1d8
5ec0: c107e1d8 ecdf0600 ed945f54 ed945ed8 c0208ed4 c0fc2fb0 c026a784 c04ee628
5ee0: ed945f0c ed945ef0 c0f5d600 c04ee604 c0f5d5ec ef7fcc7d c0b40ecc 0000011c
5f00: ed945f54 ed945f10 c026a994 c0f5d5f8 c04ecc00 00000007 ef7fcc95 00000007
5f20: c0e90744 c0dd0884 ed945f54 c106cde0 00000007 c117f8c0 0000011c c0f5d5ec
5f40: c1006094 c100609c ed945f94 ed945f58 c0f5de34 c0208e50 00000007 00000007
5f60: c0f5d5ec be9b5ae0 00000000 c117f8c0 c0af1680 00000000 00000000 00000000
5f80: 00000000 00000000 ed945fac ed945f98 c0af169c c0f5dd2c ed944000 00000000
5fa0: 00000000 ed945fb0 c020f298 c0af168c 00000000 00000000 00000000 00000000
5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ebcc6d33 bfffca73
[<c04ee604>] (strlen) from [<c049f1c4>] (__create_file+0x64/0x1dc)
[<c049f1c4>] (__create_file) from [<c049f3e0>] (debugfs_create_dir+0x2c/0x34)
[<c049f3e0>] (debugfs_create_dir) from [<c08cbc50>] (clk_debug_create_one+0x2c/0x16c)
[<c08cbc50>] (clk_debug_create_one) from [<c0fc3080>] (clk_debug_init+0xdc/0x144)
[<c0fc3080>] (clk_debug_init) from [<c0208ed4>] (do_one_initcall+0x90/0x1e0)
[<c0208ed4>] (do_one_initcall) from [<c0f5de34>] (kernel_init_freeable+0x114/0x1e0)
[<c0f5de34>] (kernel_init_freeable) from [<c0af169c>] (kernel_init+0x1c/0xfc)
[<c0af169c>] (kernel_init) from [<c020f298>] (ret_from_fork+0x14/0x3c)
Code: c0b40ecc e1a0c00d e92dd800 e24cb004 (e5d02000)
---[ end trace b940e45b5e25c1e7 ]---
Fixes: 6314b6796e "clk: Don't hold prepare_lock across debugfs creation"
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3dccfecdb8 upstream.
The CPU_2X clock does not have a classical in-kernel user, but is,
amongst other things, required for OCM and debug access. Make sure this
clock is not mistakenly disabled during boot up by enabling it in the
platform's clock driver.
Fixes: 0ee52b157b 'clk: zynq: Add clock controller driver'
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 957ed60b53 upstream.
Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
have a memory overrun issue:
Each b-tree node of nilfs2 stores a set of key-value pairs and the number
of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
a few other "bn_*" members.
Since the value of "bn_nchildren" is used for operations on the key-values
within the b-tree node, it can cause memory access overrun if a large
number is incorrectly set to "bn_nchildren".
For instance, nilfs_btree_node_lookup() function determines the range of
binary search with it, and too large "bn_nchildren" leads
nilfs_btree_node_get_key() in that function to overrun.
As for intermediate b-tree nodes, this is prevented by a sanity check
performed when each node is read from a drive, however, no sanity check
has been done for root nodes stored in inodes.
This patch fixes the issue by adding missing sanity check against b-tree
root nodes so that it's called when on-memory inodes are read from ifile,
inode metadata file.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit c2be9dc0e0 upstream.
When marshaling a user path to the kernel struct ib_sa_path, we need
to zero smac and dmac and set the vlan id to the "no vlan" value.
This is to ensure that Ethernet attributes are not used with
InfiniBand QPs.
Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 0fb8bcf022 upstream.
The deadlock occurs in __uverbs_modify_qp: we take a lock (idr_read_qp)
and in case of failure in ib_resolve_eth_l2_attrs we don't release
it (put_qp_read). Fix that.
Fixes: ed4c54e5b4 ("IB/core: Resolve Ethernet L2 addresses when modifying QP")
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit e9a7faf11a upstream.
The MLX4_PROT_IB_IPV4 protocol should only be used with RoCEv2 and such.
Removing this wrong usage allows to run multicast applications over RoCE.
Fixes: d487ee7774 ("IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table")
Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f614fc15ae upstream.
The current code returns success when kmalloc() fails. It should
return an error code, -ENOMEM.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit c6c95ef4ce upstream.
We always unmap SGs with the same direction instead of unmapping
with the direction the mapping was done, fix that.
Fixes: 9a8b08fad2 ("IB/iser: Generalize iser_unmap_task_data and [...]")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6606e6a2ff upstream.
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).
Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.
Fixes: 5426b1711f ("IB/iser: Collapse cleanup and disconnect handlers")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 18c0b82a3e upstream.
This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.
These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 8d1e5a1a1c upstream.
With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.
( Tested on v3.18-RT where rtmutex is used for regular mutex and I
tried to get one twice in a row. )
Not sure when this started but I guess 397335f004 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d1 ("rtmutex:
Handle deadlock detection smarter").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit de5d0ad506 upstream.
This is essentially a partial revert of the commit [b1920c2110:
'ALSA: hda - Enable runtime PM on Panther Point']. There was a bug
report showing the HD-audio bus hang during runtime PM on HP Spectre
XT.
Reported-by: Dang Sananikone <dang.sananikone@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 37ed398839 upstream.
It is a bad idea to export static functions. GCC for some platforms
shows errors like:
error: __ksymtab_azx_get_response causes a section type conflict
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6426460e5d upstream.
BIOS doesn't seem to set up pins for 5.1 and the SPDIF out, so we need
to give explicitly here.
Reported-and-tested-by: Misan Thropos <misanthropos@gmx.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 70372a7566 upstream.
When a PCM draining is performed to an empty stream that has been
already in PREPARED state, the current code just ignores and leaves as
it is, although the drain is supposed to set all such streams to SETUP
state. This patch covers that overlooked case.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit ca8bb4aefb upstream.
This reverts commit 0aa525d118.
The conditional RX-FIFO read seems to cause spurious interrupts and we
see just:
|serial8250: too much work for irq29
The previous behaviour was "default" for decades and Marvell's 88f6282 SoC
might not be the only that relies on it. Therefore the Omap fix is
reverted for now.
Fixes: 0aa525d118 ("tty: serial: 8250_core: read only RX if there is
something in the FIFO")
Reported-By: Nicolas Schichan <nschichan@freebox.fr>
Debuged-By: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f0bf0bd079 upstream.
This problem was taken care of three times already in
* b0de59b573 (TTY: do not update
atime/mtime on read/write),
* 37b7f3c765 (TTY: fix atime/mtime
regression), and
* b0b885657b (tty: fix up atime/mtime
mess, take three)
But it still misses one point. As John Paul correctly points out, we
do not care about setting date. If somebody ever changes wall
time backwards (by mistake for example), tty timestamps are never
updated until the original wall time passes.
So check the absolute difference of times and if it large than "8
seconds or so", always update the time. That means we will update
immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
check, but it was always that way.
Thanks John for serving me this so nicely debugged.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: John Paul Perry <john_paul.perry@alcatel-lucent.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 13648b0118 upstream.
/proc/<pid>/maps currently don't annotate stack vma with "[stack]"
This is because KSTK_ESP ie expected to return usermode SP of tsk while
currently it returns the kernel mode SP of a sleeping tsk.
While the fix is trivial, we also need to adjust the ARC kernel stack
unwinder to not use KSTK_SP and friends any more.
Reported-and-suggested-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 0db59e5929 upstream.
As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.
And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit ca4383a394 upstream.
Add missing error handling when registering the tty device at port
probe. This avoids trying to remove an uninitialised character device
when the port device is removed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 07fdfc5e9f upstream.
Fix return value in probe error path, which could end up returning
success (0) on errors. This could in turn lead to use-after-free or
double free (e.g. in port_remove) when the port device is removed.
Fixes: c706ebdfc8 ("USB: usb-serial: call port_probe and port_remove
at the right times")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 79fbf4a550 upstream.
Fix overflow bug in tty_wait_until_sent on 64-bit machines, where an
infinite timeout (0) would be passed to the underlying tty-driver's
wait_until_sent-operation as a negative timeout (-1), causing it to
return immediately.
This manifests itself for example as tcdrain() returning immediately,
drivers not honouring the drain flags when setting terminal attributes,
or even dropped data on close as a requested infinite closing-wait
timeout would be ignored.
The first symptom was reported by Asier LLANO who noted that tcdrain()
returned prematurely when using the ftdi_sio usb-serial driver.
Fix this by passing 0 rather than MAX_SCHEDULE_TIMEOUT (LONG_MAX) to the
underlying tty driver.
Note that the serial-core wait_until_sent-implementation is not affected
by this bug due to a lucky chance (comparison to an unsigned maximum
timeout), and neither is the cyclades one that had an explicit check for
negative timeouts, but all other tty drivers appear to be affected.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: ZIV-Asier Llano Palacios <asier.llano@cgglobal.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f528bf4f57 upstream.
Make sure to handle an infinite timeout (0).
Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.
Fixes: dcf0105039 ("USB: serial: add generic wait_until_sent
implementation")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 2c3fbe3cf2 upstream.
In case an infinite timeout (0) is requested, the irda wait_until_sent
implementation would use a zero poll timeout rather than the default
200ms.
Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 9c1c98a3bb upstream.
The current minstrel_ht rate control behavior is somewhat optimistic in
trying to find optimum TX rate. While this is usually fine for normal
Data frames, there are cases where a more conservative set of retry
parameters would be beneficial to make the connection more robust.
EAPOL frames are critical to the authentication and especially the
EAPOL-Key message 4/4 (the last message in the 4-way handshake) is
important to get through to the AP. If that message is lost, the only
recovery mechanism in many cases is to reassociate with the AP and start
from scratch. This can often be avoided by trying to send the frame with
more conservative rate and/or with more link layer retries.
In most cases, minstrel_ht is currently using the initial EAPOL-Key
frames for probing higher rates and this results in only five link layer
transmission attempts (one at high(ish) MCS and four at MCS0). While
this works with most APs, it looks like there are some deployed APs that
may have issues with the EAPOL frames using HT MCS immediately after
association. Similarly, there may be issues in cases where the signal
strength or radio environment is not good enough to be able to get
frames through even at couple of MCS 0 tries.
The best approach for this would likely to be to reduce the TX rate for
the last rate (3rd rate parameter in the set) to a low basic rate (say,
6 Mbps on 5 GHz and 2 or 5.5 Mbps on 2.4 GHz), but doing that cleanly
requires some more effort. For now, we can start with a simple one-liner
that forces the minimum rate to be used for EAPOL frames similarly how
the TX rate is selected for the IEEE 802.11 Management frames. This does
result in a small extra latency added to the cases where the AP would be
able to receive the higher rate, but taken into account how small number
of EAPOL frames are used, this is likely to be insignificant. A future
optimization in the minstrel_ht design can also allow this patch to be
reverted to get back to the more optimized initial TX rate.
It should also be noted that many drivers that do not use minstrel as
the rate control algorithm are already doing similar workarounds by
forcing the lowest TX rate to be used for EAPOL frames.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit b8cb91e058 upstream.
The xhci in Intel Sunrisepoint and Cherryview platforms need a driver
workaround for a Stuck PME that might either block PME events in suspend,
or create spurious PME events preventing runtime suspend.
Workaround is to clear a internal PME flag, BIT(28) in a vendor specific
PMCTRL register at offset 0x80a4, in both suspend resume callbacks
Without this, xhci connected usb devices might never be able to wake up the
system from suspend, or prevent device from going to suspend (xhci d3)
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 45ba2154d1 upstream.
When a control transfer has a short data stage, the xHCI controller generates
two transfer events: a COMP_SHORT_TX event that specifies the untransferred
amount, and a COMP_SUCCESS event. But when the data stage is not short, only the
COMP_SUCCESS event occurs. Therefore, xhci-hcd must set urb->actual_length to
urb->transfer_buffer_length while processing the COMP_SUCCESS event, unless
urb->actual_length was set already by a previous COMP_SHORT_TX event.
The driver checks this by seeing whether urb->actual_length == 0, but this alone
is the wrong test, as it is entirely possible for a short transfer to have an
urb->actual_length = 0.
This patch changes the xhci driver to rely on a new td->urb_length_set flag,
which is set to true when a COMP_SHORT_TX event is received and the URB length
updated at that stage.
This fixes a bug which affected the HSO plugin, which relies on URBs with
urb->actual_length == 0 to halt re-submitting the RX URB in the control
endpoint.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6596a926b0 upstream.
Include the high order bit fields for Max scratchpad buffers when
calculating how many scratchpad buffers are needed.
I'm suprised this hasn't caused more issues, we never allocated more than
32 buffers even if xhci needed more. Either we got lucky and xhci never
really used past that area, or then we got enough zeroed dma memory anyway.
Should be backported as far back as possible
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 1e7e4fb664 upstream.
The commit 9737479285 ("usb: host: xhci-plat: add support for the Armada
375/38x XHCI controllers") extended the xhci-plat driver to support the Armada
375/38x SoCs, mostly by adding a quirk configuring the MBUS window.
However, that quirk was run before the clock the controllers needs has been
enabled. This usually worked because the clock was first enabled by the
bootloader, and left as such until the driver is probe, where it tries to
access the MBUS configuration registers before enabling the clock.
Things get messy when EPROBE_DEFER is involved during the probe, since as part
of its error path, the driver will rightfully disable the clock. When the
driver will be reprobed, it will retry to access the MBUS registers, but this
time with the clock disabled, which hangs forever.
Fix this by running the quirks after the clock has been enabled by the driver.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit a0456399fb upstream.
The "Extended Compat ID OS Feature Descriptor Specification" does not
require the (sub)compatible ids to be NUL-terminated, because they
are placed in a fixed-size buffer and only unused parts of it should
contain NULs. If the buffer is fully utilized, there is no place for NULs.
Consequently, the code which uses desc->ext_compat_id never expects the
data contained to be NUL terminated.
If the compatible id is stored after sub-compatible id, and the compatible
id is full length (8 bytes), the (useless) NUL terminator overwrites the
first byte of the sub-compatible id.
If the sub-compatible id is full length (8 bytes), the (useless) NUL
terminator ends up out of the buffer. The situation can happen in the RNDIS
function, where the buffer is a part of struct f_rndis_opts. The next
member of struct f_rndis_opts is a mutex, so its first byte gets
overwritten. The said byte is a part of a mutex'es member which contains
the information on whether the muext is locked or not. This can lead to a
deadlock, because, in a configfs-composed gadget when a function is linked
into a configuration with config_usb_cfg_link(), usb_get_function()
is called, which then calls rndis_alloc(), which tries locking the same
mutex and (wrongly) finds it already locked.
This patch eliminates NUL terminating of the (sub)compatible id.
Fixes: da4243145f: "usb: gadget: configfs: OS Extended Compatibility descriptors support"
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 96e5d31244 upstream.
In the wrapper the IRQ disable should be done by writing 1's to the
IRQ*_CLR register. Existing code is broken because it instead writes
zeros to IRQ*_SET register.
Fix this by adding functions dwc3_omap_write_irqmisc_clr() and
dwc3_omap_write_irq0_clr() which do the right thing.
Fixes: 72246da40f ("usb: Introduce DesignWare USB3 DRD Driver")
Signed-off-by: George Cherian <george.cherian@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit c7d373c3f0 upstream.
This patch integrates Cyber Cortex AV boards with the existing
ftdi_jtag_quirk in order to use serial port 0 with JTAG which is
required by the manufacturers' software.
Steps: 2
[ftdi_sio_ids.h]
1. Defined the device PID
[ftdi_sio.c]
2. Added a macro declaration to the ids array, in order to enable the
jtag quirk for the device.
Signed-off-by: Max Mansfield <max.m.mansfield@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f6950344d3 upstream.
These product identifiers (PID) all deal with marine NMEA format data
used on motor boats and yachts. We supply the programmed devices to
Chetco, for use inside their equipment. The PIDs are a direct copy of
our Windows device drivers (FTDI drivers with altered PIDs).
Signed-off-by: Mark Glover <mark@actisense.com>
[johan: edit commit message slightly ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f0c2b68198 upstream.
When a signal is delivered, the information in the siginfo structure
is copied to userspace. Good security practice dicatates that the
unused fields in this structure should be initialized to 0 so that
random kernel stack data isn't exposed to the user. This patch adds
such an initialization to the two places where usbfs raises signals.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dave Mielke <dave@mielke.cc>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit db81de767e upstream.
Fix null-pointer dereference at probe when the device is used as a
console, in which case the tty argument to open will be NULL.
Fixes: ee467a1f20 ("USB: serial: add Moxa UPORT 12XX/14XX/16XX
driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 675af70856 upstream.
These device ID's are not associated with the cp210x module currently,
but should be. This patch allows the devices to operate upon connecting
them to the usb bus as intended.
Signed-off-by: Michiel van de Garde <mgparser@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit bc4b1f486f upstream.
This reverts commit 5083fd7bdf.
A bulk-out size smaller than the end-point size is indeed valid. The
offending commit broke the usb-debug driver for EHCI debug devices,
which use 8-byte buffers.
Fixes: 5083fd7bdf ("USB: serial: make bulk_out_size a lower limit")
Reported-by: "Li, Elvin" <elvin.li@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 59e980efaf upstream.
Like the JMicron JMS567 enclosures with the JMS539 choke on report-opcodes,
so avoid it.
Tested-and-reported-by: Tom Arild Naess <tanaess@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit b3cffac04e upstream.
Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.
Lets save the actual PC in the structure so that the correct value is
accessible later.
Fixes: 669e846e6c ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 4ff6f8e61e upstream.
This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.
Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.
Fixes: 6550e1f165
Fixes: 16518d5ada
Reported-by: Erik Rull <erik.rull@rdsoftware.de>
Tested-by: Erik Rull <erik.rull@rdsoftware.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit dd9ef135e3 upstream.
Improper arithmetics when calculting the address of the extended ref could
lead to an out of bounds memory read and kernel panic.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 3a8b36f378 upstream.
When using the fast file fsync code path we can miss the fact that new
writes happened since the last file fsync and therefore return without
waiting for the IO to finish and write the new extents to the fsync log.
Here's an example scenario where the fsync will miss the fact that new
file data exists that wasn't yet durably persisted:
1. fs_info->last_trans_committed == N - 1 and current transaction is
transaction N (fs_info->generation == N);
2. do a buffered write;
3. fsync our inode, this clears our inode's full sync flag, starts
an ordered extent and waits for it to complete - when it completes
at btrfs_finish_ordered_io(), the inode's last_trans is set to the
value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
btrfs_set_inode_last_trans);
4. transaction N is committed, so fs_info->last_trans_committed is now
set to the value N and fs_info->generation remains with the value N;
5. do another buffered write, when this happens btrfs_file_write_iter
sets our inode's last_trans to the value N + 1 (that is
fs_info->generation + 1 == N + 1);
6. transaction N + 1 is started and fs_info->generation now has the
value N + 1;
7. transaction N + 1 is committed, so fs_info->last_trans_committed
is set to the value N + 1;
8. fsync our inode - because it doesn't have the full sync flag set,
we only start the ordered extent, we don't wait for it to complete
(only in a later phase) therefore its last_trans field has the
value N + 1 set previously by btrfs_file_write_iter(), and so we
have:
inode->last_trans <= fs_info->last_trans_committed
(N + 1) (N + 1)
Which made us not log the last buffered write and exit the fsync
handler immediately, returning success (0) to user space and resulting
in data loss after a crash.
This can actually be triggered deterministically and the following excerpt
from a testcase I made for xfstests triggers the issue. It moves a dummy
file across directories and then fsyncs the old parent directory - this
is just to trigger a transaction commit, so moving files around isn't
directly related to the issue but it was chosen because running 'sync' for
example does more than just committing the current transaction, as it
flushes/waits for all file data to be persisted. The issue can also happen
at random periods, since the transaction kthread periodicaly commits the
current transaction (about every 30 seconds by default).
The body of the test is:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create our main test file 'foo', the one we check for data loss.
# By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
# bit from its flags (btrfs inode specific flags).
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io
# Now create one other file and 2 directories. We will move this second file
# from one directory to the other later because it forces btrfs to commit its
# currently open transaction if we fsync the old parent directory. This is
# necessary to trigger the data loss bug that affected btrfs.
mkdir $SCRATCH_MNT/testdir_1
touch $SCRATCH_MNT/testdir_1/bar
mkdir $SCRATCH_MNT/testdir_2
# Make sure everything is durably persisted.
sync
# Write more 8Kb of data to our file.
$XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io
# Move our 'bar' file into a new directory.
mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
# Fsync our first directory. Because it had a file moved into some other
# directory, this made btrfs commit the currently open transaction. This is
# a condition necessary to trigger the data loss bug.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
# Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
# data we wrote previously to be persisted and available if a crash happens.
# This did not happen with btrfs, because of the transaction commit that
# happened when we fsynced the parent directory.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# Now check that all data we wrote before are available.
echo "File content after log replay:"
od -t x1 $SCRATCH_MNT/foo
status=0
exit
The expected golden output for the test, which is what we get with this
fix applied (or when running against ext3/4 and xfs), is:
wrote 8192/8192 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
wrote 8192/8192 bytes at offset 8192
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
File content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0040000
Without this fix applied, the output shows the test file does not have
the second 8Kb extent that we successfully fsynced:
wrote 8192/8192 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
wrote 8192/8192 bytes at offset 8192
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
File content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
0020000
So fix this by skipping the fsync only if we're doing a full sync and
if the inode's last_trans is <= fs_info->last_trans_committed, or if
the inode is already in the log. Also remove setting the inode's
last_trans in btrfs_file_write_iter since it's useless/unreliable.
Also because btrfs_file_write_iter no longer sets inode->last_trans to
fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
bail out if last_trans is 0, otherwise something as simple as the following
example wouldn't log the second write on the last fsync:
1. write to file
2. fsync file
3. fsync file
|--> btrfs_inode_in_log() returns true and it set last_trans to 0
4. write to file
|--> btrfs_file_write_iter() no longers sets last_trans, so it
remained with a value of 0
5. fsync
|--> inode->last_trans == 0, so it bails out without logging the
second write
A test case for xfstests will be sent soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 1932b7be97 upstream.
A block-local variable stores error code but btrfs_get_blocks_direct may
not return it in the end as there's a ret defined in the function scope.
Fixes: d187663ef2 ("Btrfs: lock extents as we map them in DIO")
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 4d884fceaa upstream.
We can have multiple fsync operations against the same file during the
same transaction and they can collect the same ordered extents while they
don't complete (still accessible from the inode's ordered tree). If this
happens, those ordered extents will never get their reference counts
decremented to 0, leading to memory leaks and inode leaks (an iput for an
ordered extent's inode is scheduled only when the ordered extent's refcount
drops to 0). The following sequence diagram explains this race:
CPU 1 CPU 2
btrfs_sync_file()
btrfs_sync_file()
mutex_lock(inode->i_mutex)
btrfs_log_inode()
btrfs_get_logged_extents()
--> collects ordered extent X
--> increments ordered
extent X's refcount
btrfs_submit_logged_extents()
mutex_unlock(inode->i_mutex)
mutex_lock(inode->i_mutex)
btrfs_sync_log()
btrfs_wait_logged_extents()
--> list_del_init(&ordered->log_list)
btrfs_log_inode()
btrfs_get_logged_extents()
--> Adds ordered extent X
to logged_list because
at this point:
list_empty(&ordered->log_list)
&& test_bit(BTRFS_ORDERED_LOGGED,
&ordered->flags) == 0
--> Increments ordered extent
X's refcount
--> check if ordered extent's io is
finished or not, start it if
necessary and wait for it to finish
--> sets bit BTRFS_ORDERED_LOGGED
on ordered extent X's flags
and adds it to trans->ordered
btrfs_sync_log() finishes
btrfs_submit_logged_extents()
btrfs_log_inode() finishes
mutex_unlock(inode->i_mutex)
btrfs_sync_file() finishes
btrfs_sync_log()
btrfs_wait_logged_extents()
--> Sees ordered extent X has the
bit BTRFS_ORDERED_LOGGED set in
its flags
--> X's refcount is untouched
btrfs_sync_log() finishes
btrfs_sync_file() finishes
btrfs_commit_transaction()
--> called by transaction kthread for e.g.
btrfs_wait_pending_ordered()
--> waits for ordered extent X to
complete
--> decrements ordered extent X's
refcount by 1 only, corresponding
to the increment done by the fsync
task ran by CPU 1
In the scenario of the above diagram, after the transaction commit,
the ordered extent will remain with a refcount of 1 forever, leaking
the ordered extent structure and preventing the i_count of its inode
from ever decreasing to 0, since the delayed iput is scheduled only
when the ordered extent's refcount drops to 0, preventing the inode
from ever being evicted by the VFS.
Fix this by using the flag BTRFS_ORDERED_LOGGED differently. Use it to
mean that an ordered extent is already being processed by an fsync call,
which will attach it to the current transaction, preventing it from being
collected by subsequent fsync operations against the same inode.
This race was introduced with the following change (added in 3.19 and
backported to stable 3.18 and 3.17):
Btrfs: make sure logged extents complete in the current transaction V3
commit 50d9aa99bd
I ran into this issue while running xfstests/generic/113 in a loop, which
failed about 1 out of 10 runs with the following warning in dmesg:
[ 2612.440038] WARNING: CPU: 4 PID: 22057 at fs/btrfs/disk-io.c:3558 free_fs_root+0x36/0x133 [btrfs]()
[ 2612.442810] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop processor parport_pc parport psmouse therma
l_sys i2c_piix4 serio_raw pcspkr evdev microcode button i2c_core ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic virtio_pci ata_piix virtio_ring libata virtio flo
ppy e1000 scsi_mod [last unloaded: btrfs]
[ 2612.452711] CPU: 4 PID: 22057 Comm: umount Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1
[ 2612.454921] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2612.457709] 0000000000000009 ffff8801342c3c78 ffffffff8142425e ffff88023ec8f2d8
[ 2612.459829] 0000000000000000 ffff8801342c3cb8 ffffffff81045308 ffff880046460000
[ 2612.461564] ffffffffa036da56 ffff88003d07b000 ffff880046460000 ffff880046460068
[ 2612.463163] Call Trace:
[ 2612.463719] [<ffffffff8142425e>] dump_stack+0x4c/0x65
[ 2612.464789] [<ffffffff81045308>] warn_slowpath_common+0xa1/0xbb
[ 2612.466026] [<ffffffffa036da56>] ? free_fs_root+0x36/0x133 [btrfs]
[ 2612.467247] [<ffffffff810453c5>] warn_slowpath_null+0x1a/0x1c
[ 2612.468416] [<ffffffffa036da56>] free_fs_root+0x36/0x133 [btrfs]
[ 2612.469625] [<ffffffffa036f2a7>] btrfs_drop_and_free_fs_root+0x93/0x9b [btrfs]
[ 2612.471251] [<ffffffffa036f353>] btrfs_free_fs_roots+0xa4/0xd6 [btrfs]
[ 2612.472536] [<ffffffff8142612e>] ? wait_for_completion+0x24/0x26
[ 2612.473742] [<ffffffffa0370bbc>] close_ctree+0x1f3/0x33c [btrfs]
[ 2612.475477] [<ffffffff81059d1d>] ? destroy_workqueue+0x148/0x1ba
[ 2612.476695] [<ffffffffa034e3da>] btrfs_put_super+0x19/0x1b [btrfs]
[ 2612.477911] [<ffffffff81153e53>] generic_shutdown_super+0x73/0xef
[ 2612.479106] [<ffffffff811540e2>] kill_anon_super+0x13/0x1e
[ 2612.480226] [<ffffffffa034e1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
[ 2612.481471] [<ffffffff81154307>] deactivate_locked_super+0x3b/0x50
[ 2612.482686] [<ffffffff811547a7>] deactivate_super+0x3f/0x43
[ 2612.483791] [<ffffffff8116b3ed>] cleanup_mnt+0x59/0x78
[ 2612.484842] [<ffffffff8116b44c>] __cleanup_mnt+0x12/0x14
[ 2612.485900] [<ffffffff8105d019>] task_work_run+0x8f/0xbc
[ 2612.486960] [<ffffffff810028d8>] do_notify_resume+0x5a/0x6b
[ 2612.488083] [<ffffffff81236e5b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 2612.489333] [<ffffffff8142a17f>] int_signal+0x12/0x17
[ 2612.490353] ---[ end trace 54a960a6bdcb8d93 ]---
[ 2612.557253] VFS: Busy inodes after unmount of sdb. Self-destruct in 5 seconds. Have a nice day...
Kmemleak confirmed the ordered extent leak (and btrfs inode specific
structures such as delayed nodes):
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff880154290db0 (size 576):
comm "btrfsck", pid 21980, jiffies 4295542503 (age 1273.412s)
hex dump (first 32 bytes):
01 40 00 00 01 00 00 00 b0 1d f1 4e 01 88 ff ff .@.........N....
00 00 00 00 00 00 00 00 c8 0d 29 54 01 88 ff ff ..........)T....
backtrace:
[<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
[<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
[<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
[<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
[<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
[<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
[<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
[<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
[<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
[<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
[<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff88014ef11db0 (size 576):
comm "rm", pid 22009, jiffies 4295542593 (age 1273.052s)
hex dump (first 32 bytes):
02 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 c8 1d f1 4e 01 88 ff ff ...........N....
backtrace:
[<ffffffff8141d74d>] kmemleak_update_trace+0x4c/0x6a
[<ffffffff8122f2c0>] radix_tree_node_alloc+0x6d/0x83
[<ffffffff8122fb26>] __radix_tree_create+0x109/0x190
[<ffffffff8122fbdd>] radix_tree_insert+0x30/0xac
[<ffffffffa03b9bde>] btrfs_get_or_create_delayed_node+0x130/0x187 [btrfs]
[<ffffffffa03bb82d>] btrfs_delayed_delete_inode_ref+0x32/0xac [btrfs]
[<ffffffffa0379dae>] __btrfs_unlink_inode+0xee/0x288 [btrfs]
[<ffffffffa037c715>] btrfs_unlink_inode+0x1e/0x40 [btrfs]
[<ffffffffa037c797>] btrfs_unlink+0x60/0x9b [btrfs]
[<ffffffff8115d7f0>] vfs_unlink+0x9c/0xed
[<ffffffff8115f5de>] do_unlinkat+0x12c/0x1fa
[<ffffffff811601a7>] SyS_unlinkat+0x29/0x2b
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff8800336feda8 (size 584):
comm "aio-stress", pid 22031, jiffies 4295543006 (age 1271.400s)
hex dump (first 32 bytes):
00 40 3e 00 00 00 00 00 00 00 8f 42 00 00 00 00 .@>........B....
00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 ................
backtrace:
[<ffffffff8114eb34>] create_object+0x172/0x29a
[<ffffffff8141d790>] kmemleak_alloc+0x25/0x41
[<ffffffff81141ae6>] kmemleak_alloc_recursive.constprop.52+0x16/0x18
[<ffffffff81145288>] kmem_cache_alloc+0xf7/0x198
[<ffffffffa0389243>] __btrfs_add_ordered_extent+0x43/0x309 [btrfs]
[<ffffffffa038968b>] btrfs_add_ordered_extent_dio+0x12/0x14 [btrfs]
[<ffffffffa03810e2>] btrfs_get_blocks_direct+0x3ef/0x571 [btrfs]
[<ffffffff81181349>] do_blockdev_direct_IO+0x62a/0xb47
[<ffffffff8118189a>] __blockdev_direct_IO+0x34/0x36
[<ffffffffa03776e5>] btrfs_direct_IO+0x16a/0x1e8 [btrfs]
[<ffffffff81100373>] generic_file_direct_write+0xb8/0x12d
[<ffffffffa038615c>] btrfs_file_write_iter+0x24b/0x42f [btrfs]
[<ffffffff8118bb0d>] aio_run_iocb+0x2b7/0x32e
[<ffffffff8118c99a>] do_io_submit+0x26e/0x2ff
[<ffffffff8118ca3b>] SyS_io_submit+0x10/0x12
[<ffffffff81429e92>] system_call_fastpath+0x12/0x17
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6c15a8516b upstream.
Set the internal device state to to disabled after hardware reset in stop flow.
This will cover cases when driver was not brought to disabled state because of
an error and in stop flow we wish not to retry the reset.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 9e128ced38 upstream.
This patch fixes uncorrect order of mcp3422_scales table, the values
was erroneously transposed.
It removes also an unused array and a wrong comment.
Signed-off-by: Angelo Compagnucci <angelo.compagnucci@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 89bb35e200 upstream.
Using the touchscreen while running buffered capture results in the
buffer reporting lots of wrong values, often just zeros. This is because
we push readings to the buffer every time a touchscreen interrupt
arrives, including when the buffer's own conversions have not yet
finished. So let's only push to the buffer when its conversions are
ready.
Signed-off-by: Kristina Martšenko <kristina.martsenko@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 6abe0300a1 upstream.
Reading a channel through sysfs, or starting a buffered capture, can
occasionally turn off the touchscreen.
This is because the read_raw() and buffer preenable()/postdisable()
callbacks unschedule current conversions on all channels. If a delay
channel happens to schedule a touchscreen conversion at the same time,
the conversion gets cancelled and the touchscreen sequence stops.
This is probably related to this note from the reference manual:
"If a delay group schedules channels to be sampled and a manual
write to the schedule field in CTRL0 occurs while the block is
discarding samples, the LRADC will switch to the new schedule
and will not sample the channels that were previously scheduled.
The time window for this to happen is very small and lasts only
while the LRADC is discarding samples."
So make the callbacks only unschedule conversions for the channels they
use. This means channel 0 for read_raw() and channels 0-5 for the buffer
(if the touchscreen is enabled). Since the touchscreen uses different
channels (6 and 7), it no longer gets turned off.
This is tested and fixes the issue on i.MX28, but hasn't been tested on
i.MX23.
Signed-off-by: Kristina Martšenko <kristina.martsenko@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 86bf7f3ef7 upstream.
Reading a channel through sysfs, or starting a buffered capture, will
currently turn off the touchscreen. This is because the read_raw() and
buffer preenable()/postdisable() callbacks disable interrupts for all
LRADC channels, including those the touchscreen uses.
So make the callbacks only disable interrupts for the channels they use.
This means channel 0 for read_raw() and channels 0-5 for the buffer (if
the touchscreen is enabled). Since the touchscreen uses different
channels (6 and 7), it no longer gets turned off.
Note that only i.MX28 is affected by this issue, i.MX23 should be fine.
Signed-off-by: Kristina Martšenko <kristina.martsenko@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f81197b8a3 upstream.
The touchscreen was initially designed [1] to map all of its physical
channels to one virtual channel, leaving buffered capture to use the
remaining 7 virtual channels. When the touchscreen was reimplemented
[2], it was made to use four virtual channels, which overlap and
conflict with the channels the buffer uses.
As a result, when the buffer is enabled, the touchscreen's virtual
channels are remapped to whichever physical channels the buffer was
configured with, causing the touchscreen to read those instead of the
touch measurement channels. Effectively the touchscreen stops working.
So here we separate the channels again, giving the touchscreen 2 virtual
channels and the buffer 6. We can't give the touchscreen just 1 channel
as before, as the current pressure calculation requires 2 channels to be
read at the same time.
This makes the touchscreen continue to work during buffered capture. It
has been tested on i.MX28, but not on i.MX23.
[1] 06ddd353f5 ("iio: mxs: Implement support for touchscreen")
[2] dee05308f6 ("Staging/iio/adc/touchscreen/MXS: add interrupt driven
touch detection")
Signed-off-by: Kristina Martšenko <kristina.martsenko@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 19e353f2b3 upstream.
The intention is obviously to sign-extend a 12 bit quantity. But
because of C's promotion rules, the assignment is equivalent to "val16
&= 0xfff;". Use the proper API for this.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 03305e535c upstream.
Since commit c8231a9af8 ("iio: mxs-lradc: compute temperature
from channel 8 and 9") with the removal of adc channel 9 there is
no 1-1 mapping in the channel spec.
All hwmon channel values above 9 are accessible via there index minus
one. So add a hidden iio channel 9 to fix this issue.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 06c8173eb9 upstream.
Commit:
f31a9f7c71 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
introduced alternative instructions for XSAVES/XRSTORS and commit:
adb9d526e9 ("x86/xsaves: Add xsaves and xrstors support for booting time")
added support for the XSAVES/XRSTORS instructions at boot time.
Unfortunately both failed to properly protect them against faulting:
The 'xstate_fault' macro will use the closest label named '1'
backward and that ends up in the .altinstr_replacement section
rather than in .text. This means that the kernel will never find
in the __ex_table the .text address where this instruction might
fault, leading to serious problems if userspace manages to
trigger the fault.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
[ Improved the changelog, fixed some whitespace noise. ]
Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Allan Xavier <mr.a.xavier@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: adb9d526e9 ("x86/xsaves: Add xsaves and xrstors support for booting time")
Fixes: f31a9f7c71 ("x86/xsaves: Use xsaves/xrstors to save and restore xsave area")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit aa179935ed upstream.
This patch adds a check to sbc_parse_cdb() in order to detect when
an LBA + sector vs. end-of-device calculation wraps when the LBA is
sufficently large enough (eg: 0xFFFFFFFFFFFFFFFF).
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 8e575c50a1 upstream.
This patch adds a check to sbc_setup_write_same() to verify
the incoming WRITE_SAME LBA + number of blocks does not exceed
past the end-of-device.
Also check for potential LBA wrap-around as well.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f161d4b44d upstream.
This patch addresses the original PR_APTPL_BUF_LEN = 8k limitiation
for write-out of PR APTPL metadata that Martin has recently been
running into.
It changes core_scsi3_update_and_write_aptpl() to use vzalloc'ed
memory instead of kzalloc, and increases the default hardcoded
length to 256k.
It also adds logic in core_scsi3_update_and_write_aptpl() to double
the original length upon core_scsi3_update_aptpl_buf() failure, and
retries until the vzalloc'ed buffer is large enough to accommodate
the outgoing APTPL metadata.
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit d180d2bbb6 upstream.
As per the specififcation, the SB_DevFn is the PCI_DEVFN of the target
device and not the source. So PCI_DEVFN(2,0) is not correct. Further the
port ID should be enough to identify devices unless they are MFD. The
SB_DevFn was intended to remove ambiguity in case of these MFD devices.
For non MFD devices the recommendation for the target device IP was to
ignore these fields, but not all of them followed the recommendation.
Some like CCK ignore these fields and hence PCI_DEVFN(2, 0) works and so
does PCI_DEVFN(0, 0) as it works for DPIO. The issue came to light because
of GPIONC which was not getting programmed correctly with PCI_DEVFN(2, 0).
It turned out that this did not follow the recommendation and expected 0
in this field.
In general the recommendation is to use SB_DevFn as PCI_DEVFN(0, 0) for
all devices except target PCI devices.
Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 460822b0b1 upstream.
It's possible for invalidate_range_start mmu notifier callback to race
against userptr object release. If the gem object was released prior to
obtaining the spinlock in invalidate_range_start we're hitting null
pointer dereference.
Testcase: igt/gem_userptr_blits/stress-mm-invalidate-close
Testcase: igt/gem_userptr_blits/stress-mm-invalidate-close-overlap
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Jani: added code comment suggested by Chris]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 0ca0968554 upstream.
Nothing in Bspec seems to indicate that we actually needs this, and it
looks like can't work since by this point the pipe is off and so
vblanks won't really happen any more.
Note that Bspec mentions that it takes a vblank for this bit to
change, but _only_ when enabling.
Dropping this code quenches an annoying backtrace introduced by the
more anal checking since
commit 51e31d49c8
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 15 12:36:02 2014 +0200
drm/i915: Use generic vblank wait
Note: This fixes the fallout from the above commit, but does not address
the shortcomings of the IBX transcoder select workaround implementation
discussed during review [1].
[1] http://mid.gmane.org/87y4o7usxf.fsf@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86095
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit f0a1fb10e5 upstream.
This looked like an odd regression from
commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
Also fix the invalidate_domains mask in gen8_emit_flush() for ring !=
VCS.
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 31f40f8652 upstream.
When copying a relocation from userspace, copy the correct target
offset.
Signed-off-by: David Ung <davidu@nvidia.com>
Fixes: 961e3beae3 ("drm/tegra: Make job submission 64-bit safe")
[treding@nvidia.com: provide a better commit message]
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit ff59909a07 upstream.
The vmstat interfaces are good at hiding negative counts (at least when
CONFIG_SMP); but if you peer behind the curtain, you find that
nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
more negative: so they can absorb larger and larger numbers of isolated
pages, yet still appear to be zero.
I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
but I guess it's there for a good reason, in which case we ought to get
too_many_isolated() working again.
The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
to call acct_isolated() to increment them.
It is possible that the bug whcih this patch fixes could cause OOM kills
when the system still has a lot of reclaimable page cache.
Fixes: edc2ca6124 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 9cb12d7b4c upstream.
For whatever reason, generic_access_phys() only remaps one page, but
actually allows to access arbitrary size. It's quite easy to trigger
large reads, like printing out large structure with gdb, which leads to a
crash. Fix it by remapping correct size.
Fixes: 28b2ee20c7 ("access_process_vm device memory infrastructure")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 372549c2a3 upstream.
What we want to check here is whether there is highorder freepage in buddy
list of other migratetype in order to steal it without fragmentation.
But, current code just checks cc->order which means allocation request
order. So, this is wrong.
Without this fix, non-movable synchronous compaction below pageblock order
would not stopped until compaction is complete, because migratetype of
most pageblocks are movable and high order freepage made by compaction is
usually on movable type buddy list.
There is some report related to this bug. See below link.
http://www.spinics.net/lists/linux-mm/msg81666.html
Although the issued system still has load spike comes from compaction,
this makes that system completely stable and responsive according to his
report.
stress-highalloc test in mmtests with non movable order 7 allocation
doesn't show any notable difference in allocation success rate, but, it
shows more compaction success rate.
Compaction success rate (Compaction success * 100 / Compaction stalls, %)
18.47 : 28.94
Fixes: 1fb3f8ca0e ("mm: compaction: capture a suitable high-order page immediately when it is made available")
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 8138a67a55 upstream.
I noticed that "allowed" can easily overflow by falling below 0, because
(total_vm / 32) can be larger than "allowed". The problem occurs in
OVERCOMMIT_NONE mode.
In this case, a huge allocation can success and overcommit the system
(despite OVERCOMMIT_NONE mode). All subsequent allocations will fall
(system-wide), so system become unusable.
The problem was masked out by commit c9b1d0981f
("mm: limit growth of 3% hardcoded other user reserve"),
but it's easy to reproduce it on older kernels:
1) set overcommit_memory sysctl to 2
2) mmap() large file multiple times (with VM_SHARED flag)
3) try to malloc() large amount of memory
It also can be reproduced on newer kernels, but miss-configured
sysctl_user_reserve_kbytes is required.
Fix this issue by switching to signed arithmetic here.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 5703b087dc upstream.
I noticed, that "allowed" can easily overflow by falling below 0,
because (total_vm / 32) can be larger than "allowed". The problem
occurs in OVERCOMMIT_NONE mode.
In this case, a huge allocation can success and overcommit the system
(despite OVERCOMMIT_NONE mode). All subsequent allocations will fall
(system-wide), so system become unusable.
The problem was masked out by commit c9b1d0981f
("mm: limit growth of 3% hardcoded other user reserve"),
but it's easy to reproduce it on older kernels:
1) set overcommit_memory sysctl to 2
2) mmap() large file multiple times (with VM_SHARED flag)
3) try to malloc() large amount of memory
It also can be reproduced on newer kernels, but miss-configured
sysctl_user_reserve_kbytes is required.
Fix this issue by switching to signed arithmetic here.
[akpm@linux-foundation.org: use min_t]
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 99592d598e upstream.
When studying page stealing, I noticed some weird looking decisions in
try_to_steal_freepages(). The first I assume is a bug (Patch 1), the
following two patches were driven by evaluation.
Testing was done with stress-highalloc of mmtests, using the
mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
often page stealing occurs for individual migratetypes, and what
migratetypes are used for fallbacks. Arguably, the worst case of page
stealing is when UNMOVABLE allocation steals from MOVABLE pageblock.
RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal,
so the goal is to minimize these two cases.
The evaluation of v2 wasn't always clear win and Joonsoo questioned the
results. Here I used different baseline which includes RFC compaction
improvements from [1]. I found that the compaction improvements reduce
variability of stress-highalloc, so there's less noise in the data.
First, let's look at stress-highalloc configured to do sync compaction,
and how these patches reduce page stealing events during the test. First
column is after fresh reboot, other two are reiterations of test without
reboot. That was all accumulater over 5 re-iterations (so the benchmark
was run 5x3 times with 5 fresh restarts).
Baseline:
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Page alloc extfrag event 10264225 8702233 10244125
Extfrag fragmenting 10263271 8701552 10243473
Extfrag fragmenting for unmovable 13595 17616 15960
Extfrag fragmenting unmovable placed with movable 7989 12193 8447
Extfrag fragmenting for reclaimable 658 1840 1817
Extfrag fragmenting reclaimable placed with movable 558 1677 1679
Extfrag fragmenting for movable 10249018 8682096 10225696
With Patch 1:
3.19-rc4 3.19-rc4 3.19-rc4
6-nothp-1 6-nothp-2 6-nothp-3
Page alloc extfrag event 11834954 9877523 9774860
Extfrag fragmenting 11833993 9876880 9774245
Extfrag fragmenting for unmovable 7342 16129 11712
Extfrag fragmenting unmovable placed with movable 4191 10547 6270
Extfrag fragmenting for reclaimable 373 1130 923
Extfrag fragmenting reclaimable placed with movable 302 906 738
Extfrag fragmenting for movable 11826278 9859621 9761610
With Patch 2:
3.19-rc4 3.19-rc4 3.19-rc4
7-nothp-1 7-nothp-2 7-nothp-3
Page alloc extfrag event 4725990 3668793 3807436
Extfrag fragmenting 4725104 3668252 3806898
Extfrag fragmenting for unmovable 6678 7974 7281
Extfrag fragmenting unmovable placed with movable 2051 3829 4017
Extfrag fragmenting for reclaimable 429 1208 1278
Extfrag fragmenting reclaimable placed with movable 369 976 1034
Extfrag fragmenting for movable 4717997 3659070 3798339
With Patch 3:
3.19-rc4 3.19-rc4 3.19-rc4
8-nothp-1 8-nothp-2 8-nothp-3
Page alloc extfrag event 5016183 4700142 3850633
Extfrag fragmenting 5015325 4699613 3850072
Extfrag fragmenting for unmovable 1312 3154 3088
Extfrag fragmenting unmovable placed with movable 1115 2777 2714
Extfrag fragmenting for reclaimable 437 1193 1097
Extfrag fragmenting reclaimable placed with movable 330 969 879
Extfrag fragmenting for movable 5013576 4695266 3845887
In v2 we've seen apparent regression with Patch 1 for unmovable events,
this is now gone, suggesting it was indeed noise. Here, each patch
improves the situation for unmovable events. Reclaimable is improved by
patch 1 and then either the same modulo noise, or perhaps sligtly worse -
a small price for unmovable improvements, IMHO. The number of movable
allocations falling back to other migratetypes is most noisy, but it's
reduced to half at Patch 2 nevertheless. These are least critical as
compaction can move them around.
If we look at success rates, the patches don't affect them, that didn't change.
Baseline:
3.19-rc4 3.19-rc4 3.19-rc4
5-nothp-1 5-nothp-2 5-nothp-3
Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%)
Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%)
Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%)
Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%)
Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%)
Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%)
Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%)
Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
Patch 1:
3.19-rc4 3.19-rc4 3.19-rc4
6-nothp-1 6-nothp-2 6-nothp-3
Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%)
Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%)
Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%)
Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%)
Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%)
Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%)
Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%)
Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%)
Patch 2:
3.19-rc4 3.19-rc4 3.19-rc4
7-nothp-1 7-nothp-2 7-nothp-3
Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%)
Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%)
Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%)
Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%)
Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%)
Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%)
Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%)
Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%)
Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%)
Patch 3:
3.19-rc4 3.19-rc4 3.19-rc4
8-nothp-1 8-nothp-2 8-nothp-3
Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%)
Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%)
Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%)
Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%)
Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%)
Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%)
Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%)
Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%)
Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%)
While there's no improvement here, I consider reduced fragmentation events
to be worth on its own. Patch 2 also seems to reduce scanning for free
pages, and migrations in compaction, suggesting it has somewhat less work
to do:
Patch 1:
Compaction stalls 4153 3959 3978
Compaction success 1523 1441 1446
Compaction failures 2630 2517 2531
Page migrate success 4600827 4943120 5104348
Page migrate failure 19763 16656 17806
Compaction pages isolated 9597640 10305617 10653541
Compaction migrate scanned 77828948 86533283 87137064
Compaction free scanned 517758295 521312840 521462251
Compaction cost 5503 5932 6110
Patch 2:
Compaction stalls 3800 3450 3518
Compaction success 1421 1316 1317
Compaction failures 2379 2134 2201
Page migrate success 4160421 4502708 4752148
Page migrate failure 19705 14340 14911
Compaction pages isolated 8731983 9382374 9910043
Compaction migrate scanned 98362797 96349194 98609686
Compaction free scanned 496512560 469502017 480442545
Compaction cost 5173 5526 5811
As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
of unmovable and reclaimable pageblocks.
Configuring the benchmark to allocate like THP page fault (i.e. no sync
compaction) gives much noisier results for iterations 2 and 3 after
reboot. This is not so surprising given how [1] offers lower improvements
in this scenario due to less restarts after deferred compaction which
would change compaction pivot.
Baseline:
3.19-rc4 3.19-rc4 3.19-rc4
5-thp-1 5-thp-2 5-thp-3
Page alloc extfrag event 8148965 6227815 6646741
Extfrag fragmenting 8147872 6227130 6646117
Extfrag fragmenting for unmovable 10324 12942 15975
Extfrag fragmenting unmovable placed with movable 5972 8495 10907
Extfrag fragmenting for reclaimable 601 1707 2210
Extfrag fragmenting reclaimable placed with movable 520 1570 2000
Extfrag fragmenting for movable 8136947 6212481 6627932
Patch 1:
3.19-rc4 3.19-rc4 3.19-rc4
6-thp-1 6-thp-2 6-thp-3
Page alloc extfrag event 8345457 7574471 7020419
Extfrag fragmenting 8343546 7573777 7019718
Extfrag fragmenting for unmovable 10256 18535 30716
Extfrag fragmenting unmovable placed with movable 6893 11726 22181
Extfrag fragmenting for reclaimable 465 1208 1023
Extfrag fragmenting reclaimable placed with movable 353 996 843
Extfrag fragmenting for movable 8332825 7554034 6987979
Patch 2:
3.19-rc4 3.19-rc4 3.19-rc4
7-thp-1 7-thp-2 7-thp-3
Page alloc extfrag event 3512847 3020756 2891625
Extfrag fragmenting 3511940 3020185 2891059
Extfrag fragmenting for unmovable 9017 6892 6191
Extfrag fragmenting unmovable placed with movable 1524 3053 2435
Extfrag fragmenting for reclaimable 445 1081 1160
Extfrag fragmenting reclaimable placed with movable 375 918 986
Extfrag fragmenting for movable 3502478 3012212 2883708
Patch 3:
3.19-rc4 3.19-rc4 3.19-rc4
8-thp-1 8-thp-2 8-thp-3
Page alloc extfrag event 3181699 3082881 2674164
Extfrag fragmenting 3180812 3082303 2673611
Extfrag fragmenting for unmovable 1201 4031 4040
Extfrag fragmenting unmovable placed with movable 974 3611 3645
Extfrag fragmenting for reclaimable 478 1165 1294
Extfrag fragmenting reclaimable placed with movable 387 985 1030
Extfrag fragmenting for movable 3179133 3077107 2668277
The improvements for first iteration are clear, the rest is much noisier
and can appear like regression for Patch 1. Anyway, patch 2 rectifies it.
Allocation success rates are again unaffected so there's no point in
making this e-mail any longer.
[1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
This patch (of 3):
When __rmqueue_fallback() is called to allocate a page of order X, it will
find a page of order Y >= X of a fallback migratetype, which is different
from the desired migratetype. With the help of try_to_steal_freepages(),
it may change the migratetype (to the desired one) also of:
1) all currently free pages in the pageblock containing the fallback page
2) the fallback pageblock itself
3) buddy pages created by splitting the fallback page (when Y > X)
These decisions take the order Y into account, as well as the desired
migratetype, with the goal of preventing multiple fallback allocations
that could e.g. distribute UNMOVABLE allocations among multiple
pageblocks.
Originally, decision for 1) has implied the decision for 3). Commit
47118af076 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
(probably unintentionally) so that the buddy pages in case 3) are always
changed to the desired migratetype, except for CMA pageblocks.
Commit fef903efcf ("mm/page_allo.c: restructure free-page stealing code
and fix a bug") did some refactoring and added a comment that the case of
3) is intended. Commit 0cbef29a78 ("mm: __rmqueue_fallback() should
respect pageblock type") removed the comment and tried to restore the
original behavior where 1) implies 3), but due to the previous
refactoring, the result is instead that only 2) implies 3) - and the
conditions for 2) are less frequently met than conditions for 1). This
may increase fragmentation in situations where the code decides to steal
all free pages from the pageblock (case 1)), but then gives back the buddy
pages produced by splitting.
This patch restores the original intended logic where 1) implies 3).
During testing with stress-highalloc from mmtests, this has shown to
decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
In some cases it has increased the number of events when MOVABLE
allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
fixable by sync compaction and thus less harmful.
Note that evaluation has shown that the behavior introduced by
47118af076 for buddy pages in case 3) is actually even better than the
original logic, so the following patch will introduce it properly once
again. For stable backports of this patch it makes thus sense to only fix
versions containing 0cbef29a78.
[iamjoonsoo.kim@lge.com: tracepoint fix]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit 0f792cf949 upstream.
When running the test which causes the race as shown in the previous patch,
we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
This race happens when pte turns into migration entry just after the first
check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false.
To fix this, we need to check pte_present() again after huge_ptep_get().
This patch also reorders taking ptl and doing pte_page(), because
pte_page() should be done in ptl. Due to this reordering, we need use
trylock_page() in page != pagecache_page case to respect locking order.
Fixes: 66aebce747 ("hugetlb: fix race condition in hugetlb_fault()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9215f437b8 ]
Currently the list is traversed using rcu variant. That is not correct
since dev_set_mac_address can be called which eventually calls
rtmsg_ifinfo_build_skb and there, skb allocation can sleep. So fix this
by remove the rcu usage here.
Fixes: 3d249d4ca7 "net: introduce ethernet teaming device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 9145736d48 ]
1. For an IPv4 ping socket, ping_check_bind_addr does not check
the family of the socket address that's passed in. Instead,
make it behave like inet_bind, which enforces either that the
address family is AF_INET, or that the family is AF_UNSPEC and
the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
if the socket family is not AF_INET6. Return EAFNOSUPPORT
instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
instead of EINVAL if an incorrect socket address structure is
passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
IPv4, and it cannot easily be made to support IPv4 because
the protocol numbers for ICMP and ICMPv6 are different. This
makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
of making the socket unusable.
Among other things, this fixes an oops that can be triggered by:
int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
struct sockaddr_in6 sin6 = {
.sin6_family = AF_INET6,
.sin6_addr = in6addr_any,
};
bind(s, (struct sockaddr *) &sin6, sizeof(sin6));
Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit acf8dd0a9d ]
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 42c972a1f3 ]
The National Instruments USB Host-to-Host Cable is based on the Prolific
PL-25A1 chipset. Add its VID/PID so the plusb driver will recognize it.
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit cac5e65e8a ]
We did a failed attempt in the past to only use rcu in rtnl dump
operations (commit e67f88dd12 "net: dont hold rtnl mutex during
netlink dump callbacks")
Now that dumps are holding RTNL anyway, there is no need to also
use rcu locking, as it forbids any scheduling ability, like
GFP_KERNEL allocations that controlling path should use instead
of GFP_ATOMIC whenever possible.
This should fix following splat Cong Wang reported :
[ INFO: suspicious RCU usage. ]
3.19.0+ #805 Tainted: G W
include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/771:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
#1: (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e
stack backtrace:
CPU: 3 PID: 771 Comm: ip Tainted: G W 3.19.0+ #805
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
Call Trace:
[<ffffffff81a27457>] dump_stack+0x4c/0x65
[<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
[<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
[<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
[<ffffffff8109e67d>] __might_sleep+0x78/0x80
[<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
[<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
[<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
[<ffffffff817c1383>] alloc_netid+0x61/0x69
[<ffffffff817c14c3>] __peernet2id+0x79/0x8d
[<ffffffff817c1ab7>] peernet2id+0x13/0x1f
[<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
[<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
[<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
[<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
[<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
[<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
[<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
[<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
[<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
[<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
[<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
[<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
[<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
[<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
[<ffffffff811ac556>] ? __fget_light+0x2d/0x52
[<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
[<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 0c7aecd4bd ("netns: add rtnl cmd to add and get peer netns ids")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 4092e6acf5 ]
This patch adds bcmgenet_tx_poll for the tx_rings. This can reduce the
interrupt load and send xmit in network stack on time. This also
separated for the completion of tx_ring16 from bcmgenet_poll.
The bcmgenet_tx_reclaim of tx_ring[{0,1,2,3}] operative by an interrupt
is to be not more than a certain number TxBDs. It is caused by too
slowly reclaiming the transmitted skb. Therefore, performance
degradation of xmit after 605ad7f ("tcp: refine TSO autosizing").
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 2f1d8b9e8a ]
Brian reported crashes using IPv6 traffic with macvtap/veth combo.
I tracked the crashes in neigh_hh_output()
-> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
Neighbour code assumes headroom to push Ethernet header is
at least 16 bytes.
It appears macvtap has only 14 bytes available on arches
where NET_IP_ALIGN is 0 (like x86)
Effect is a corruption of 2 bytes right before skb->head,
and possible crashes if accessing non existing memory.
This fix should also increase IPv4 performance, as paranoid code
in ip_finish_output2() wont have to call skb_realloc_headroom()
Reported-by: Brian Rak <brak@vultr.com>
Tested-by: Brian Rak <brak@vultr.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit d720d8cec5 ]
With commit a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg), the
MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
changing the kernel compat behaviour from the one before the commit it
was trying to fix (1be374a051, net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg).
On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
the kernel). However, on a 64-bit kernel, the compat ABI is different
with commit a7526eb5d0.
This patch changes the compat_sys_{send,recv}msg behaviour to the one
prior to commit 1be374a051.
The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
but the general rule is not to break user ABI (even when the user
behaviour is not entirely sane).
Fixes: a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 57e5956319 ]
Currently following race is possible in team:
CPU0 CPU1
team_port_del
team_upper_dev_unlink
priv_flags &= ~IFF_TEAM_PORT
team_handle_frame
team_port_get_rcu
team_port_exists
priv_flags & IFF_TEAM_PORT == 0
return NULL (instead of port got
from rx_handler_data)
netdev_rx_handler_unregister
The thing is that the flag is removed before rx_handler is unregistered.
If team_handle_frame is called in between, team_port_exists returns 0
and team_port_get_rcu will return NULL.
So do not check the flag here. It is guaranteed by netdev_rx_handler_unregister
that team_handle_frame will always see valid rx_handler_data pointer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 52d6c8c6ca ]
Trying to use burst capability (aka xmit_more) on a virtual device
like bonding is not supported.
For example, skb might be queued multiple times on a qdisc, with
various list corruptions.
Fixes: 38b2cf2982 ("net: pktgen: packet bursting via skb->xmit_more")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
This reverts commit 1e91887685.
Revert BQL support in r8169 driver as several regressions
point to this commit and we cannot figure out the real
cause yet.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit a4176a9391 ]
colons are used as a separator in netdev device lookup in dev_ioctl.c
Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME
Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 997d5c3f44 ]
Non NAPI drivers can call skb_tstamp_tx() and then sock_queue_err_skb()
from hard IRQ context.
Therefore, sock_dequeue_err_skb() needs to block hard irq or
corruptions or hangs can happen.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 364a9e9324 ("sock: deduplicate errqueue dequeue")
Fixes: cb820f8e4b ("net: Provide a generic socket error queue delivery method for Tx time stamps.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7b4577a9da ]
Open vSwitch allows moving internal vport to different namespace
while still connected to the bridge. But when namespace deleted
OVS does not detach these vports, that results in dangling
pointer to netdevice which causes kernel panic as follows.
This issue is fixed by detaching all ovs ports from the deleted
namespace at net-exit.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch]
Oops: 0000 [#1] SMP
Call Trace:
[<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch]
[<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch]
[<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0
[<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0
[<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0
[<ffffffff8167deac>] genl_rcv+0x2c/0x40
[<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0
[<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0
[<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0
[<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420
[<ffffffff8162f541>] __sys_sendmsg+0x51/0x90
[<ffffffff8162f592>] SyS_sendmsg+0x12/0x20
[<ffffffff81764ee9>] system_call_fastpath+0x12/0x17
Reported-by: Assaf Muller <amuller@redhat.com>
Fixes: 46df7b81454("openvswitch: Add support for network namespaces.")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@noironetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 34eea79e26 ]
In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 54da5a8be3 ]
phy_init_eee uses phy_find_setting(phydev->speed, phydev->duplex)
to find a valid entry in the settings array for the given speed
and duplex value. For full duplex 1000baseT, this will return
the first matching entry, which is the entry for 1000baseKX_Full.
If the phy eee does not support 1000baseKX_Full, this entry will not
match, causing phy_init_eee to fail for no good reason.
Fixes: 9a9c56cb34 ("net: phy: fix a bug when verify the EEE support")
Fixes: 3e7077067e ("phy: Expand phy speed/duplex settings array")
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3e32e733d1 ]
ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit fba04a9e0c ]
skb_copy_bits() returns zero on success and negative value on error,
so it is needed to invert the condition in ip_check_defrag().
Fixes: 1bf3751ec9 ("ipv4: ip_check_defrag must not modify skb before unsharing")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 1c4cff0cf5 ]
The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.
[xiyou.wangcong@gmail.com: remove a useless kfree()]
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7afb8886a0 ]
Ignacy reported that when eth0 is down and add a vlan device
on top of it like:
ip link add link eth0 name eth0.1 up type vlan id 1
We will get a refcount leak:
unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2
The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().
Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 3b4711757d ]
ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec63917 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit ba34e6d9d3 ]
IPv6 can keep a copy of SYN message using skb_get() in
tcp_v6_conn_request() so that caller wont free the skb when calling
kfree_skb() later.
Therefore TCP fast open has to clone the skb it is queuing in
child->sk_receive_queue, as all skbs consumed from receive_queue are
freed using __kfree_skb() (ie assuming skb->users == 1)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: 5b7ed0892f ("tcp: move fastopen functions to tcp_fastopen.c")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 364d5716a7 ]
ifla_vf_policy[] is wrong in advertising its individual member types as
NLA_BINARY since .type = NLA_BINARY in combination with .len declares the
len member as *max* attribute length [0, len].
The issue is that when do_setvfinfo() is being called to set up a VF
through ndo handler, we could set corrupted data if the attribute length
is less than the size of the related structure itself.
The intent is exactly the opposite, namely to make sure to pass at least
data of minimum size of len.
Fixes: ebc08a6f47 ("rtnetlink: Add VF config code to rtnetlink")
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 7744b5f369 ]
This patch fixes two issues in UDP checksum computation in pktgen.
First, the pseudo-header uses the source and destination IP
addresses. Currently, the ports are used for IPv4.
Second, the UDP checksum covers both header and data. So we need to
generate the data earlier (move pktgen_finalize_skb up), and compute
the checksum for UDP header + data.
Fixes: c26bf4a513 ("pktgen: Add UDPCSUM flag to support UDP checksums")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 11b1f8288d ]
We still need a validate_link_af() handler with an appropriate nla policy,
similarly as we have in IPv4 case, otherwise size validations are not being
done properly in that case.
Fixes: f53adae4ea ("net: ipv6: add tokenized interface identifier support")
Fixes: bc91b0f07a ("ipv6: addrconf: implement address generation modes")
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[ Upstream commit 233c96fc07 ]
flow_cache_flush_task references a structure member flow_cache_gc_work
where it should reference flow_cache_flush_task instead.
Kernel panic occurs on kernels using IPsec during XFRM garbage
collection. The garbage collection interval can be shortened using the
following sysctl settings:
net.ipv4.xfrm4_gc_thresh=4
net.ipv6.xfrm6_gc_thresh=4
With the default settings, our productions servers crash approximately
once a week. With the settings above, they crash immediately.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Tomáš Charvát <tc@excello.cz>
Tested-by: Jan Hejl <jh@excello.cz>
Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
commit b10a08194c upstream.
Currently maximum space limit quota format supports is in blocks however
since we store space limits in bytes, this is somewhat confusing. So
store the maximum limit in bytes as well. Also rename the field to match
the new unit and related inode field to match the new naming scheme.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e7c22d447 upstream.
The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.
The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":
static unsigned long randomize_stack_top(unsigned long stack_top)
{
unsigned int random_variable = 0;
if ((current->flags & PF_RANDOMIZE) &&
!(current->personality & ADDR_NO_RANDOMIZE)) {
random_variable = get_random_int() & STACK_RND_MASK;
random_variable <<= PAGE_SHIFT;
}
return PAGE_ALIGN(stack_top) + random_variable;
return PAGE_ALIGN(stack_top) - random_variable;
}
Note that, it declares the "random_variable" variable as "unsigned int".
Since the result of the shifting operation between STACK_RND_MASK (which
is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
random_variable <<= PAGE_SHIFT;
then the two leftmost bits are dropped when storing the result in the
"random_variable". This variable shall be at least 34 bits long to hold
the (22+12) result.
These two dropped bits have an impact on the entropy of process stack.
Concretely, the total stack entropy is reduced by four: from 2^28 to
2^30 (One fourth of expected entropy).
This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().
The successful fix can be tested with:
$ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack]
7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack]
7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack]
7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack]
...
Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96738c69a7 upstream.
Andy pointed out that if an NMI or MCE is received while we're in the
middle of an EFI mixed mode call a triple fault will occur. This can
happen, for example, when issuing an EFI mixed mode call while running
perf.
The reason for the triple fault is that we execute the mixed mode call
in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
installed throughout the call.
At Andy's suggestion, stop playing the games we currently do at runtime,
such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
can simply switch to the __KERNEL32_CS descriptor before invoking
firmware services, and run in compatibility mode. This way, if an
NMI/MCE does occur the kernel IDT handler will execute correctly, since
it'll jump to __KERNEL_CS automatically.
However, this change is only possible post-ExitBootServices(). Before
then the firmware "owns" the machine and expects for its 32-bit IDT
handlers to be left intact to service interrupts, etc.
So, we now need to distinguish between early boot and runtime
invocations of EFI services. During early boot, we need to restore the
GDT that the firmware expects to be present. We can only jump to the
__KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
has been invoked.
A liberal sprinkling of comments in the thunking code should make the
differences in early and late environments more apparent.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a4bcf470c upstream.
We have a scenario where after the fsync log replay we can lose file data
that had been previously fsync'ed if we added an hard link for our inode
and after that we sync'ed the fsync log (for example by fsync'ing some
other file or directory).
This is because when adding an hard link we updated the inode item in the
log tree with an i_size value of 0. At that point the new inode item was
in memory only and a subsequent fsync log replay would not make us lose
the file data. However if after adding the hard link we sync the log tree
to disk, by fsync'ing some other file or directory for example, we ended
up losing the file data after log replay, because the inode item in the
persisted log tree had an an i_size of zero.
This is easy to reproduce, and the following excerpt from my test for
xfstests shows this:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create one file with data and fsync it.
# This made the btrfs fsync log persist the data and the inode metadata with
# a correct inode->i_size (4096 bytes).
$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now add one hard link to our file. This made the btrfs code update the fsync
# log, in memory only, with an inode metadata having a size of 0.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
# Now force persistence of the fsync log to disk, for example, by fsyncing some
# other file.
touch $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
# Before a power loss or crash, we could read the 4Kb of data from our file as
# expected.
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# After the fsync log replay, because the fsync log had a value of 0 for our
# inode's i_size, we couldn't read anymore the 4Kb of data that we previously
# wrote and fsync'ed. The size of the file became 0 after the fsync log replay.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
Another alternative test, that doesn't need to fsync an inode in the same
transaction it was created, is:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create our test file with some data.
$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Make sure the file is durably persisted.
sync
# Append some data to our file, to increase its size.
$XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Fsync the file, so from this point on if a crash/power failure happens, our
# new data is guaranteed to be there next time the fs is mounted.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Add one hard link to our file. This made btrfs write into the in memory fsync
# log a special inode with generation 0 and an i_size of 0 too. Note that this
# didn't update the inode in the fsync log on disk.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
# Now make sure the in memory fsync log is durably persisted.
# Creating and fsync'ing another file will do it.
touch $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
# As expected, before the crash/power failure, we should be able to read the
# 12Kb of file data.
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# After mounting the fs again, the fsync log was replayed.
# The btrfs fsync log replay code didn't update the i_size of the persisted
# inode because the inode item in the log had a special generation with a
# value of 0 (and it couldn't know the correct i_size, since that inode item
# had a 0 i_size too). This made the last 4Kb of file data inaccessible and
# effectively lost.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
This isn't a new issue/regression. This problem has been around since the
log tree code was added in 2008:
Btrfs: Add a write ahead tree log to optimize synchronous operations
(commit e02119d5a7)
Test cases for xfstests follow soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 381cf6587f upstream.
If btrfs_find_item is called with NULL path it allocates one locally but
does not free it. Affected paths are inserting an orphan item for a file
and for a subvol root.
Move the path allocation to the callers.
Fixes: 3f870c2899 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality")
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5efa0490cc upstream.
This has been confusing people for too long, the message is really just
informative.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7eb71e0351 upstream.
It turns out it's possible to get __remove_osd() called twice on the
same OSD. That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory. One scenario that I was able to reproduce is as follows:
<osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
osd_reset()
<osdmap - osd3 down>
ceph_osdc_handle_map()
<takes map_sem>
kick_requests()
<takes request_mutex>
reset_changed_osds()
__reset_osd()
__remove_osd()
<releases request_mutex>
<releases map_sem>
<takes map_sem>
<takes request_mutex>
__kick_osd_requests()
__reset_osd()
__remove_osd() <-- !!!
A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().
Fixes: http://tracker.ceph.com/issues/8087
Cc: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4690555e13 upstream.
Since kernel 3.14 the backlight control has been broken on various Samsung
Atom based netbooks. This has been bisected and this problem happens since
commit b35684b8fa ("drm/i915: do full backlight setup at enable time")
This has been reported and discussed in detail here:
http://lists.freedesktop.org/archives/intel-gfx/2014-July/049395.html
Unfortunately no-one has been able to fix this. This only affects Samsung
Atom netbooks, and the Linux kernel and the BIOS of those laptops have never
worked well together. All affected laptops already have a quirk to avoid using
the standard acpi-video interface and instead use the samsung specific SABI
interface which samsung-laptop uses. It seems that recent fixes to the i915
driver have also broken backlight control through the SABI interface.
The intel_backlight driver OTOH works fine, and also allows for finer grained
backlight control. So add a new use_native_backlight quirk, and replace the
broken_acpi_video quirk with this quirk for affected models. This new quirk
disables acpi-video as before and also stops samsung-laptop from registering
the SABI based samsung_laptop backlight interface, leaving only the working
intel_backlight interface.
This commit enables this new quirk for 3 models which are known to be affected,
chances are that it needs to be used on other models too.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1094948 # N145P
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1115713 # N250P
Reported-by: Bertrik Sikken <bertrik@sikken.nl> # N150P
Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11249e7399 upstream.
d0585cd815 ("sb_edac: Claim a different PCI device") changed the
probing of sb_edac to look for PCI device 0x3ca0:
3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
...
but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
in sbridge_probe() therefore the probing fails.
Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
.i.e., the 14.0 device, fixes the issue and driver loads successfully
again:
[ 2449.013120] EDAC DEBUG: sbridge_init:
[ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
...
Add a debug printk while at it to be able to catch the failure in the
future and dump driver version on successful load.
Fixes: d0585cd815 ("sb_edac: Claim a different PCI device")
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1901ef099 upstream.
When a drive is marked write-mostly it should only be the
target of reads if there is no other option.
This behaviour was broken by
commit 9dedf60313
md/raid1: read balance chooses idlest disk for SSD
which causes a write-mostly device to be *preferred* is some cases.
Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.
We only need to test one of these as they are both changed
from -1 or >=0 at the same time.
As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.
Reported-by: Tomáš Hodek <tomas.hodek@volny.cz>
Reported-by: Dark Penguin <darkpenguin@yandex.ru>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26ac107378 upstream.
Commit a7854487cd:
md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
Causes an RCW cycle to be forced even when the array is degraded.
A degraded array cannot support RCW as that requires reading all data
blocks, and one may be missing.
Forcing an RCW when it is not possible causes a live-lock and the code
spins, repeatedly deciding to do something that cannot succeed.
So change the condition to only force RCW on non-degraded arrays.
Reported-by: Manibalan P <pmanibalan@amiindia.co.in>
Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: a7854487cd
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48536c9195 upstream.
Commit f6edb53c49 converted the probe to
a CPU wide event first (pid == -1). For kernels that do not support
the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this
errno is not handled pid is not reset to 0 and the subsequent use of
pid = -1 as an argument brings in an additional failure path if
perf_event_paranoid > 0:
$ perf record -- sleep 1
perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ]
Also, ensure the fd of the confirmation check is closed and comment why
pid = -1 is used.
Needs to go to 3.18 stable tree as well.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Based-on-patch-by: David Ahern <david.ahern@oracle.com>
Acked-by: David Ahern <david.ahern@oracle.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/54EC610C.8000403@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4a19eb3b1 upstream.
We have two race conditions in the probe code which could lead to a null
pointer dereference in the interrupt handler.
The interrupt handler accesses the clockevent device, which may not yet be
registered.
First race condition happens when the interrupt handler gets registered before
the interrupts get disabled. The second race condition happens when the
interrupts get enabled, but the clockevent device is not yet registered.
Fix that by disabling the interrupts before we register the interrupt and enable
the interrupts after the clockevent device got registered.
Reported-by: Gongbae Park <yongbae2@gmail.com>
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2996cb29b upstream.
The KSTK_EIP() and KSTK_ESP() macros should return the user program
counter (PC) and stack pointer (A0StP) of the given task. These are used
to determine which VMA corresponds to the user stack in
/proc/<pid>/maps, and for the user PC & A0StP in /proc/<pid>/stat.
However for Meta the PC & A0StP from the task's kernel context are used,
resulting in broken output. For example in following /proc/<pid>/maps
output, the 3afff000-3b021000 VMA should be described as the stack:
# cat /proc/self/maps
...
100b0000-100b1000 rwxp 00000000 00:00 0 [heap]
3afff000-3b021000 rwxp 00000000 00:00 0
And in the following /proc/<pid>/stat output, the PC is in kernel code
(1074234964 = 0x40078654) and the A0StP is in the kernel heap
(1335981392 = 0x4fa17550):
# cat /proc/self/stat
51 (cat) R ... 1335981392 1074234964 ...
Fix the definitions of KSTK_EIP() and KSTK_ESP() to use
task_pt_regs(tsk)->ctx rather than (tsk)->thread.kernel_context. This
gets the registers from the user context stored after the thread info at
the base of the kernel stack, which is from the last entry into the
kernel from userland, regardless of where in the kernel the task may
have been interrupted, which results in the following more correct
/proc/<pid>/maps output:
# cat /proc/self/maps
...
0800b000-08070000 r-xp 00000000 00:02 207 /lib/libuClibc-0.9.34-git.so
...
100b0000-100b1000 rwxp 00000000 00:00 0 [heap]
3afff000-3b021000 rwxp 00000000 00:00 0 [stack]
And /proc/<pid>/stat now correctly reports the PC in libuClibc
(134320308 = 0x80190b4) and the A0StP in the [stack] region (989864576 =
0x3b002280):
# cat /proc/self/stat
51 (cat) R ... 989864576 134320308 ...
Reported-by: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfcc70a8c8 upstream.
For filesystems without separate project quota inode field in the
superblock we just reuse project quota file for group quotas (and vice
versa) if project quota file is allocated and we need group quota file.
When we reuse the file, quota structures on disk suddenly have wrong
type stored in d_flags though. Nobody really cares about this (although
structure type reported to userspace was wrong as well) except
that after commit 14bf61ffe6 (quota: Switch ->get_dqblk() and
->set_dqblk() to use bytes as space units) assertion in
xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
was testing without XFS_DEBUG so I didn't notice when submitting the
above commit).
Fix the problem by properly resetting ddq->d_flags when running quotacheck
for a quota file.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f97c20e5f upstream.
The gpio_chip operations receive a pointer the gpio_chip struct which is
contained in the driver's private struct, yet the container_of call in those
functions point to the mfd struct defined in include/linux/mfd/tps65912.h.
Signed-off-by: Nicolas Saenz Julienne <nicolassaenzj@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cf75e9e4d upstream.
The change:
7b8792bbdf
gpiolib: of: Correct error handling in of_get_named_gpiod_flags
assumed that only one gpio-chip is registred per of-node.
Some drivers register more than one chip per of-node, so
adjust the matching function of_gpiochip_find_and_xlate to
not stop looking for chips if a node-match is found and
the translation fails.
Fixes: 7b8792bbdf ("gpiolib: of: Correct error handling in of_get_named_gpiod_flags")
Signed-off-by: Hans Holmberg <hans.holmberg@intel.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Tyler Hall <tylerwhall@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d42d48a34 upstream.
The native (64-bit) sigval_t union contains sival_int (32-bit) and
sival_ptr (64-bit). When a compat application invokes a syscall that
takes a sigval_t value (as part of a larger structure, e.g.
compat_sys_mq_notify, compat_sys_timer_create), the compat_sigval_t
union is converted to the native sigval_t with sival_int overlapping
with either the least or the most significant half of sival_ptr,
depending on endianness. When the corresponding signal is delivered to a
compat application, on big endian the current (compat_uptr_t)sival_ptr
cast always returns 0 since sival_int corresponds to the top part of
sival_ptr. This patch fixes copy_siginfo_to_user32() so that sival_int
is copied to the compat_siginfo_t structure.
Reported-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com>
Tested-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a52d209336 upstream.
Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped
working. This patch enables the "replacement" for REGULATOR_DUMMY and
allows the touchscreen to work even though there is no regulator for "vcc".
Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b568b8601f upstream.
Currently Xen Domain0 has special treatment for ACPI SCI interrupt,
that is initialize irq for ACPI SCI at early stage in a special way as:
xen_init_IRQ()
->pci_xen_initial_domain()
->xen_setup_acpi_sci()
Allocate and initialize irq for ACPI SCI
Function xen_setup_acpi_sci() calls acpi_gsi_to_irq() to get an irq
number for ACPI SCI. But unfortunately acpi_gsi_to_irq() depends on
IOAPIC irqdomains through following path
acpi_gsi_to_irq()
->mp_map_gsi_to_irq()
->mp_map_pin_to_irq()
->check IOAPIC irqdomain
For PV domains, it uses Xen event based interrupt manangement and
doesn't make uses of native IOAPIC, so no irqdomains created for IOAPIC.
This causes Xen domain0 fail to install interrupt handler for ACPI SCI
and all ACPI events will be lost. Please refer to:
https://lkml.org/lkml/2014/12/19/178
So the fix is to get rid of special treatment for ACPI SCI, just treat
ACPI SCI as normal GSI interrupt as:
acpi_gsi_to_irq()
->acpi_register_gsi()
->acpi_register_gsi_xen()
->xen_register_gsi()
With above change, there's no need for xen_setup_acpi_sci() anymore.
The above change also works with bare metal kernel too.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Tony Luck <tony.luck@intel.com>
Cc: xen-devel@lists.xenproject.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1421720467-7709-2-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ac96caf0f upstream.
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.
This patch changes our hrtimer to be based on a monotonic clock.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d00f75942 upstream.
Patch 0759d0681c ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.
This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.
As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.
As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.
A proper fix might be to introduce a RAW based hrtimer.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f187922dd upstream.
When the guest writes to the TSC, the masterclock TSC copy must be
updated as well along with the TSC_OFFSET update, otherwise a negative
tsc_timestamp is calculated at kvm_guest_time_update.
Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
"if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
becomes redundant, so remove the do_request boolean and collapse
everything into a single condition.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23b133bdc4 upstream.
Check length of extended attributes and allocation descriptors when
loading inodes from disk. Otherwise corrupted filesystems could confuse
the code and make the kernel oops.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7914495427 upstream.
Store blocksize in a local variable in udf_fill_inode() since it is used
a lot of times.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed4cbc81ad upstream.
activate_mm() and switch_mm() call get_new_mmu_context() which in turn
can enable the HTW before the entryhi is changed with the new ASID.
Since the latter will enable the HTW in local_flush_tlb_all(),
then there is a small timing window where the HTW is running with the
new ASID but with an old pgd since the TLBMISS_HANDLER_SETUP_PGD
hasn't assigned a new one yet. In order to prevent that, we introduce a
simple htw counter to avoid starting HTW accidentally due to nested
htw_{start,stop}() sequences. Moreover, since various IPI calls can
enforce TLB flushing operations on a different core, such an operation
may interrupt another htw_{stop,start} in progress leading inconsistent
updates of the htw_seq variable. In order to avoid that, we disable the
interrupts whenever we update that variable.
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9118/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06f34e1c28 upstream.
We used to calculate page address differently in 2 cases:
1. In virt_to_page(x) we do
--->8---
mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT
--->8---
2. In in pte_page(x) we do
--->8---
mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT
--->8---
That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE -
different pages will be selected depending on where and how we calculate
page address.
In particular in the STAR 9000853582 when gdb attempted to read memory
of another process it got improper page in get_user_pages() because this
is exactly one of the places where we search for a page by pte_page().
The fix is trivial - we need to calculate page address similarly in both
cases.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f1437f61a upstream.
When the UART is in DMA receive mode (RDMAS set) and one character
just arrived while another interrupt is handled (e.g. TX), the RDRF
(receiver data register full flag) is set due to the water level of
1. But since the DMA will take care of this character, there is no
need to handle it by calling lpuart_prepare_rx. Handling it leads to
adding the RX timeout timer twice:
[ 74.336698] Kernel BUG at 80053070 [verbose debug info unavailable]
[ 74.342999] Internal error: Oops - BUG: 0 [#1] ARM0:00.00 khungtaskd
[ 74.347817] Modules linked in: 0 S 0.0 0.0 0:00.00 writeback
[ 74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788
[ 74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)t
[ 74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000 kblockd
[ 74.370002] PC is at add_timer+0x24/0x28.0 0.0 0:00.09 kworker/u2:1
[ 74.373960] LR is at lpuart_int+0x15c/0x3d8
[ 74.378171] pc : [<80053070>] lr : [<802e0d88>] psr: a0010193
[ 74.378171] sp : 8079de10 ip : 8079de20 fp : 8079de1c
[ 74.389694] r10: 807d44c0 r9 : 8688c300 r8 : 00000013
[ 74.394943] r7 : 20010193 r6 : 00000000 r5 : 000000a0 r4 : 86997210
[ 74.401498] r3 : ffffa7da r2 : 80817868 r1 : 86997210 r0 : 86997344
[ 74.408052] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 74.415489] Control: 10c5387d Table: 8611c059 DAC: 00000015
[ 74.421265] Process swapper (pid: 0, stack limit = 0x8079c230)
...
Solve this by only execute the receiver path (lpuart_prepare_rx) if
the DMA receive mode (RDMAS) is not set. Also, make sure the flag is
cleared on initialization, in case it has been left set.
This can be best reproduced using UART as a serial console, then
running top while dd'ing data into the terminal.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a8588a1cf upstream.
If the serial port gets closed while a RX transfer is in progress,
the timer might fire after the serial port shutdown finished. This
leads in a NULL pointer dereference:
[ 7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 7.516590] pgd = 86348000
[ 7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000
[ 7.526145] Internal error: Oops: 17 [#1] ARM
[ 7.530611] Modules linked in:
[ 7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778
[ 7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree)
[ 7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000
[ 7.553392] PC is at lpuart_timer_func+0x24/0xf8
[ 7.558127] LR is at lpuart_timer_func+0x20/0xf8
[ 7.562857] pc : [<802df99c>] lr : [<802df998>] psr: 600b0113
[ 7.562857] sp : 86ac9b90 ip : 86ac9b90 fp : 86ac9bbc
[ 7.574467] r10: 80817180 r9 : 80817b98 r8 : 80817998
[ 7.579803] r7 : 807acee0 r6 : 86989000 r5 : 00000100 r4 : 86997210
[ 7.586444] r3 : 86ac8000 r2 : 86ac9bc0 r1 : 86997210 r0 : 00000000
[ 7.593085] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 7.600341] Control: 10c5387d Table: 86348059 DAC: 00000015
[ 7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230)
Setup the timer on UART startup which allows to delete the timer
unconditionally on shutdown. This also saves the initialization
on each transfer.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1467559232 upstream.
The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.
A define of K(x) as
is defined in the code, but not used.
This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 165235180f upstream.
mvebu_armada375_smp_wa_init is only used on armada 375 but is defined
for all mvebu machines. As it calls a function that is only provided
sometimes, this can result in a link error:
arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init':
:(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa'
To solve this, we can just change the existing #ifdef around the
function to also check for Armada375 SMP platforms.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 305969fb62 ("ARM: mvebu: use the common function for Armada 375 SMP workaround")
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95fcedb027 upstream.
The vexpress tc2 power management code calls mcpm_loopback, which
is only available if ARM_CPU_SUSPEND is enabled, otherwise we
get a link error:
arch/arm/mach-vexpress/built-in.o: In function `tc2_pm_init':
arch/arm/mach-vexpress/tc2_pm.c:389: undefined reference to `mcpm_loopback'
This explicitly selects ARM_CPU_SUSPEND like other platforms that
need it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 3592d7e002 ("ARM: 8082/1: TC2: test the MCPM loopback during boot")
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bc78f32c2 upstream.
Add regulator_has_full_constraints() call to poodle board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.
This fixes the following warnings that can be seen on poodle if
regulators are enabled:
ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 271e80176a upstream.
Add regulator_has_full_constraints() call to corgi board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.
This fixes the following warnings that can be seen on corgi if
regulators are enabled:
ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517
corgi-audio corgi-audio: ASoC: failed to instantiate card -517
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19e3ae6b4f upstream.
The vcs device's poll/fasync support relies on the vt notifier to signal
changes to the screen content. Notifier invocations were missing for
changes that comes through the selection interface though. Fix that.
Tested with BRLTTY 5.2.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: Dave Mielke <dave@mielke.cc>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 074f9dd55f upstream.
Currently the USB stack assumes that all host controller drivers are
capable of receiving wakeup requests from downstream devices.
However, this isn't true for the isp1760-hcd driver, which means that
it isn't safe to do a runtime suspend of any device attached to a
root-hub port if the device requires wakeup.
This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure
and sets the flag in isp1760-hcd. The core is modified to prevent a
direct child of the root hub from being put into runtime suspend with
wakeup enabled if the flag is set.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 524134d422 upstream.
The USB stack provides a mechanism for drivers to request an
asynchronous device reset (usb_queue_reset_device()). The mechanism
uses a work item (reset_ws) embedded in the usb_interface structure
used by the driver, and the reset is carried out by a work queue
routine.
The asynchronous reset can race with driver unbinding. When this
happens, we try to cancel the queued reset before unbinding the
driver, on the theory that the driver won't care about any resets once
it is unbound.
However, thanks to the fact that lockdep now tracks work queue
accesses, this can provoke a lockdep warning in situations where the
device reset causes another interface's driver to be unbound; see
http://marc.info/?l=linux-usb&m=141893165203776&w=2
for an example. The reason is that the work routine for reset_ws in
one interface calls cancel_queued_work() for the reset_ws in another
interface. Lockdep thinks this might lead to a work routine trying to
cancel itself. The simplest solution is not to cancel queued resets
when unbinding drivers.
This means we now need to acquire a reference to the usb_interface
when queuing a reset_ws work item and to drop the reference when the
work routine finishes. We also need to make sure that the
usb_interface structure doesn't outlive its parent usb_device; this
means acquiring and dropping a reference when the interface is created
and destroyed.
In addition, cancelling a queued reset can fail (if the device is in
the middle of an earlier reset), and this can cause usb_reset_device()
to try to rebind an interface that has been deallocated (see
http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details).
Acquiring the extra references prevents this failure.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Reported-by: Olivier Sobrie <olivier@sobrie.be>
Tested-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5efd2ea8c9 upstream.
the following error pops up during "testusb -a -t 10"
| musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma)
hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of
size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in
hcd_buffer_alloc() returning memory which is 32 bytes aligned and it
might by identified by buffer_offset() as another buffer. This means the
buffer which is on a 32 byte boundary will not get freed, instead it
tries to free another buffer with the error message.
This patch fixes the issue by creating the smallest DMA buffer with the
size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is
smaller). This might be 32, 64 or even 128 bytes. The next three pools
will have the size 128, 512 and 2048.
In case the smallest pool is 128 bytes then we have only three pools
instead of four (and zero the first entry in the array).
The last pool size is always 2048 bytes which is the assumed PAGE_SIZE /
2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where
we would end up with 8KiB buffer in case we have 16KiB pages.
Instead I think it makes sense to have a common size(s) and extend them
if there is need to.
There is a BUILD_BUG_ON() now in case someone has a minalign of more than
128 bytes.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c99197902d upstream.
The usb_hcd_unlink_urb() routine in hcd.c contains two possible
use-after-free errors. The dev_dbg() statement at the end of the
routine dereferences urb and urb->dev even though both structures may
have been deallocated.
This patch fixes the problem by storing urb->dev in a local variable
(avoiding the dereference of urb) and moving the dev_dbg() up before
the usb_put_dev() call.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 663b7ee951 upstream.
We might enter the interrupt handler with hw_ready already set,
but prior we actually started the reset flow.
To soleve this we move the reset release from the interrupt handler
to the HW start wait function which is part of the reset sequence.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6fbb9bdf0f upstream.
-EDEFER error wasn't handle properly by atmel_serial_probe().
As an example, when atmel_serial_probe() is called for the first time, we pass
the test_and_set_bit() test to check whether the port has already been
initalized. Then we call atmel_init_port(), which may return -EDEFER, possibly
returned before by clk_get(). Consequently atmel_serial_probe() used to return
this error code WITHOUT clearing the port bit in the "atmel_ports_in_use" mask.
When atmel_serial_probe() was called for the second time, it used to fail on
the test_and_set_bit() function then returning -EBUSY.
When atmel_serial_probe() fails, this patch make it clear the port bit in the
"atmel_ports_in_use" mask, if needed, before returning the error code.
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37480a0568 upstream.
Commit 26df6d1340 ("tty: Add EXTPROC support for LINEMODE")
allows a process which has opened a pty master to send _any_ signal
to the process group of the pty slave. Although potentially
exploitable by a malicious program running a setuid program on
a pty slave, it's unknown if this exploit currently exists.
Limit to signals actually used.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Howard Chu <hyc@symas.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d1cff2a88 upstream.
We hit use after free on dereferncing pointer to task_smack struct in
smk_of_task() called from smack_task_to_inode().
task_security() macro uses task_cred_xxx() to get pointer to the task_smack.
task_cred_xxx() could be used only for non-pointer members of task's
credentials. It cannot be used for pointer members since what they point
to may disapper after dropping RCU read lock.
Mainly task_security() used this way:
smk_of_task(task_security(p))
Intead of this introduce function smk_of_task_struct() which
takes task_struct as argument and returns pointer to smk_known struct
and do this under RCU read lock.
Bogus task_security() macro is not used anymore, so remove it.
KASan's report for this:
AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
=============================================================================
BUG kmalloc-64 (Tainted: PO): kasan error
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
kmem_cache_alloc_trace+0x88/0x1bc
new_task_smack+0x44/0xd8
smack_cred_prepare+0x48/0x21c
security_prepare_creds+0x44/0x4c
prepare_creds+0xdc/0x110
smack_setprocattr+0x104/0x150
security_setprocattr+0x4c/0x54
proc_pid_attr_write+0x12c/0x194
vfs_write+0x1b0/0x370
SyS_write+0x5c/0x94
ret_fast_syscall+0x0/0x48
INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
kfree+0x270/0x290
smack_cred_free+0xc4/0xd0
security_cred_free+0x34/0x3c
put_cred_rcu+0x58/0xcc
rcu_process_callbacks+0x738/0x998
__do_softirq+0x264/0x4cc
do_softirq+0x94/0xf4
irq_exit+0xbc/0x120
handle_IRQ+0x104/0x134
gic_handle_irq+0x70/0xac
__irq_svc+0x44/0x78
_raw_spin_unlock+0x18/0x48
sync_inodes_sb+0x17c/0x1d8
sync_filesystem+0xac/0xfc
vdfs_file_fsync+0x90/0xc0
vfs_fsync_range+0x74/0x7c
INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
INFO: Object 0xc4635600 @offset=5632 fp=0x (null)
Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone c4635640: bb bb bb bb ....
Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 #1
Backtrace:
[<c00233a4>] (dump_backtrace+0x0/0x158) from [<c0023dec>] (show_stack+0x20/0x24)
r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
[<c0023dcc>] (show_stack+0x0/0x24) from [<c06d6d7c>] (dump_stack+0x20/0x28)
[<c06d6d5c>] (dump_stack+0x0/0x28) from [<c01c1d50>] (print_trailer+0x124/0x144)
[<c01c1c2c>] (print_trailer+0x0/0x144) from [<c01c1e88>] (object_err+0x3c/0x44)
r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
[<c01c1e4c>] (object_err+0x0/0x44) from [<c01cac18>] (kasan_report_error+0x2b8/0x538)
r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
[<c01ca960>] (kasan_report_error+0x0/0x538) from [<c01c9430>] (__asan_load4+0xd4/0xf8)
[<c01c935c>] (__asan_load4+0x0/0xf8) from [<c031e168>] (smack_task_to_inode+0x50/0x70)
r5:c4635600 r4:ca9da000
[<c031e118>] (smack_task_to_inode+0x0/0x70) from [<c031af64>] (security_task_to_inode+0x3c/0x44)
r5:cca25e80 r4:c0ba9780
[<c031af28>] (security_task_to_inode+0x0/0x44) from [<c023d614>] (pid_revalidate+0x124/0x178)
r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
[<c023d4f0>] (pid_revalidate+0x0/0x178) from [<c01db98c>] (lookup_fast+0x35c/0x43y4)
r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
[<c01db630>] (lookup_fast+0x0/0x434) from [<c01deec8>] (do_last.isra.24+0x1c0/0x1108)
[<c01ded08>] (do_last.isra.24+0x0/0x1108) from [<c01dff04>] (path_openat.isra.25+0xf4/0x648)
[<c01dfe10>] (path_openat.isra.25+0x0/0x648) from [<c01e1458>] (do_filp_open+0x3c/0x88)
[<c01e141c>] (do_filp_open+0x0/0x88) from [<c01ccb28>] (do_sys_open+0xf0/0x198)
r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
[<c01cca38>] (do_sys_open+0x0/0x198) from [<c01ccc00>] (SyS_open+0x30/0x34)
[<c01ccbd0>] (SyS_open+0x0/0x34) from [<c001db80>] (ret_fast_syscall+0x0/0x48)
Read of size 4 by thread T834:
Memory state around the buggy address:
c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e0d6714ac upstream.
When an application connects to the ring buffer via splice, it can only
read full pages. Splice does not work with partial pages. If there is
not enough data to fill a page, the splice command will either block
or return -EAGAIN (if set to nonblock).
Code was added where if the page is not full, to just sleep again.
The problem is, it will get woken up again on the next event. That
is, when something is written into the ring buffer, if there is a waiter
it will wake it up. The waiter would then check the buffer, see that
it still does not have enough data to fill a page and go back to sleep.
To make matters worse, when the waiter goes back to sleep, it could
cause another event, which would wake it back up again to see it
doesn't have enough data and sleep again. This produces a tremendous
overhead and fills the ring buffer with noise.
For example, recording sched_switch on an idle system for 10 seconds
produces 25,350,475 events!!!
Create another wait queue for those waiters wanting full pages.
When an event is written, it only wakes up waiters if there's a full
page of data. It does not wake up the waiter if the page is not yet
full.
After this change, recording sched_switch on an idle system for 10
seconds produces only 800 events. Getting rid of 25,349,675 useless
events (99.9969% of events!!), is something to take seriously.
Cc: Rabin Vincent <rabin@rab.in>
Fixes: e30f53aad2 "tracing: Do not busy wait in buffer splice"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04f81f0154 upstream.
Using the IPCB() macro to get the IPv4 options is convenient, but
unfortunately NetLabel often needs to examine the CIPSO option outside
of the scope of the IP layer in the stack. While historically IPCB()
worked above the IP layer, due to the inclusion of the inet_skb_param
struct at the head of the {tcp,udp}_skb_cb structs, recent commit
971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
reordered the tcp_skb_cb struct and invalidated this IPCB() trick.
This patch fixes the problem by creating a new function,
cipso_v4_optptr(), which locates the CIPSO option inside the IP header
without calling IPCB(). Unfortunately, this isn't as fast as a simple
lookup so some additional tweaks were made to limit the use of this
new function.
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6ce194325 upstream.
Hi,
If you can manage to submit an async write as the first async I/O from
the context of a process with realtime scheduling priority, then a
cfq_queue is allocated, but filed into the wrong async_cfqq bucket. It
ends up in the best effort array, but actually has realtime I/O
scheduling priority set in cfqq->ioprio.
The reason is that cfq_get_queue assumes the default scheduling class and
priority when there is no information present (i.e. when the async cfqq
is created):
static struct cfq_queue *
cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic,
struct bio *bio, gfp_t gfp_mask)
{
const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio);
cic->ioprio starts out as 0, which is "invalid". So, class of 0
(IOPRIO_CLASS_NONE) is passed to cfq_async_queue_prio like so:
async_cfqq = cfq_async_queue_prio(cfqd, ioprio_class, ioprio);
static struct cfq_queue **
cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio)
{
switch (ioprio_class) {
case IOPRIO_CLASS_RT:
return &cfqd->async_cfqq[0][ioprio];
case IOPRIO_CLASS_NONE:
ioprio = IOPRIO_NORM;
/* fall through */
case IOPRIO_CLASS_BE:
return &cfqd->async_cfqq[1][ioprio];
case IOPRIO_CLASS_IDLE:
return &cfqd->async_idle_cfqq;
default:
BUG();
}
}
Here, instead of returning a class mapped from the process' scheduling
priority, we get back the bucket associated with IOPRIO_CLASS_BE.
Now, there is no queue allocated there yet, so we create it:
cfqq = cfq_find_alloc_queue(cfqd, is_sync, cic, bio, gfp_mask);
That function ends up doing this:
cfq_init_cfqq(cfqd, cfqq, current->pid, is_sync);
cfq_init_prio_data(cfqq, cic);
cfq_init_cfqq marks the priority as having changed. Then, cfq_init_prio
data does this:
ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio);
switch (ioprio_class) {
default:
printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class);
case IOPRIO_CLASS_NONE:
/*
* no prio set, inherit CPU scheduling settings
*/
cfqq->ioprio = task_nice_ioprio(tsk);
cfqq->ioprio_class = task_nice_ioclass(tsk);
break;
So we basically have two code paths that treat IOPRIO_CLASS_NONE
differently, which results in an RT async cfqq filed into a best effort
bucket.
Attached is a patch which fixes the problem. I'm not sure how to make
it cleaner. Suggestions would be welcome.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69abaffec7 upstream.
Cfq_lookup_create_cfqg() allocates struct blkcg_gq using GFP_ATOMIC.
In cfq_find_alloc_queue() possible allocation failure is not handled.
As a result kernel oopses on NULL pointer dereference when
cfq_link_cfqq_cfqg() calls cfqg_get() for NULL pointer.
Bug was introduced in v3.5 in commit cd1604fab4 ("blkcg: factor
out blkio_group creation"). Prior to that commit cfq group lookup
had returned pointer to root group as fallback.
This patch handles this error using existing fallback oom_cfqq.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Fixes: cd1604fab4 ("blkcg: factor out blkio_group creation")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7772855a99 upstream.
With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK
(a.k.a. EAGAIN) errors when submitting commands to the SCSI generic
driver. Fix by calling blk_get_request() with GFP_KERNEL instead of
GFP_ATOMIC.
Note: to avoid introducing a potential deadlock, this patch should be
applied after the patch titled "sg: fix unkillable I/O wait deadlock
with scsi-mq".
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7568615c10 upstream.
When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests. For places in the kernel that call blk_get_request() with
GFP_KERNEL, this can cause the calling process to deadlock in a
permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get().
For places in the kernel that call blk_get_request() with GFP_ATOMIC,
this can cause blk_get_request() always to return -EWOULDBLOCK. Note
that these problems happen only if scsi-mq is enabled. Prevent the
problems by calling blk_put_request() as soon as the SCSI command
completes instead of waiting for userspace to call read().
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8ba1f9714 upstream.
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointer that were allocated.
Reported-by: David Ramos <daramos@stanford.edu>
Fixes: 4aece6a19c ("nfs41: cb_sequence xdr implementation")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb5d04bc39 upstream.
With pgio refactoring in v3.15, .init_read and .init_write can be
called with valid pgio->pg_lseg. file layout was fixed at that time
by commit c6194271f (pnfs: filelayout: support non page aligned
layouts). But the generic helper still needs to be fixed.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb71f8a5e3 upstream.
The tpm_ibmvtpm module is affected by an unaligned access problem.
ibmvtpm_crq_get_version failed with rc=-4 during boot when vTPM is
enabled in Power partition, which supports both little endian and
big endian modes.
We added little endian support to fix this problem:
1) added cpu_to_be64 calls to ensure BE data is sent from an LE OS.
2) added be16_to_cpu and be32_to_cpu calls to make sure data received
is in LE format on a LE OS.
Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Joy Latten <jmlatten@linux.vnet.ibm.com>
[phuewe: manually applied the patch :( ]
Reviewed-by: Ashley Lai <ashley@ahsleylai.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ba3b0b6f2 upstream.
When sending data in tpm_stm_i2c_send, each loop iteration send buf.
Send buf + i instead as the goal of this for loop is to send a number
of byte from buf that fit in burstcnt. Once those byte are sent, we are
supposed to send the next ones.
The driver was working because the burstcount value returns always the maximum size for a TPM
command or response. (0x800 for a command and 0x400 for a response).
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84eb186bc3 upstream.
There was an oops in tpm_ibmvtpm_get_desired_dma, which caused
kernel panic during boot when vTPM is enabled in Power partition
configured in AMS mode.
vio_bus_probe calls vio_cmo_bus_probe which calls
tpm_ibmvtpm_get_desired_dma to get the size needed for DMA allocation.
The problem is, vio_cmo_bus_probe is called before calling probe, which
for vtpm is tpm_ibmvtpm_probe and it's this function that initializes
and sets up vtpm's CRQ and gets required data values. Therefore,
since this has not yet been done, NULL is returned in attempt to get
the size for DMA allocation.
We added a NULL check. In addition, a default buffer size will
be set when NULL is returned.
Signed-off-by: Hon Ching (Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb95cd34ba upstream.
Currently these driver are missing a check on the return value of devm_kzalloc,
which would cause a NULL pointer dereference in a OOM situation.
This patch adds a missing check for tpm_i2c_atmel.c and tpm_i2c_nuvoton.c
Signed-off-by: Kiran Padwal <kiran.padwal@smartplayin.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 398a1e71dc upstream.
Add newly registered TPMs to the tail of the list, not the beginning, so that
things that are specifying TPM_ANY_NUM don't find that the device they're
using has inadvertently changed. Adding a second device would break IMA, for
instance.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 448e9c55c1 upstream.
Some machines, such as the Acer C720 and Toshiba CB35, have TPMs that do
not send IRQs while also having an ACPI TPM entry indicating that they
will be sent. These machines freeze on resume while the tpm_tis module
waits for an IRQ, eventually timing out.
When in interrupt mode, the tpm_tis module should receive an IRQ during
module init. Fall back to polling mode if none is received when expected.
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Tested-by: Michael Mullin <masmullin@gmail.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[phuewe: minor checkpatch fixed]
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9df11828d9 upstream.
The L2 cache properties were completely off with respect to what the
hardware is configured for. Fix the cache-size, cache-line-size and
cache-sets to reflect the L2 cache controller we have: 512KB, 16 ways
and 32 bytes per cache-line.
Fixes: 46d4bca044 ("ARM: BCM63XX: add BCM63138 minimal Device Tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de47699d00 upstream.
Commit 58ecb23f64 ("ARM: tegra: add missing unit addresses to DT") added
unit address and changed reg base for GR3D and DSI host1x modules, but these
addresses belongs to GR2D and TVO modules respectively. Fix it by changing
modules unit and reg base addresses to proper ones.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: 58ecb23f64 (ARM: tegra: add missing unit addresses to DT)
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c7e36bfc3 upstream.
With commit '7dedd34: ARM: OMAP2+: hwmod: Fix a crash in _setup_reset()
with DEBUG_LL' we moved from parsing cmdline to identify uart used
for earlycon to using the requsite hwmod CONFIG_DEBUG_OMAPxUARTy FLAGS.
On DRA7 UART3 hwmod doesn't have this flag enabled, and atleast on
BeagleBoard-X15, where we use UART3 for console, boot fails with
DEBUG_LL enabled. Enable DEBUG_OMAP4UART3_FLAGS for UART3 hwmod.
For using DEBUG_LL, enable CONFIG_DEBUG_OMAP4UART3 in menuconfig.
Fixes: 90020c7b2c ("ARM: OMAP: DRA7: hwmod: Create initial DRA7XX SoC data")
Reviewed-by: Felipe Balbi <balbi@ti.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e461894dc2 upstream.
StrongARM core uses RCSR SMR bit to tell to bootloader that it was reset
by entering the sleep mode. After we have resumed, there is little point
in having that bit enabled. Moreover, if this bit is set before reboot,
the bootloader can become confused. Thus clear the SMR bit on resume
just before clearing the scratchpad (resume address) register.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 564e559f2b upstream.
If the allocation of bt->bs fails, then bt->map can be freed twice, once
in blk_mq_init_bitmap_tags() -> bt_alloc(), and once in
blk_mq_init_bitmap_tags() -> bt_free(). Fix by setting the pointer to
NULL after the first free.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbef8478be upstream.
Migrating hugepages and hwpoisoned hugepages are considered as non-present
hugepages, and they are referenced via migration entries and hwpoison
entries in their page table slots.
This behavior causes race condition because pmd_huge() doesn't tell
non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is
one example where the kernel would call follow_page_pte() for such
hugepage while this function is supposed to handle only normal pages.
To avoid this, this patch makes pmd_huge() return true when pmd_none() is
true *and* pmd_present() is false. We don't have to worry about mixing up
non-present pmd entry with normal pmd (pointing to leaf level pte entry)
because pmd_present() is true in normal pmd.
The same race condition could happen in (x86-specific) gup_pmd_range(),
where this patch simply adds pmd_present() check instead of pmd_huge().
This is because gup_pmd_range() is fast path. If we have non-present
hugepage in this function, we will go into gup_huge_pmd(), then return 0
at flag mask check, and finally fall back to the slow path.
Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fde3538a8a upstream.
Whenever we modify a page table entry, we need to ensure that the HTW
will not fetch a stable entry. And for that to happen we need to ensure
that HTW is stopped before we modify the said entry otherwise the HTW
may already be in the process of reading that entry and fetching the
old information. As a result of which, we replace the htw_reset() calls
with htw_{stop,start} in more appropriate places. This also removes the
remaining users of htw_reset() and as a result we drop that macro
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9116/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69e4e63ec8 upstream.
The current code uses bits 0-6 of the sys_cpupll register to calculate
core clock speed. However this is only valid on Au1300, on all earlier
models the hardware only uses bits 0-5 to generate core clock.
This fixes clock calculation on the MTX1 (Au1500), where bit 6 of cpupll
is set as well, which ultimately lead the code to calculate a bogus cpu
core clock and also uart base clock down the line.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Reported-by: John Crispin <blogic@openwrt.org>
Tested-by: Bruno Randolf <br1@einfach.org>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/9279/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f798217dfd upstream.
The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
kvm_mips_set_c0_status() on a guest exit, presumably in case there is
active state that needs saving if pre-emption occurs. However neither of
these bits are cleared again when returning to the guest.
This effectively gives the guest access to the FPU/DSP hardware after
the first guest exit even though it is not aware of its presence,
allowing FP instructions in guest user code to intermittently actually
execute instead of trapping into the guest OS for emulation. It will
then read & manipulate the hardware FP registers which technically
belong to the user process (e.g. QEMU), or are stale from another user
process. It can also crash the guest OS by causing an FP exception, for
which a guest exception handler won't have been registered.
First lets save and disable the FPU (and MSA) state with lose_fpu(1)
before entering the guest. This simplifies the problem, especially for
when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
state being live when the FR bit gets cleared for the guest, which
according to the architecture causes the contents of the FPU and vector
registers to become UNPREDICTABLE.
We can then safely remove the enabling of the FPU in
kvm_mips_set_c0_status(), since there should never be any active FPU or
MSA state to save at pre-emption, which should plug the FPU leak.
DSP state is always live rather than being lazily restored, so for that
it is simpler to just clear the MX bit again when re-entering the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4c6f2cad9 upstream.
Ensure any hardware page table walker (HTW) is disabled while in KVM
guest mode, as KVM doesn't yet set up hardware page table walking for
guest mappings so the wrong mappings would get loaded, resulting in the
guest hanging or crashing once it reaches userland.
The HTW is disabled and re-enabled around the call to
__kvm_mips_vcpu_run() which does the initial switch into guest mode and
the final switch out of guest context. Additionally it is enabled for
the duration of guest exits (i.e. kvm_mips_handle_exit()), getting
disabled again before returning back to guest or host.
In all cases the HTW is only disabled in normal kernel mode while
interrupts are disabled, so that the HTW doesn't get left disabled if
the process is preempted.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f4086a3d78 upstream.
Commit 411a99adff (nfs: clear_request_commit while holding i_lock)
assumes that the nfs_commit_info always points to the inode->i_lock.
For historical reasons, that is not the case for O_DIRECT writes.
Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: 411a99adff ("nfs: clear_request_commit while holding i_lock")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ffa30d3f7 upstream.
Bruce reported seeing this warning pop when mounting using v4.1:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
Call Trace:
[<ffffffff8186ac78>] dump_stack+0x4c/0x65
[<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
[<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
[<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
[<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
[<ffffffff810d941e>] groups_alloc+0x3e/0x130
[<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
[<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
[<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
[<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
[<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
[<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
[<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
[<ffffffff810d45bf>] kthread+0x11f/0x140
[<ffffffff810ea815>] ? local_clock+0x15/0x30
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
[<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
---[ end trace 675220a11e30f4f2 ]---
nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.
Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up
hung task warnings from the watchdog, so have the schedule_timeout
wake up every 60s if there's no callback activity.
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05fbf357d9 upstream.
Lockless access to pte in pagemap_pte_range() might race with page
migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():
CPU A (pagemap) CPU B (migration)
lock_page()
try_to_unmap(page, TTU_MIGRATION...)
make_migration_entry()
set_pte_at()
<read *pte>
pte_to_pagemap_entry()
remove_migration_ptes()
unlock_page()
if(is_migration_entry())
migration_entry_to_page()
BUG_ON(!PageLocked(page))
Also lockless read might be non-atomic if pte is larger than wordsize.
Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.
Fixes: 052fb0d635 ("proc: report file/anon bit in /proc/pid/pagemap")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a39128bcd6 upstream.
According to erratum 'ERR-7878951' Armada 38x SDHCI controller has
different capabilities than the ones shown in its registers:
- it doesn't support the voltage switching: it can work either with
3.3V or 1.8V supply
- it doesn't support the SDR104 mode
- SDR50 mode doesn't need tuning
The SDHCI_QUIRK_MISSING_CAPS quirk is used for updating the
capabilities accordingly.
[gregory.clement@free-electrons.com: port from 3.10]
Fixes: 5491ce3f79 ("mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4b803c559 upstream.
According to erratum 'FE-2946959' both SDR50 and DDR50 modes require
specific clock adjustments in SDIO3 Configuration register. However,
this register was not part of the device tree binding. Even if the
binding can (and will) be extended we still need handling the case
where this register was not available. In this case we use the
SDHCI_QUIRK_MISSING_CAPS quirk remove them from the capabilities.
This commit is based on the work done by Marcin Wojtas<mw@semihalf.com>
Fixes: 5491ce3f79 ("mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller")
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14460dbaf7 upstream.
Current code checks "clk_delay_cycles > 0" to know whether the optional
"mrvl,clk_delay_cycles" is set or not. But of_property_read_u32() doesn't
touch clk_delay_cycles if the property is not set. And type of
clk_delay_cycles is u32, so we may always set pdata->clk_delay_cycles as a
random value.
This patch fix this problem by check the return value of of_property_read_u32()
to know whether the optional clk-delay-cycles is set or not.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62cf983ad8 upstream.
Commit 0dcaa2499b ("sdhci-pxav3: Fix runtime PM initialization") tries
to fix one hang issue caused by calling sdhci_add_host() on a suspended
device. The fix enables the clock twice, once by clk_prepare_enable() and
another by pm_runtime_get_sync(), meaning that the clock will never be
gated at runtime PM suspend. I observed the power consumption regression on
Marvell BG2Q SoCs.
In fact, the fix is not correct. There still be a very small window
during which a runtime suspend might somehow occur after pm_runtime_enable()
but before pm_runtime_get_sync().
This patch fixes all of the two problems by just incrementing the usage
counter before pm_runtime_enable(). It also adjust the order of disabling
runtime pm and storing the usage count in the error path to handle clock
gating properly.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0418ca6073 upstream.
The lockdep splat addressed in a previous commit revealed that at
least one message in em28xx-input.c was missing a new line:
em28178 #0: Closing input extensionINFO: trying to register non-static key.
Further inspection shows several other messages also miss a new line.
These will be fixed in a subsequent patch.
Fixes: aa929ad783 ("[media] em28xx: print a message at disconnect")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 244829226f upstream.
The timberdale media driver requires the use of the respective
dma engine driver, but that may not be enabled, causing a
Kconfig warning:
warning: (VIDEO_TIMBERDALE) selects TIMB_DMA which has unmet direct dependencies (DMADEVICES && MFD_TIMBERDALE)
This fixes the dependency by removing the inappropriate 'select'
statement and replacing it with a direct dependency on the
drivers that provide the services this needs.
Fixes: 7155043c2d ("[media] enable COMPILE_TEST for media drivers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 983c5bd26b upstream.
Since commit da6e162d6a ("[media] rc-core: simplify sysfs code"), when
the IR protocol is set using the sysfs interface to the same set of
protocols that are already set, store_protocols() does not refresh the
scancode filter with the new protocol, even if it has already called the
change_protocol() callback successfully. This results in the filter
being disabled in the hardware and not re-enabled until the filter is
set again using sysfs.
Fix in store_protocols() by still re-applying the filter whenever the
change_protocol() driver callback succeeded.
The problem can be reproduced with the img-ir driver by setting a
filter, and then setting the protocol to the same protocol that is
already set:
$ echo nec > protocols
$ echo 0xffff > filter_mask
$ echo nec > protocols
After this, messages which don't match the filter were still being
received.
Fixes: da6e162d6a ("[media] rc-core: simplify sysfs code")
Reported-by: Sifan Naeem <sifan.naeem@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2ced1719a upstream.
Update driver "mask_interrupts" before enable/disable hardware interrupt
in order to avoid missing interrupts because of "mask_interrupts" still
set to 1 and hardware interrupts are enabled.
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Chaitra Basappa <chaitra.basappa@avagotech.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab2f0608e1 upstream.
This patch will address the issue of SCSI device created at OS level for
non existing VD. ldTgtIdtoLd[] array has size 256 for Extended VD firmware
and 128 for legacy firmware. Accessing indices beyond array size (OS will
send TUR, INQUIRY.. commands upto device index 255), may return valid LD
value and that particular SCSI command will be SUCCESS and creating SCSI
device for non existing target(VD).
For legacy firmware (64 VD firmware), invalidates LD (by setting LD value
to 0xff) in LdTgtIdtoLd[] array for device index beyond 127, so that
invalid LD(0xff) value should be returned beyond device index beyond 127.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 200aed582d upstream.
This patch addresses below issues:
1) Few endianness bug fixes.
2) Break the iteration after (MAX_LOGICAL_DRIVES_EXT - 1)),
instead of MAX_LOGICAL_DRIVES_EXT.
3) Optimization in MFI INIT frame before firing.
4) MFI IO frame should be 256bytes aligned. Code is optimized to reduce
the size of frame for fusion adapters and make the MFI frame size
calculation a bit transparent and readable.
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Chaitra Basappa <chaitra.basappa@avagotech.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit faeed51bb6 upstream.
enable_irq_wakeup returns 0 in case it correctly enabled the IRQ to
generate the wakeup event (and thus resume should call disable_irq_wake).
Currently gpio-charger driver has this logic inverted. Correct that thus
correcting enable/disable_irq_wake() calls balance.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 478913fdbd upstream.
The driver mismatched 'num_supplicants' with 'num_supplies' of
power_supply structure.
It provided list of supplicants (power_supply.supplied_to) but did
not set the number of supplicants. Instead it set the num_supplies which
is used when iterating over number of supplies (power_supply.supplied_from).
As a result the list of supplicants was ignored by core because its size
was 0.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: d7bf353fd0 ("bq24190_charger: Add support for TI BQ24190 Battery Charger")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24727b45b4 upstream.
Driver forgot to unregister power supply if request_threaded_irq()
failed in probe(). In such case the memory associated with power supply
leaked.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: a830d28b48 ("power_supply: Enable battery-charger for 88pm860x")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0153c3d94 upstream.
RME RayDAT and AIO use a fixed buffer size of 16384 samples. With period
sizes of 32-4096, this translates to 4-512 periods.
The older RME cards have a variable buffer size but require exactly two
periods.
This patch enforces nperiods=2 on those cards.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4940626de upstream.
The problem here is that we check:
if (dev >= SNDRV_CARDS)
Then we increment "dev".
if (!joystick_port[dev++])
Then we use it as an offset into a array with SNDRV_CARDS elements.
if (!request_region(joystick_port[dev], 8, "Riptide gameport")) {
This has 3 effects:
1) If you use the module option to specify the joystick port then it has
to be shifted one space over.
2) The wrong error message will be printed on failure if you have over
32 cards.
3) Static checkers will correctly complain that are off by one.
Fixes: db1005ec6f ('ALSA: riptide - Fix joystick resource handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1ecc5d119 upstream.
w_scan complains about missing symbol rate limits:
This dvb driver is *buggy*: the symbol rate limits are undefined - please report to linuxtv.org
Chip supports 1 to 7.2 MSymbol/s on DVB-C.
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15e1ce3318 upstream.
A quirk of some older firmwares that report endpoint pipe type as PIPE_BULK
but the endpoint otheriwse functions as interrupt.
Check if usb_endpoint_type is USB_ENDPOINT_XFER_BULK and set as usb_rcvbulkpipe.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cfcd7b8258 upstream.
Ocassionally the device fails to report back an interrupt urb status which
results in false no lock trigger on the RS2000 demodulator.
Increase time from 60 msecs to 200 msecs.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3095794ae9 upstream.
On some Braswell systems BIOS leaves resets for SPI host controllers
active. This prevents the SPI driver from transferring messages on wire.
Fix this in similar way that we do for I2C already by deasserting resets
for the SPI host controllers.
Reported-by: Yang A Fang <yang.a.fang@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3293c7b8ec upstream.
On Baytrail and Braswell the BIOS might leave the I2C host controllers
enabled, probably because it uses them for its own purposes. This is fine
in normal cases because the I2C driver will disable the hardware when it
is probed anyway.
However, in case of suspend to disk it is different story. If the driver
happens to be compiled as a module the boot kernel never loads the driver
thus leaving host controllers enabled upon loading the hibernation image.
The I2C host controller interrupt mask register has default value of 0x8ff,
in other words it has most of the interrupts unmasked. When combined with
the fact that the host controller is enabled, the driver immediately starts
getting interrupts even before its resume hook is called (once IO-APIC is
resumed). Since the driver is not prepared for this it will crash the
kernel due to NULL pointer derefence because dev->msgs is NULL.
Unfortunately we were not able to get full backtrace to from the console
which could be reproduced here.
In order to fix this even when the driver is compiled as module, we disable
the I2C host controllers in byt_i2c_setup() before devices are created.
Reported-by: Yu Chen <yu.c.chen@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit facb5732b0 upstream.
A request in the ring buffer mustn't be read after it has been marked
as consumed. Otherwise it might already have been reused by the
frontend without violating the ring protocol.
To avoid inconsistencies in the backend only work on a private copy
of the request. This will ensure a malicious guest not being able to
bypass consistency checks of the backend by modifying an active
request.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67fadaa276 upstream.
Commit 32726d2d55 ("ARM: SAMSUNG: Remove legacy clock code")
already removed the callback pointer, but there was one remaining
user:
drivers/cpufreq/s3c24xx-cpufreq.c: In function 's3c_cpufreq_resume_clocks':
drivers/cpufreq/s3c24xx-cpufreq.c:149:14: error: 'struct s3c_cpufreq_info' has no member named 'resume_clocks'
cpu_cur.info->resume_clocks();
^
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 32726d2d55 ("ARM: SAMSUNG: Remove legacy clock code")
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61882b6317 upstream.
The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register
are marked init but are called from a context that might be run after
the __init sections are discarded, as the compiler points out:
WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init()
WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register()
This removes the __init markings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4d4eda237 upstream.
On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
speedstep-smi driver sometimes loads and sometimes doesn't load with
"change to state X failed" message.
The hardware sometimes refuses to change frequency and in this case, we
need to retry later. I found out that we need to enable interrupts while
waiting. When we enable interrupts, the hardware blockage that prevents
frequency transition resolves and the transition is possible. With
disabled interrupts, the blockage doesn't resolve (no matter how long do
we wait). The exact reasons for this hardware behavior are unknown.
This patch enables interrupts in the function speedstep_set_state that can
be called with disabled interrupts. However, this function is called with
disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
any problem.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ffae8c06f upstream.
In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs
to be cleared before calling kobject_put(&policy->kobj) and under
cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get()
in parallel with it, they can obtain a non-NULL policy from that after
kobject_put(&policy->kobj) was executed.
Consider this case:
Thread A Thread B
cpufreq_cpu_get()
acquire cpufreq_driver_lock
read-per-cpu cpufreq_cpu_data
kobject_put(&policy->kobj);
kobject_get(&policy->kobj);
...
per_cpu(&cpufreq_cpu_data, cpu) = NULL
And this will result in a warning like this one:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
...
Call Trace:
[<ffffffff81661b14>] dump_stack+0x46/0x58
[<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
[<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff812e16d1>] kobject_get+0x41/0x50
[<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
[<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
[<ffffffff810b8cb2>] ? up+0x32/0x50
[<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
[<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
[<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
[<ffffffff81360967>] ? acpi_has_method+0x25/0x40
[<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
[<ffffffff81089566>] ? move_linked_works+0x66/0x90
[<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
[<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
[<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
[<ffffffff8108c910>] process_one_work+0x160/0x410
[<ffffffff8108d05b>] worker_thread+0x11b/0x520
[<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
[<ffffffff81092421>] kthread+0xe1/0x100
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---
The actual code flow is as follows:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_finish()
cpufreq_policy_put_kobj()
kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data
under cpufreq_driver_lock, and once it gets a valid policy it expects it
to not be freed until cpufreq_cpu_put() is called.
But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aeb2d2a4c0 upstream.
In commit e9538cf4f9 ("rtlwifi: Fix error when accessing unmapped memory
in skb"), a printk was included to indicate that the condition had been
reached. There is now enough evidence from other users that the fix is
working. That logging statement can now be removed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d4beca377 upstream.
This driver utilizes a FIFO buffer for RX descriptors. There are four places
in the code where it calculates the number of free slots. Several of those
locations do the calculation incorrectly. To fix these and to prevent future
mistakes, a common inline routine is created.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92ff754240 upstream.
The firmware supplies two kinds of packets via the RX mechanism. Besides the
normal data received over the air, these packets may contain bluetooth status
and other information. The present code fails to detect which kind of
information was received.
Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e5f443616 upstream.
Initially, the routine to update the write point in the FIFO buffer was
coded to save CPU time by not doing the calculation every interrupt. This
was an error and results in TX hangs.
Signed-off-by: Troy Tan <troy_tan@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9a7ba3269 upstream.
An earlier bug fix of mine made the SND_DM365_VOICE_CODEC symbol
tristate to avoid creating an undefined reference from the
davinci-vcif.c driver to the davinci_soc_platform_register
function that may be in a module.
However, this may now lead to a different error on randconfig
kernels:
"warning: SND_DM365_VOICE_CODEC creates inconsistent choice state"
This happens because we now have a choice statement with
one bool and one tristate option, and the latter might not
support being set to 'y' because of dependencies.
This new change turns the other option into 'tristate' as well,
which avoids the problem.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 19926c6de0 ("ASoC: davinci: vcif must be a module if SND_DAVINCI_SOC is")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7331ea474e upstream.
Commit f6b2a04590 ("ASoC: pxa: mioa701_wm9713: Convert to table based DAPM
setup") converted the driver to register the board level DAPM elements with
the card's DAPM context rather than the CODEC's DAPM context. The change
overlooked that the speaker widget event callback accesses the widget's
codec field which is only valid if the widget has been registered in a CODEC
DAPM context. This patch modifies the callback to take an alternative route
to get the CODEC.
Fixes: f6b2a04590 ("ASoC: pxa: mioa701_wm9713: Convert to table based DAPM
setup")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 145b3fe579 upstream.
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
Commit 89ec3dcf17 ("PCI: Generate uppercase hex for modalias interface
class") fixed only half of the problem. Some udev implementations rely on
the uevent file and not the modalias file.
Fixes: d1ded203ad ("PCI: add MODALIAS to hotplug event for pci devices")
Fixes: 89ec3dcf17 ("PCI: Generate uppercase hex for modalias interface class")
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d00f37e49 upstream.
d1c7e29e8d (HID: i2c-hid: prevent buffer overflow in early IRQ)
changed hid_get_input() to read ihid->bufsize bytes, which can be
more than wMaxInputLength. This is the case with the Dell XPS 13
9343, and it is causing events to be missed. In some cases the
missed events are releases, which can cause the cursor to jump or
freeze, among other problems. Limit the number of bytes read to
min(wMaxInputLength, ihid->bufsize) to prevent such problems.
Fixes: d1c7e29e8d "HID: i2c-hid: prevent buffer overflow in early IRQ"
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5523d11cc4 upstream.
We don't really need to use different mac colors when adding mac
contexts, because they're not used anywhere. In fact, the firmware
doesn't accept 255 as a valid color, so we get into a SYSASSERT 0x3401
when we reach that.
Remove the color increment to use always zero and avoid reaching 255.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cee4762c5 upstream.
These are coming from the FW and are used to access arrays.
Bad values can cause an out of bounds access so discard
such ba_notifs and warn.
Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd8f438405 upstream.
The base address of the scheduler in the device's memory
(SRAM) comes from two different sources. The periphery
register and the alive notification from the firmware.
We have a check in iwl_pcie_tx_start that ensures that
they are the same.
When we resume from WoWLAN, the firmware may have crashed
for whatever reason. In that case, the whole device may be
reset which means that the periphery register will hold a
meaningless value. When we come to compare
trans_pcie->scd_base_addr (which really holds the value we
had when we loaded the WoWLAN firmware upon suspend) and
the current value of the register, we don't see a match
unsurprisingly.
Trick the check to avoid a loud yet harmless WARN.
Note that when the WoWLAN has crashed, we will see that
in iwl_trans_pcie_d3_resume which will let the op_mode
know. Once the op_mode is informed that the WowLAN firmware
has crashed, it can't do much besides resetting the whole
device.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ee8e25fc3 upstream.
Commit e9fd702a58 ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit. Audit code
wants to update inode number of an inode corresponding to watched name
in a directory. When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened. That confuses audit and it starts watching parent directory
instead of a file in a directory.
This can be observed for example by doing:
cd /tmp
touch foo bar
auditctl -w /tmp/foo
touch foo
mv bar foo
touch foo
In audit log we see events like:
type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
...
type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
...
and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch. However we start seeing events for unrelated stuff
happening in /tmp.
Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:
fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
kernel/audit_tree.c doesn't care about passed 'data' at all.
kernel/audit_watch.c expects moved inode as 'data'.
Fixes: e9fd702a58 ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3443a3bca5 upstream.
When the superblock is modified in a transaction, the commonly
modified fields are not actually copied to the superblock buffer to
avoid the buffer lock becoming a serialisation point. However, there
are some other operations that modify the superblock fields within
the transaction that don't directly log to the superblock but rely
on the changes to be applied during the transaction commit (to
minimise the buffer lock hold time).
When we do this, we fail to mark the buffer log item as being a
superblock buffer and that can lead to the buffer not being marked
with the corect type in the log and hence causing recovery issues.
Fix it by setting the type correctly, similar to xfs_mod_sb()...
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe22d552b8 upstream.
Conversion from local to extent format does not set the buffer type
correctly on the new extent buffer when a symlink data is moved out
of line.
Fix the symlink code and leave a comment in the generic bmap code
reminding us that the format-specific data copy needs to set the
destination buffer type appropriately.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f19b872b08 upstream.
This leads to log recovery throwing errors like:
XFS (md0): Mounting V5 Filesystem
XFS (md0): Starting recovery (logdev: internal)
XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
ffff8800ffc53800: 58 41 47 49 .....
Which is the AGI buffer magic number.
Ensure that we set the type appropriately in both unlink list
addition and removal.
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d612fb570 upstream.
Jan Kara reported that log recovery was finding buffers with invalid
types in them. This should not happen, and indicates a bug in the
logging of buffers. To catch this, add asserts to the buffer
formatting code to ensure that the buffer type is in range when the
transaction is committed.
We don't set a type on buffers being marked stale - they are not
going to get replayed, the format item exists only for recovery to
be able to prevent replay of the buffer, so the type does not
matter. Hence that needs special casing here.
Reported-by: Jan Kara <jack@suse.cz>
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19acc77a36 upstream.
There was a bad typo in commit 43759d4f42 ("random: use an improved
fast_mix() function") and I didn't notice because it "looked right", so
I saw what I expected to see when I reviewed it.
Only months later did I look and notice it's not the Threefish-inspired
mix function that I had designed and optimized.
Mea Culpa. Each input bit still has a chance to affect each output bit,
and the fast pool is spilled *long* before it fills, so it's not a total
disaster, but it's definitely not the intended great improvement.
I'm still working on finding better rotation constants. These are good
enough, but since it's unrolled twice, it's possible to get better
mixing for free by using eight different constants rather than repeating
the same four.
Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e12af489b9 upstream.
According to the Bluetooth core specification valid identity addresses
are either Public Device Addresses or Static Random Addresses. IRKs
received with any other type of address should be discarded since we
cannot assume to know the permanent identity of the peer device.
This patch fixes a missing check for the Identity Address when receiving
the Identity Address Information SMP PDU.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c561a5753d upstream.
BugLink: https://bugs.launchpad.net/bugs/1400215
ath3k devices fail to load firmwares on xHCI buses, but work well on
EHCI, this might be a compatibility issue between xHCI and ath3k chips.
As my testing result, those chips will work on xHCI buses again with
this patch.
This workaround is from Qualcomm, they also did some workarounds in
Windows driver.
Signed-off-by: Adam Lee <adam.lee@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8f29e89f2 upstream.
Userspace expects to see a long space before the first pulse is sent on
the lirc device. Currently, if a long time has passed and a new packet
is started, the lirc codec just returns and doesn't send anything. This
makes lircd ignore many perfectly valid signals unless they are sent in
quick sucession. When a reset event is delivered, we cannot know
anything about the duration of the space. But it should be safe to
assume it has been a long time and we just set the duration to maximum.
Signed-off-by: Austin Lund <austin.lund@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d5b86e048 upstream.
As of v3.18, ext4 started rejecting a remount which changes the
journal_checksum option.
Prior to that, it was simply ignored; the problem here is that
if someone has this in their fstab for the root fs, now the box
fails to boot properly, because remount of root with the new options
will fail, and the box proceeds with a readonly root.
I think it is a little nicer behavior to accept the option, but
warn that it's being ignored, rather than failing the mount,
but that might be a subjective matter...
Reported-by: Cónräd <conradsand.arma@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d953ca4ddf ]
The existing code frees the skb in EAGAIN case, in which the skb will be
retried from upper layer and used again.
Also, the existing code doesn't free send buffer slot in error case, because
there is no completion message for unsent packets.
This patch fixes these problems.
(Please also include this patch for stable trees. Thanks!)
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cfbf654efc ]
When making use of RFC5061, section 4.2.4. for setting the primary IP
address, we're passing a wrong parameter header to param_type2af(),
resulting always in NULL being returned.
At this point, param.p points to a sctp_addip_param struct, containing
a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
the actual sctp_addr_param, which also contains a sctp_paramhdr, but
this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
param_type2af() can make use of. Since we already hold a pointer to
addr_param from previous line, just reuse it for param_type2af().
Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e2a4800e75 ]
When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.
When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.
This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.
Reported-by: Iain Douglas <centos@1n6.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bdbbb8527b ]
In commit be9f4a44e7 ("ipv4: tcp: remove per net tcp_sock")
I tried to address contention on a socket lock, but the solution
I chose was horrible :
commit 3a7c384ffd ("ipv4: tcp: unicast_sock should not land outside
of TCP stack") addressed a selinux regression.
commit 0980e56e50 ("ipv4: tcp: set unicast_sock uc_ttl to -1")
took care of another regression.
commit b5ec8eeac4 ("ipv4: fix ip_send_skb()") fixed another regression.
commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
was another shot in the dark.
Really, just use a proper socket per cpu, and remove the skb_orphan()
call, to re-enable flow control.
This solves a serious problem with FQ packet scheduler when used in
hostile environments, as we do not want to allocate a flow structure
for every RST packet sent in response to a spoofed packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 811230cd85 ]
When I added sk_pacing_rate field, I forgot to initialize its value
in the per cpu unicast_sock used in ip_send_unicast_reply()
This means that for sch_fq users, RST packets, or ACK packets sent
on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
once we reach the per flow limit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 95bd09eb27 ("tcp: TSO packets automatic sizing")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59ccaaaa49 ]
Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081
This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
handler does not return any bytes in the skb.
Alternately, the skb->len check can be moved inside rtnl_notify.
For the bridge vlan case described in 92081, there is also a fix needed
in bridge driver to generate a proper notification. Will fix that in
subsequent patch.
v2: rebase patch on net tree
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 24e579c889 ]
With the commit d75b1ade56 ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget. When in busy_poll is we return 0
in napi_poll. We should return budget.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e9e16e614 ]
Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.
To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
| kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.
Fixes: 4a287eba2d ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fc752f1f43 ]
An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().
-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish
Fix this incorrect free by cloning this skb and processing this cloned
skb instead.
This patch was suggested by Eric Dumazet
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 86f3cddbc3 ]
While working on rhashtable walking I noticed that the UDP diag
dumping code is buggy. In particular, the socket skipping within
a chain never happens, even though we record the number of sockets
that should be skipped.
As this code was supposedly copied from TCP, this patch does what
TCP does and resets num before we walk a chain.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df4d92549f ]
Not caching dst_entries which cause redirects could be exploited by hosts
on the same subnet, causing a severe DoS attack. This effect aggravated
since commit f886497212 ("ipv4: fix dst race in sk_dst_get()").
Lookups causing redirects will be allocated with DST_NOCACHE set which
will force dst_release to free them via RCU. Unfortunately waiting for
RCU grace period just takes too long, we can end up with >1M dst_entries
waiting to be released and the system will run OOM. rcuos threads cannot
catch up under high softirq load.
Attaching the flag to emit a redirect later on to the specific skb allows
us to cache those dst_entries thus reducing the pressure on allocation
and deallocation.
This issue was discovered by Marcelo Leitner.
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 600ddd6825 ]
When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c6 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c6 ...
[ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[ 533.940559] PGD 5030f2067 PUD 0
[ 533.957104] Oops: 0000 [#1] SMP
[ 533.974283] Modules linked in: sctp mlx4_en [...]
[ 534.939704] Call Trace:
[ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
... or depending on the the application, for example this one:
[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
With slab debugging enabled, we can see that the poison has been overwritten:
[ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten
[ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[ 669.826424] __slab_alloc+0x4bf/0x566
[ 669.826433] __kmalloc+0x280/0x310
[ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp]
[ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[ 669.826635] __slab_free+0x39/0x2a8
[ 669.826643] kfree+0x1d6/0x230
[ 669.826650] kzfree+0x31/0x40
[ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp]
[ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp]
[ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp]
Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).
Reference counting of auth keys revisited:
Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.
User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().
sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.
Fixes: 730fc3d05c ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6088beef3f ]
NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.
If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade56 ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <manish.chopra@qlogic.com>
Acked-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ac64da0b83 ]
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.
A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.
Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.
This version is better because we do not slow down packet processing,
only make migration safer.
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f812116b17 ]
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as
struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;
The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.
An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fb08eca45 upstream.
This replaces four copies in various stages of mm_fault_error() handling
with just a single one. It will also allow for more natural placement
of the unlocking after some further cleanup.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c8465a82a upstream.
When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.
Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:
...
Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting
===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting
...
By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Fixes: d17d8f9ded "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a05d59a567 upstream.
The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
When a CPU is offline, RCU is no longer watching that CPU and since the
tracepoint is protected by RCU, it must not be called. To prevent the
tlb_flush tracepoint from being called when the CPU is offline, it was
converted to a TRACE_EVENT_CONDITION where the condition checks if the
CPU is online before calling the tracepoint.
Unfortunately, this was not enough to stop lockdep from complaining about
it. Even though the RCU protected code of the tracepoint will never be
called, the condition is hidden within the tracepoint, and even though the
condition prevents RCU code from being called, the lockdep checks are
outside the tracepoint (this is to test tracepoints even when they are not
enabled).
Even though tracepoints should be checked to be RCU safe when they are not
enabled, the condition should still be considered when checking RCU.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Fixes: 3a630178fd "tracing: generate RCU warnings even when tracepoints are disabled"
Acked-by: Dave Hansen <dave@sr71.net>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d926c15d6 upstream.
I noticed some CLOCK_TAI timer test failures on one of my
less-frequently used configurations. And after digging in I
found in 76f4108892 (Cleanup hrtimer accessors to the
timekepeing state), the hrtimer_get_softirq_time tai offset
calucation was incorrectly rewritten, as the tai offset we
return shold be from CLOCK_MONOTONIC, and not CLOCK_REALTIME.
This results in CLOCK_TAI timers expiring early on non-highres
capable machines.
This patch fixes the issue, calculating the tai time properly
from the monotonic base.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bee96860a upstream.
The following race exists in the smpboot percpu threads management:
CPU0 CPU1
cpu_up(2)
get_online_cpus();
smpboot_create_threads(2);
smpboot_register_percpu_thread();
for_each_online_cpu();
__smpboot_create_thread();
__cpu_up(2);
This results in a missing per cpu thread for the newly onlined cpu2 and
in a NULL pointer dereference on a consecutive offline of that cpu.
Proctect smpboot_register_percpu_thread() with get_online_cpus() to
prevent that.
[ tglx: Massaged changelog and removed the change in
smpboot_unregister_percpu_thread() because that's an
optimization and therefor not stable material. ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da63865a01 upstream.
Commits 65cef1311d ("x86, microcode: Add a disable chicken bit") and
a18a0f6850 ("x86, microcode: Don't initialize microcode code on
paravirt") allow microcode driver skip initialization when microcode
loading is not permitted.
However, they don't prevent the driver from being loaded since the
init code returns 0. If at some point later the driver gets unloaded
this will result in an oops while trying to deregister the (never
registered) device.
To avoid this, make init code return an error on paravirt or when
microcode loading is disabled. The driver will then never be loaded.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1422411669-25147-1-git-send-email-boris.ostrovsky@oracle.com
Reported-by: James Digwall <james@dingwall.me.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fddcd30073 upstream.
I2S1, I2S2 on Exynos4 SoC series have limited functionality compared
to I2S0, "samsung,s3c6410-i2s" compatible should be used for them.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4161b4505f upstream.
When ak4114 work calls its callback and the callback invokes
ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding
this, control the reentrance by introducing a refcount. Also
flush_delayed_work() is replaced with cancel_delayed_work_sync().
The exactly same bug is present in ak4113.c and fixed as well.
Reported-by: Pavel Hofman <pavel.hofman@ivitera.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Tested-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58cc9c9a17 upstream.
To quote from section 1.3.1 of the data sheet:
The SGTL5000 has an internal reset that is deasserted
8 SYS_MCLK cycles after all power rails have been brought
up. After this time, communication can start
...
1.0us represents 8 SYS_MCLK cycles at the minimum 8.0 MHz SYS_MCLK.
Signed-off-by: Eric Nelson <eric.nelson@boundarydevices.com>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a43bd7e125 upstream.
According to the I2S specification information as following:
- WS = 0, channel 1 (left)
- WS = 1, channel 2 (right)
So, the start event should be TF/RF falling edge.
Reported-by: Songjun Wu <songjun.wu@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44b82b7700 upstream.
Commit d7a49086f2 (arm64: cpuinfo: print info for all CPUs)
attempted to clean up /proc/cpuinfo, but due to concerns regarding
further changes was reverted in commit 5e39977edf (Revert "arm64:
cpuinfo: print info for all CPUs").
There are two major issues with the arm64 /proc/cpuinfo format
currently:
* The "Features" line describes (only) the 64-bit hwcaps, which is
problematic for some 32-bit applications which attempt to parse it. As
the same names are used for analogous ISA features (e.g. aes) despite
these generally being architecturally unrelated, it is not possible to
simply append the 64-bit and 32-bit hwcaps in a manner that might not
be misleading to some applications.
Various potential solutions have appeared in vendor kernels. Typically
the format of the Features line varies depending on whether the task
is 32-bit.
* Information is only printed regarding a single CPU. This does not
match the ARM format, and does not provide sufficient information in
big.LITTLE systems where CPUs are heterogeneous. The CPU information
printed is queried from the current CPU's registers, which is racy
w.r.t. cross-cpu migration.
This patch attempts to solve these issues. The following changes are
made:
* When a task with a LINUX32 personality attempts to read /proc/cpuinfo,
the "Features" line contains the decoded 32-bit hwcaps, as with the
arm port. Otherwise, the decoded 64-bit hwcaps are shown. This aligns
with the behaviour of COMPAT_UTS_MACHINE and COMPAT_ELF_PLATFORM. In
the absense of compat support, the Features line is empty.
The set of hwcaps injected into a task's auxval are unaffected.
* Properties are printed per-cpu, as with the ARM port. The per-cpu
information is queried from pre-recorded cpu information (as used by
the sanity checks).
* As with the previous attempt at fixing up /proc/cpuinfo, the hardware
field is removed. The only users so far are 32-bit applications tied
to particular boards, so no portable applications should be affected,
and this should prevent future tying to particular boards.
The following differences remain:
* No model_name is printed, as this cannot be queried from the hardware
and cannot be provided in a stable fashion. Use of the CPU
{implementor,variant,part,revision} fields is sufficient to identify a
CPU and is portable across arm and arm64.
* The following system-wide properties are not provided, as they are not
possible to provide generally. Programs relying on these are already
tied to particular (32-bit only) boards:
- Hardware
- Revision
- Serial
No software has yet been identified for which these remaining
differences are problematic.
Cc: Greg Hackmann <ghackmann@google.com>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: cross-distro@lists.linaro.org
Cc: linux-api@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d56030609 upstream.
Warning:
In file included from scripts/kconfig/zconf.tab.c:2537:0:
scripts/kconfig/menu.c: In function ‘get_symbol_str’:
scripts/kconfig/menu.c:590:18: warning: ‘jump’ may be used uninitialized in this function [-Wmaybe-uninitialized]
jump->offset = strlen(r->s);
Simplifies the test logic because (head && local) means (jump != 0)
and makes GCC happy when checking if the jump pointer was initialized.
Signed-off-by: Peter Kümmel <syntheticpp@gmx.net>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 544143f9e0 upstream.
If acceleration is disabled, it does not make sense
to init gpuvm since nothing will use it. Moreover,
if radeon_vm_init() gets called it uses accel to try
and clear the pde tables, etc. which results in a bug.
v2: handle vm_fini as well
v3: handle bo_open/close as well
Bug:
https://bugs.freedesktop.org/show_bug.cgi?id=88786
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ef3ff2fea upstream.
Nilfs2 eventually hangs in a stress test with fsstress program. This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():
nilfs_segctor_thread()
nilfs_segctor_thread_construct()
nilfs_segctor_unlock()
nilfs_dispose_list()
iput()
iput_final()
evict()
inode_wait_for_writeback() * wait for I_SYNC flag
writeback_sb_inodes()
* set I_SYNC flag on inode->i_state
__writeback_single_inode()
do_writepages()
nilfs_writepages()
nilfs_construct_dsync_segment()
nilfs_segctor_sync()
* wait for completion of segment constructor
inode_sync_complete()
* clear I_SYNC flag after __writeback_single_inode() completed
writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode->i_state. do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion. On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().
Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode->i_nlink has a
non-zero count. We can prevent evict() from being called in iput() by
implementing sop->drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation. So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5e03a4989 upstream.
It has been reported that 965GM might trigger
VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage)
in mem_cgroup_migrate when shmem wants to replace a swap cache page
because of shmem_should_replace_page (the page is allocated from an
inappropriate zone). shmem_replace_page expects that the oldpage is not
on LRU list and calls mem_cgroup_migrate without lrucare. This is
obviously incorrect because swapcache pages might be on the LRU list
(e.g. swapin readahead page).
Fix this by enabling lrucare for the migration in shmem_replace_page.
Also clarify that lrucare should be used even if one of the pages might
be on LRU list.
The BUG_ON will trigger only when CONFIG_DEBUG_VM is enabled but even
without that the migration code might leave the old page on an
inappropriate memcg' LRU which is not that critical because the page
would get removed with its last reference but it is still confusing.
Fixes: 0a31bc97c8 ("mm: memcontrol: rewrite uncharge API")
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Dave Airlie <airlied@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23aaed6659 upstream.
walk_page_range() silently skips vma having VM_PFNMAP set, which leads
to undesirable behaviour at client end (who called walk_page_range).
Userspace applications get the wrong data, so the effect is like just
confusing users (if the applications just display the data) or sometimes
killing the processes (if the applications do something with
misunderstanding virtual addresses due to the wrong data.)
For example for pagemap_read, when no callbacks are called against
VM_PFNMAP vma, pagemap_read may prepare pagemap data for next virtual
address range at wrong index.
Eventually userspace may get wrong pagemap data for a task.
Corresponding to a VM_PFNMAP marked vma region, kernel may report
mappings from subsequent vma regions. User space in turn may account
more pages (than really are) to the task.
In my case I was using procmem, procrack (Android utility) which uses
pagemap interface to account RSS pages of a task. Due to this bug it
was giving a wrong picture for vmas (with VM_PFNMAP set).
Fixes: a9ff785e44 ("mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas")
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1b02fe97f upstream.
If a non-page-aligned write is destined for a device which
is missing/faulty, we can deadlock.
As the target device is missing, a read-modify-write cycle
is not possible.
As the write is not for a full-page, a recontruct-write cycle
is not possible.
This should be handled by logic in fetch_block() which notices
there is a non-R5_OVERWRITE write to a missing device, and so
loads all blocks.
However since commit 67f455486d, that code requires
STRIPE_PREREAD_ACTIVE before it will active, and those circumstances
never set STRIPE_PREREAD_ACTIVE.
So: in handle_stripe_dirtying, if neither rmw or rcw was possible,
set STRIPE_DELAYED, which will cause STRIPE_PREREAD_ACTIVE be set
after a suitable delay.
Fixes: 67f455486d
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca7df8e0bb upstream.
Commit
c11f1df500
requires writers to wait for any pending oplock break handler to
complete before proceeding to write. This is done by waiting on bit
CIFS_INODE_PENDING_OPLOCK_BREAK in cifsFileInfo->flags. This bit is
cleared by the oplock break handler job queued on the workqueue once it
has completed handling the oplock break allowing writers to proceed with
writing to the file.
While testing, it was noticed that the filehandle could be closed while
there is a pending oplock break which results in the oplock break
handler on the cifsiod workqueue being cancelled before it has had a
chance to execute and clear the CIFS_INODE_PENDING_OPLOCK_BREAK bit.
Any subsequent attempt to write to this file hangs waiting for the
CIFS_INODE_PENDING_OPLOCK_BREAK bit to be cleared.
We fix this by ensuring that we also clear the bit
CIFS_INODE_PENDING_OPLOCK_BREAK when we remove the oplock break handler
from the workqueue.
The bug was found by Red Hat QA while testing using ltp's fsstress
command.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e64806672 upstream.
Commit e1a5848e33 ("ARM: 7924/1: mm: don't bother with reserved ttbr0
when running with LPAE") removed the use of the reserved TTBR0 value
for LPAE systems, since the ASID is held in the TTBR and can be updated
atomicly with the pgd of the next mm.
Unfortunately, this patch forgot to update flush_context, which
deliberately avoids marking the local active ASID as allocated, since we
used to switch via ASID zero and didn't need to allocate the ASID of
the previous mm. The side-effect of this is that we can allocate the
same ASID to the next mm and, between flushing the local TLB and updating
TTBR0, we can perform speculative TLB fills for userspace nG mappings
using the page table of the previous mm.
The consequence of this is that the next mm can erroneously hit some
mappings of the previous mm. Note that this was made significantly
harder to hit by a391263cd8 ("ARM: 8203/1: mm: try to re-use old ASID
assignments following a rollover") but is still theoretically possible.
This patch fixes the problem by removing the code from flush_context
that forces the allocated ASID to zero for the local CPU. Many thanks
to the Broadcom guys for tracking this one down.
Fixes: e1a5848e33 ("ARM: 7924/1: mm: don't bother with reserved ttbr0 when running with LPAE")
Reported-by: Raymond Ngun <rngun@broadcom.com>
Tested-by: Raymond Ngun <rngun@broadcom.com>
Reviewed-by: Gregory Fong <gregory.0xf0@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d76e9b9fc5 upstream.
Commit 842dfc11ea ("MIPS: Fix build with binutils 2.24.51+") in v3.18
enabled -msoft-float and sprinkled ".set hardfloat" where necessary to
use FP instructions. However it missed enable_restore_fp_context() which
since v3.17 does a ctc1 with inline assembly, causing the following
assembler errors on Mentor's 2014.05 toolchain:
{standard input}: Assembler messages:
{standard input}:2913: Error: opcode not supported on this processor: mips32r2 (mips32r2) `ctc1 $2,$31'
scripts/Makefile.build:257: recipe for target 'arch/mips/kernel/traps.o' failed
Fix that to use the new write_32bit_cp1_register() macro so that ".set
hardfloat" is automatically added when -msoft-float is in use.
Fixes 842dfc11ea ("MIPS: Fix build with binutils 2.24.51+")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9173/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e32033e14 upstream.
Add a write_32bit_cp1_register() macro to compliment the
read_32bit_cp1_register() macro. This is to abstract whether .set
hardfloat needs to be used based on GAS_HAS_SET_HARDFLOAT.
The implementation of _read_32bit_cp1_register() .sets mips1 due to
failure of gas v2.19 to assemble cfc1 for Octeon (see commit
25c3000300 ("MIPS: Override assembler target architecture for
octeon.")). I haven't copied this over to _write_32bit_cp1_register() as
I'm uncertain whether it applies to ctc1 too, or whether anybody cares
about that version of binutils any longer.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9172/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7754e7510 upstream.
As printk() invocation can cause e.g. a TLB miss, printk() cannot be
called before the exception handlers have been properly initialized.
This can happen e.g. when netconsole has been loaded as a kernel module
and the TLB table has been cleared when a CPU was offline.
Call cpu_report() in start_secondary() only after the exception handlers
have been initialized to fix this.
Without the patch the kernel will randomly either lockup or crash
after a CPU is onlined and the console driver is a module.
Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/8953/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3e6c1eff5 upstream.
If the irq_chip does not define .irq_disable, any call to disable_irq
will defer disabling the IRQ until it fires while marked as disabled.
This assumes that the handler function checks for this condition, which
handle_percpu_irq does not. In this case, calling disable_irq leads to
an IRQ storm, if the interrupt fires while disabled.
This optimization is only useful when disabling the IRQ is slow, which
is not true for the MIPS CPU IRQ.
Disable this optimization by implementing .irq_disable and .irq_enable
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8949/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ead8632bb upstream.
The following commits:
5890f70f15 (MIPS: Use dedicated exception handler if CPU supports RI/XI exceptions)
6575b1d417 (MIPS: kernel: cpu-probe: Detect unique RI/XI exceptions)
break the kernel for *all* existing MIPS CPUs that implement the
CP0_PageGrain[IEC] bit. They cause the TLB exception handlers to be
generated without the legacy execute-inhibit handling, but never set
the CP0_PageGrain[IEC] bit to activate the use of dedicated exception
vectors for execute-inhibit exceptions. The result is that upon
detection of an execute-inhibit violation, we loop forever in the TLB
exception handlers instead of sending SIGSEGV to the task.
If we are generating TLB exception handlers expecting separate
vectors, we must also enable the CP0_PageGrain[IEC] feature.
The bug was introduced in kernel version 3.17.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/8880/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a9794d329 upstream.
The following patch fixes an issue observed with 4k sector disks
where the max_hw_sectors attribute was getting set too large in
sd_revalidate_disk. Since sdkp->max_xfer_blocks is in units
of SCSI logical blocks and queue_max_hw_sectors is in units of
512 byte blocks, on a 4k sector disk, every time we went through
sd_revalidate_disk, we were taking the current value of
queue_max_hw_sectors and increasing it by a factor of 8. Fix
this by only shifting sdkp->max_xfer_blocks.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a02bb401f8 upstream.
For TKT238285 hardware issue which may cause txfifo store data twice can only
be caught on i.mx6dl, we use pio mode instead of DMA mode on i.mx6dl.
Fixes: f62caccd12 (spi: spi-imx: add DMA support)
Signed-off-by: Robin Gong <b38343@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 973fbce69e upstream.
devm_* API was supposed to be used only in probe function call.
Memory is allocated at 'probe' and free automatically at 'remove'.
Usage of devm_* functions outside probe sometimes leads to memory leak.
Avoid using devm_kzalloc in dspi_setup_transfer and use kzalloc instead.
Also add the dspi_cleanup function to free the controller data upon
cleanup.
Acked-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Bhuvanchandra DV <bhuvanchandra.dv@toradex.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06cf35f903 upstream.
Some AMD CS553x devices have read-only BARs because of a firmware or
hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no
longer works after 36e8164882 ("PCI: Restore detection of read-only
BARs"). Prior to 36e8164882, we filled in res->start; afterwards we
leave it zeroed out. The quirk only updated the size, so the driver tried
to use a region starting at zero, which didn't work.
Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
hard-code the sizes.
On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882,
we interpret that as a 512-byte BAR based on the lowest-order bit set. Per
datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
avoid clearing any address bits if a platform uses only 64-byte alignment.
[bhelgaas: changelog, reduce BAR 2 size to 64]
Fixes: 36e8164882 ("PCI: Restore detection of read-only BARs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
Reported-and-tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19c5392eb1 upstream.
The DesignWare PCIe MSI hardware does not support MSI-X IRQs. Setting
those up failed as a side effect of a bug which was fixed by 91f8ae823f
("PCI: designware: Setup and clear exactly one MSI at a time").
Now that this bug is fixed, MSI-X IRQs need to be rejected explicitly;
otherwise devices trying to use them may end up with incorrectly working
interrupts.
Fixes: 91f8ae823f ("PCI: designware: Setup and clear exactly one MSI at a time")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49d2ca84e4 upstream.
Fix memory leak in the gpio sysfs interface due to failure to drop
reference to device returned by class_find_device when setting the
gpio-line polarity.
Fixes: 0769746183 ("gpiolib: add support for changing value polarity in sysfs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f303db08d upstream.
Fix memory leak in the gpio sysfs interface due to failure to drop
reference to device returned by class_find_device when creating a link.
Fixes: a4177ee7f1 ("gpiolib: allow exported GPIO nodes to be named using sysfs links")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4dba13089 upstream.
Introduce an arch specific function to find out whether a particular dma
mapping operation needs to bounce on the swiotlb buffer.
On ARM and ARM64, if the page involved is a foreign page and the device
is not coherent, we need to bounce because at unmap time we cannot
execute any required cache maintenance operations (we don't know how to
find the pfn from the mfn).
No change of behaviour for x86.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6ad369130 upstream.
Commit 0b46b8a718 (clocksource: arch_timer: Fix code to use physical
timers when requested) introduces the use of physical counters in the
ARM architected timer driver. However, he arm64 kernel uses CNTVCT in
VDSO. When booting in EL2, the kernel switches to the physical timers to
make things easier for KVM but it continues to use the virtual counter
both in user and kernel. While in such scenario CNTVCT == CNTPCT (since
CNTVOFF is initialised by the kernel to 0), we want to spot firmware
bugs corrupting CNTVOFF early (which would affect CNTVCT).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 0b46b8a718 ("clocksource: arch_timer: Fix code to use physical
timers when requested")
Cc: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8414947a20 upstream.
If a touchpad reports the F11 data40 register then this indicates that the touchpad reports
additional ACM (Accidental Contact Mitigation) data after the F11 data in the HID attention
report. These additional bytes shift the position of the F30 button data causing the driver
to incorrectly report button state when this functionality is present. This patch accounts
for the additional data in the report.
Fixes:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1398533
Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0346dadbf0 upstream.
Commit e61734c55c ("cgroup: remove cgroup->name") added two extra
newlines to memcg oom kill log messages. This makes dmesg hard to read
and parse. The issue affects 3.15+.
Example:
Task in /t <<< extra #1
killed as a result of limit of /t
<<< extra #2
memory: usage 102400kB, limit 102400kB, failcnt 274712
Remove the extra newlines from memcg oom kill messages, so the messages
look like:
Task in /t killed as a result of limit of /t
memory: usage 102400kB, limit 102400kB, failcnt 240649
Fixes: e61734c55c ("cgroup: remove cgroup->name")
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14bf61ffe6 upstream.
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
tracks space limits and usage in 512-byte blocks. However VFS quotas
track usage in bytes (as some filesystems require that) and we need to
somehow pass this information. Upto now it wasn't a problem because we
didn't do any unit conversion (thus VFS quota routines happily stuck
number of bytes into d_bcount field of struct fd_disk_quota). Only if
you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
/ Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
tried this but reportedly some Samba users hit the problem in practice.
So when we want interfaces compatible we need to fix this.
We bite the bullet and define another quota structure used for passing
information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
to have more conversion routines in fs/quota/quota.c and another copying
of quota structure slows down getting of quota information by about 2%
but it seems cleaner than overloading e.g. units of d_bcount to bytes.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13f3fbe827 upstream.
commit 6dda730e55
Author: Jani Nikula <jani.nikula@intel.com>
Date: Tue Jun 24 18:27:40 2014 +0300
drm/i915: respect the VBT minimum backlight brightness
introduced a bug which resulted in inconsistent brightness levels on
different machines. If a suspended was entered with the screen off some
machines would resume with the screen at minimum brightness and others
at maximum brightness.
The following commands can be used to produce this behavior.
xset dpms force off
sleep 1
sudo systemctl suspend
(resume ...)
The root cause of this problem is a comparison which checks to see if
the backlight level is zero when the panel is enabled. If it is zero,
it is set to the maximum level. Unfortunately, not all machines have a
minimum level of zero. On those machines the level is left at the
minimum instead of begin set to the maximum.
Fix the bug by updating the comparison to check for the minimum
backlight level instead of zero. Also, expand the comparison for
the possible case when the level is less than the minimum.
Fixes: 6dda730e55 ("respect the VBT minimum backlight brightness")
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f48a01651b upstream.
Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
ppgtt") introduced a regression on Broadwell, triggering the following
IOMMU fault at startup:
vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
DMAR:[fault reason 23] Unknown
fbcon: inteldrmfb (fb0) is primary device
Further commentary from Daniel:
I sugggested this change to David after staring at the offending patch
for a while. I have no idea and theory whatsoever why this would upset
the gpu less than the other way round. But it seems to work. David
promised to chase hw people a bit more to get a more meaningful answer.
Wrt the comment that this deletes: I've done some digging and afaict
loading context before ppgtt enable was once required before our recent
restructuring of the context/ppgtt init code: Before that context sw
setup (i.e. allocating the default context) and hw setup was smashed
together. Also the setup of the default context was the bit that
actually allocated the aliasing ppgtt structures. Which is the reason
for the context before ppgtt depency.
Or was, since with all the untangling there's no no real depency any
more (functional, who knows what the hw is doing), so the comment is
just stale.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af1a7301c7 upstream.
When creating a fence for a tiled object, only fence the area that
makes up the actual tiles. The object may be larger than the tiled
area and if we allow those extra addresses to be fenced, they'll
get converted to addresses beyond where the object is mapped. This
opens up the possiblity of writes beyond the end of object.
To prevent this, we adjust the size of the fence to only encompass
the area that makes up the actual tiles. The extra space is considered
un-tiled and now behaves as if it was a linear object.
Testcase: igt/gem_tiled_fence_overflow
Reported-by: Dan Hettena <danh@ghs.com>
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2148f18fdb upstream.
VT switch back/forth from console to xserver (for example) has potential
to go horribly wrong if a dynamic DP MST connector ends up in the saved
modeset that is restored when switching back to fbcon.
When removing a dynamic connector, don't forget to clean up the saved
state.
v1: original
v2: null out set->fb if no more connectors to avoid making i915 cranky
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1184968
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02a54164c5 upstream.
In Dual EMAC, the default VLANs are used to segregate Rx packets between
the ports, so adding the same default VLAN to the switch will affect the
normal packet transfers. So returning error on addition of dual EMAC
default VLANs.
Even if EMAC 0 default port VLAN is added to EMAC 1, it will lead to
break dual EMAC port separations.
Fixes: d9ba8f9e62 (driver: net: ethernet: cpsw: dual emac interface implementation)
Reported-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83b0302d34 upstream.
The regulator framework maintains a list of consumer regulators
for a regulator device and protects it from concurrent access using
the regulator device's mutex lock.
In the case of regulator_put() the consumer is removed and regulator
device's parameters are updated without holding the regulator device's
mutex. This would lead to a race condition between the regulator_put()
and any function which traverses the consumer list or modifies regulator
device's parameters.
Fix this race condition by holding the regulator device's mutex in case
of regulator_put.
Signed-off-by: Ashay Jaiswal <ashayj@codeaurora.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c957e8f084 upstream.
Once the current message is finished, the driver notifies SPI core about
this by calling spi_finalize_current_message(). This function queues next
message to be transferred. If there are more messages in the queue, it is
possible that the driver is asked to transfer the next message at this
point.
When spi_finalize_current_message() returns the driver clears the
drv_data->cur_chip pointer to NULL. The problem is that if the driver
already started the next message clearing drv_data->cur_chip will cause
NULL pointer dereference which crashes the kernel like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffffa0022bc8>] cs_deassert+0x18/0x70 [spi_pxa2xx_platform]
PGD 78bb8067 PUD 37712067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 1 PID: 11 Comm: ksoftirqd/1 Tainted: G O 3.18.0-rc4-mjo #5
Hardware name: Intel Corp. VALLEYVIEW B3 PLATFORM/NOTEBOOK, BIOS MNW2CRB1.X64.0071.R30.1408131301 08/13/2014
task: ffff880077f9f290 ti: ffff88007a820000 task.ti: ffff88007a820000
RIP: 0010:[<ffffffffa0022bc8>] [<ffffffffa0022bc8>] cs_deassert+0x18/0x70 [spi_pxa2xx_platform]
RSP: 0018:ffff88007a823d08 EFLAGS: 00010202
RAX: 0000000000000008 RBX: ffff8800379a4430 RCX: 0000000000000026
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff8800379a4430
RBP: ffff88007a823d18 R08: 00000000ffffffff R09: 000000007a9bc65a
R10: 000000000000028f R11: 0000000000000005 R12: ffff880070123e98
R13: ffff880070123de8 R14: 0000000000000100 R15: ffffc90004888000
FS: 0000000000000000(0000) GS:ffff880079a80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000048 CR3: 000000007029b000 CR4: 00000000001007e0
Stack:
ffff88007a823d58 ffff8800379a4430 ffff88007a823d48 ffffffffa0022c89
0000000000000000 ffff8800379a4430 0000000000000000 0000000000000006
ffff88007a823da8 ffffffffa0023be0 ffff88007a823dd8 ffffffff81076204
Call Trace:
[<ffffffffa0022c89>] giveback+0x69/0xa0 [spi_pxa2xx_platform]
[<ffffffffa0023be0>] pump_transfers+0x710/0x740 [spi_pxa2xx_platform]
[<ffffffff81076204>] ? pick_next_task_fair+0x744/0x830
[<ffffffff81049679>] tasklet_action+0xa9/0xe0
[<ffffffff81049a0e>] __do_softirq+0xee/0x280
[<ffffffff81049bc0>] run_ksoftirqd+0x20/0x40
[<ffffffff810646df>] smpboot_thread_fn+0xff/0x1b0
[<ffffffff810645e0>] ? SyS_setgroups+0x150/0x150
[<ffffffff81060f9d>] kthread+0xcd/0xf0
[<ffffffff81060ed0>] ? kthread_create_on_node+0x180/0x180
[<ffffffff8187a82c>] ret_from_fork+0x7c/0xb0
Fix this by clearing drv_data->cur_chip before we call spi_finalize_current_message().
Reported-by: Martin Oldfield <m@mjoldfield.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5636d2f842 upstream.
The GART table BO has to be moved out of VRAM for suspend/resume. Any
updates to the GART table during that time were silently dropped without
this change. This caused GPU lockups on resume in some cases, see the bug
reports referenced below.
This might also make GPU reset more robust in some cases, as we no longer
rely on the GART table in VRAM being preserved across the GPU
lockup/reset.
v2: Add logic to radeon_gart_table_vram_pin directly instead of
reinstating radeon_gart_restore
v3: Move code after assignment of rdev->gart.table_addr so that the GART
TLB flush can work as intended, add code comment explaining why we're
doing this
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85204
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86267
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 496eb6fd2c upstream.
Fixes a case where we call vmw_fifo_idle() from within a wait function with
task state !TASK_RUNNING, which is illegal.
In addition, make the locking fine-grained, so that it is performed once
for every read- and write operation. This is of course more costly, but we
don't perform much register access in the timing critical paths anyway. Instead
we have the extra benefit of being sure that we don't forget the hw lock around
register accesses. I think currently the kms code was quite buggy w r t this.
This fixes Red Hat Bugzilla Bug 1180796
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 766a78882d upstream.
Commit 9b1cc9f251 ("dm cache: share cache-metadata object across
inactive and active DM tables") mistakenly ignored the use of ERR_PTR
returns. Restore missing IS_ERR checks and ERR_PTR returns where
appropriate.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a7eaea02b upstream.
You can't modify the metadata in these modes. It's better to fail these
messages immediately than let the block-manager deny write locks on
metadata blocks. Otherwise these failed metadata changes will trigger
'needs_check' to get set in the metadata superblock -- requiring repair
using the thin_check utility.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcad68876c upstream.
Since commit f2c3c67f00 (merge commit that adds commit "ARM: mvebu:
completely disable hardware I/O coherency"), we disable I/O coherency
on Armada EBU platforms.
However, we continue to initialize the coherency fabric, because this
coherency fabric is needed on Armada XP for inter-CPU
coherency. Unfortunately, due to this, we also continued to execute
the coherency fabric initialization code for Armada 375/38x, which
switched the PL310 into I/O coherent mode. This has the effect of
disabling the outer cache sync operation: this is needed when I/O
coherency is enabled to work around a PCIe/L2 deadlock. But obviously,
when I/O coherency is disabled, having the outer cache sync operation
is crucial.
Therefore, this commit fixes the armada_375_380_coherency_init() so
that the PL310 is switched to I/O coherent mode only if I/O coherency
is enabled.
Without this fix, all devices using DMA are broken on Armada 375/38x.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0b957f306 upstream.
Today we expect that all the bank are enabled, and count the number of banks
used by the pinctrl based on it instead of using the last bank id enabled.
So switch to it, set the chained IRQ at runtime based on enabled banks
and wait only the number of enabled gpio controllers at probe time.
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fa7b39131 upstream.
In case userspace attempts to obtain key information for or delete a
unicast key, this is currently erroneously rejected unless the driver
sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it
was never noticed.
Fix that, and while at it fix a potential memory leak: the error path
in the get_key() function was placed after allocating a message but
didn't free it - move it to a better place. Luckily admin permissions
are needed to call this operation.
Fixes: e31b82136d ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2af81d6718 upstream.
In normal cases (i.e. when we are fully associated), cfg80211 takes
care of removing all the stations before calling suspend in mac80211.
But in the corner case when we suspend during authentication or
association, mac80211 needs to roll back the station states. But we
shouldn't roll back the station states in the suspend function,
because this is taken care of in other parts of the code, except for
WDS interfaces. For AP types of interfaces, cfg80211 takes care of
disconnecting all stations before calling the driver's suspend code.
For station interfaces, this is done in the quiesce code.
For WDS interfaces we still need to do it here, so move the code into
a new switch case for WDS.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a5c5e81d8 upstream.
Fix a regression introduced by commit a5e70697d0 ("mac80211: add radiotap flag
and handling for 5/10 MHz") where the IEEE80211_CHAN_CCK channel type flag was
incorrectly replaced by the IEEE80211_CHAN_OFDM flag. This commit fixes that by
using the CCK flag again.
Fixes: a5e70697d0 ("mac80211: add radiotap flag and handling for 5/10 MHz")
Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee8a1a8b16 upstream.
We only support swap file calling nfs_direct_IO. However, application
might be able to get to nfs_direct_IO if it toggles O_DIRECT flag
during IO and it can deadlock because we grab inode->i_mutex in
nfs_file_direct_write(). So return 0 for such case. Then the generic
layer will fall back to buffer IO.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d90d6d552 upstream.
Without this the aux port does not get detected, and consequently the touchpad
will not work.
With this patch the touchpad is detected:
$ dmesg | grep -E "(SYN|i8042|serio)"
pnp 00:03: Plug and Play ACPI device, IDs SYN1d22 PNP0f13 (active)
i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd00123/0x840300/0x126800, board id: 2863, fw id: 1473085
input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input6
dmidecode excerpt for this laptop is:
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Medion
Product Name: Akoya E7225
Version: 1.0
Signed-off-by: Jochen Hein <jochen@jochen.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34e81ad5f0 upstream.
This patch solves deadlock between clock prepare mutex and regmap mutex reported
by Tomasz Figa in [1] by implementing solution from [2]: "always leave the clock
of the i2c controller in a prepared state".
[1] https://lkml.org/lkml/2014/7/2/171
[2] https://lkml.org/lkml/2014/7/2/207
On each i2c transfer handled by s3c24xx_i2c_xfer(), clk_prepare_enable() was
called, which calls clk_prepare() then clk_enable(). clk_prepare() takes
prepare_lock mutex before proceeding. Note that i2c transfer functions are
invoked from many places in kernel, typically with some other additional lock
held.
It may happen that function on CPU1 (e.g. regmap_update_bits()) has taken a
mutex (i.e. regmap lock mutex) then it attempts i2c communication in order to
proceed (so it needs to obtain clock related prepare_lock mutex during transfer
preparation stage due to clk_prepare() call). At the same time other task on
CPU0 wants to operate on clock (e.g. to (un)prepare clock for some other reason)
so it has taken prepare_lock mutex.
CPU0: CPU1:
clk_disable_unused() regulator_disable()
clk_prepare_lock() map->lock(map->lock_arg)
regmap_read() s3c24xx_i2c_xfer()
map->lock(map->lock_arg) clk_prepare_lock()
Implemented solution from [2] leaves i2c clock prepared. Preparation is done in
s3c24xx_i2c_probe() function. Without this patch, it is immediately unprepared
by clk_disable_unprepare() call. I've replaced this call with clk_disable() and
I've added clk_unprepare() call in s3c24xx_i2c_remove().
The s3c24xx_i2c_xfer() function now uses clk_enable() instead of
clk_prepare_enable() (and clk_disable() instead of clk_unprepare_disable()).
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e5dff0e804 upstream.
OTG device shall support this device for allowing compliance automated testing.
The modification is derived from Pavankumar and Vijayavardhans' previous work.
Signed-off-by: Macpaul Lin <macpaul@gmail.com>
Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: Vijayavardhan Vennapusa <vvreddy@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae43e9d05e upstream.
The comment for rbd_dev_parent_get() said
* We must get the reference before checking for the overlap to
* coordinate properly with zeroing the parent overlap in
* rbd_dev_v2_parent_info() when an image gets flattened. We
* drop it again if there is no overlap.
but the "drop it again if there is no overlap" part was missing from
the implementation. This lead to absurd parent_ref values for images
with parent_overlap == 0, as parent_ref was incremented for each
img_request and virtually never decremented.
Fix this by leveraging the fact that refresh path calls
rbd_dev_v2_parent_info() under header_rwsem and use it for read in
rbd_dev_parent_get(), instead of messing around with atomics. Get rid
of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
they'd pair with now and I suspect we are in a pretty miserable
situation as far as proper locking goes regardless.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e69b8d414f upstream.
This effectively reverts the last hunk of 392a9dad7e ("rbd: detect
when clone image is flattened").
The problem with parent_overlap != 0 condition is that it's possible
and completely valid to have an image with parent_overlap == 0 whose
parent state needs to be cleaned up on unmap. The next commit, which
drops the "clone image now standalone" logic, opens up another window
of opportunity to hit this, but even without it
# cat parent-ref.sh
#!/bin/bash
rbd create --image-format 2 --size 1 foo
rbd snap create foo@snap
rbd snap protect foo@snap
rbd clone foo@snap bar
rbd resize --allow-shrink --size 0 bar
rbd resize --size 1 bar
DEV=$(rbd map bar)
rbd unmap $DEV
leaves rbd_device/rbd_spec/etc and rbd_client along with ceph_client
hanging around.
My thinking behind calling rbd_dev_parent_put() unconditionally is that
there shouldn't be any requests in flight at that point in time as we
are deep into unmap sequence. Hence, even if rbd_dev_unparent() caused
by flatten is delayed by in-flight requests, it will have finished by
the time we reach rbd_dev_unprobe() caused by unmap, thus turning
unconditional rbd_dev_parent_put() into a no-op.
Fixes: http://tracker.ceph.com/issues/10352
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0767e95bb9 upstream.
When the last subscriber to a "Through" port has been removed, the
subscribed destination ports might still be active, so it would be
wrong to send "all sounds off" and "reset controller" events to them.
The proper place for such a shutdown would be the closing of the actual
MIDI port (and close_substream() in rawmidi.c already can do this).
This also fixes a deadlock when dummy_unuse() tries to send events to
its own port that is already locked because it is being freed.
Reported-by: Peter Billam <peter@www.pjb.com.au>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6eb2eba49 upstream.
The commit 3b8a3c0109 ("powerpc/pseries: Fix endiannes issue in RTAS
call from xmon") was fixing an endianness issue in the call made from
xmon to RTAS.
However, as Michael Ellerman noticed, this fix was not complete, the
token value was not byte swapped. This lead to call an unexpected and
most of the time unexisting RTAS function, which is silently ignored by
RTAS.
This fix addresses this hole.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e638642b08 upstream.
While being in an ERROR_WARNING state, and receiving further
bus error events with error counters still in the ERROR_WARNING
range of 97-127 inclusive, the state handling code erroneously
reverts back to ERROR_ACTIVE.
Per the CAN standard, only revert to ERROR_ACTIVE when the
error counters are less than 96.
Moreover, in certain Kvaser models, the BUS_ERROR flag is
always set along with undefined bits in the M16C status
register. Thus use bitwise operators instead of full equality
for checking that register against bus errors.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14c10c2a1d upstream.
On some x86 laptops, plugging a Kvaser device again after an
unplug makes the firmware always ignore the very first command.
For such a case, provide some room for retries instead of
completely exiting the driver init code.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3803fa6977 upstream.
Send expected argument to the URB completion hander: a CAN
netdevice instead of the network interface private context
`kvaser_usb_net_priv'.
This was discovered by having some garbage in the kernel
log in place of the netdevice names: can0 and can1.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ded5006667 upstream.
Upon receiving a hardware event with the BUS_RESET flag set,
the driver kills all of its anchored URBs and resets all of
its transmit URB contexts.
Unfortunately it does so under the context of URB completion
handler `kvaser_usb_read_bulk_callback()', which is often
called in an atomic context.
While the device is flooded with many received error packets,
usb_kill_urb() typically sleeps/reschedules till the transfer
request of each killed URB in question completes, leading to
the sleep in atomic bug. [3]
In v2 submission of the original driver patch [1], it was
stated that the URBs kill and tx contexts reset was needed
since we don't receive any tx acknowledgments later and thus
such resources will be locked down forever. Fortunately this
is no longer needed since an earlier bugfix in this patch
series is now applied: all tx URB contexts are reset upon CAN
channel close. [2]
Moreover, a BUS_RESET is now treated _exactly_ like a BUS_OFF
event, which is the recommended handling method advised by
the device manufacturer.
[1] http://article.gmane.org/gmane.linux.network/239442http://www.webcitation.org/6Vr2yagAQ
[2] can: kvaser_usb: Reset all URB tx contexts upon channel close
889b77f7fd
[3] Stacktrace:
<IRQ> [<ffffffff8158de87>] dump_stack+0x45/0x57
[<ffffffff8158b60c>] __schedule_bug+0x41/0x4f
[<ffffffff815904b1>] __schedule+0x5f1/0x700
[<ffffffff8159360a>] ? _raw_spin_unlock_irqrestore+0xa/0x10
[<ffffffff81590684>] schedule+0x24/0x70
[<ffffffff8147d0a5>] usb_kill_urb+0x65/0xa0
[<ffffffff81077970>] ? prepare_to_wait_event+0x110/0x110
[<ffffffff8147d7d8>] usb_kill_anchored_urbs+0x48/0x80
[<ffffffffa01f4028>] kvaser_usb_unlink_tx_urbs+0x18/0x50 [kvaser_usb]
[<ffffffffa01f45d0>] kvaser_usb_rx_error+0xc0/0x400 [kvaser_usb]
[<ffffffff8108b14a>] ? vprintk_default+0x1a/0x20
[<ffffffffa01f5241>] kvaser_usb_read_bulk_callback+0x4c1/0x5f0 [kvaser_usb]
[<ffffffff8147a73e>] __usb_hcd_giveback_urb+0x5e/0xc0
[<ffffffff8147a8a1>] usb_hcd_giveback_urb+0x41/0x110
[<ffffffffa0008748>] finish_urb+0x98/0x180 [ohci_hcd]
[<ffffffff810cd1a7>] ? acct_account_cputime+0x17/0x20
[<ffffffff81069f65>] ? local_clock+0x15/0x30
[<ffffffffa000a36b>] ohci_work+0x1fb/0x5a0 [ohci_hcd]
[<ffffffff814fbb31>] ? process_backlog+0xb1/0x130
[<ffffffffa000cd5b>] ohci_irq+0xeb/0x270 [ohci_hcd]
[<ffffffff81479fc1>] usb_hcd_irq+0x21/0x30
[<ffffffff8108bfd3>] handle_irq_event_percpu+0x43/0x120
[<ffffffff8108c0ed>] handle_irq_event+0x3d/0x60
[<ffffffff8108ec84>] handle_fasteoi_irq+0x74/0x110
[<ffffffff81004dfd>] handle_irq+0x1d/0x30
[<ffffffff81004727>] do_IRQ+0x57/0x100
[<ffffffff8159482a>] common_interrupt+0x6a/0x6a
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b07ef35244 upstream.
Commit 6fb1ca92a6 "udf: Fix race between write(2) and close(2)"
changed the condition when preallocation is released. The idea was that
we don't want to release the preallocation for an inode on close when
there are other writeable file descriptors for the inode. However the
condition was written in the opposite way so we released preallocation
only if there were other writeable file descriptors. Fix the problem by
changing the condition properly.
Fixes: 6fb1ca92a6
Reported-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ddfdb5c5a upstream.
If asoc_simple_card_probe() fails, asoc_simple_card_unref() may be
called before dev_set_drvdata(), causing a NULL pointer dereference in
asoc_simple_card_unref():
Unable to handle kernel NULL pointer dereference at virtual address 000000d4
...
PC is at asoc_simple_card_unref+0x14/0x48
LR is at asoc_simple_card_probe+0x3d4/0x40c
This typically happens because asoc_simple_card_parse_of() returns
-EPROBE_DEFER, but other failure modes are possible.
devm_snd_soc_register_card()/snd_soc_register_card() may fail either
before or after dev_set_drvdata().
Pass a snd_soc_card pointer instead of a platform_device pointer to
asoc_simple_card_unref() to fix this.
Note that if CONFIG_OF_DYNAMIC=n, of_node_put() is a dummy, and gcc may
optimize away the loop over card->dai_link, never actually dereferencing
card, and thus avoiding the crash...
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: e512e001da ("ASoC: simple-card: Fix the reference count of device nodes")
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3268a40d4 upstream.
In soc_new_compress() when rtd->dai_link->dynamic is set, we create the pcm
substreams with this call:
ret = snd_pcm_new_internal(rtd->card->snd_card, new_name, num,
1, 0, &be_pcm);
which passes 0 as capture_count leading to
be_pcm->streams[SNDRV_PCM_STREAM_CAPTURE].substream
being NULL, hence when trying to set rtd a few lines below we get an oops.
Fix by using rtd->dai_link->dpcm_playback and rtd->dai_link->dpcm_capture as
playback_count and capture_count to snd_pcm_new_internal().
Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c145c56d0 upstream.
The stack guard page error case has long incorrectly caused a SIGBUS
rather than a SIGSEGV, but nobody actually noticed until commit
fee7e49d45 ("mm: propagate error from stack expansion even for guard
page") because that error case was never actually triggered in any
normal situations.
Now that we actually report the error, people noticed the wrong signal
that resulted. So far, only the test suite of libsigsegv seems to have
actually cared, but there are real applications that use libsigsegv, so
let's not wait for any of those to break.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33692f2759 upstream.
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67bf9cda4b upstream.
The FIFO size is 40 accordingly to the specifications, but this means 0x40,
i.e. 64 bytes. This patch fixes the typo and enables FIFO size autodetection
for Intel MID devices.
Fixes: 7063c0d942 (spi/dw_spi: add DMA support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d297933cc7 upstream.
Current code tries to find the highest valid fifo depth by checking the value
it wrote to DW_SPI_TXFLTR. There are a few problems in current code:
1) There is an off-by-one in dws->fifo_len setting because it assumes the latest
register write fails so the latest valid value should be fifo - 1.
2) We know the depth could be from 2 to 256 from HW spec, so it is not necessary
to test fifo == 257. In the case fifo is 257, it means the latest valid
setting is fifo = 256. So after the for loop iteration, we should check
fifo == 2 case instead of fifo == 257 if detecting the FIFO depth fails.
This patch fixes above issues.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Reviewed-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e14dcf7cb upstream.
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.
Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.
Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.
Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4943ba16bb upstream.
This adds the module loading prefix "crypto-" to the template lookup
as well.
For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":
net-pf-38
algif-hash
crypto-vfat(blowfish)
crypto-vfat(blowfish)-all
crypto-vfat
Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 175f8e2650 upstream.
In some cases acpi_device_wakeup() may be called to ensure wakeup
power to be off for a given device even though that device's wakeup
GPE has not been enabled so far. It calls acpi_disable_gpe() on a
GPE that's not enabled and this causes ACPICA to return the AE_LIMIT
status code from that call which then is reported as an error by the
ACPICA's debug facilities (if enabled). This may lead to a fair
amount of confusion, so introduce a new ACPI device wakeup flag
to store the wakeup GPE status and avoid disabling wakeup GPEs
that have not been enabled.
Reported-and-tested-by: Venkat Raghavulu <venkat.raghavulu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbdd74763f upstream.
This reverts commit 2c3fc8d26d.
This commit broke on x86 PV because entries in the generic SWIOTLB are
indexed using (pseudo-)physical address not DMA address and these are
not the same in a x86 PV guest.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b05ac3824 upstream.
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.
The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov
for noticing that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ca3f5e974 upstream.
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.
Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list
This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.
The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62924af247 upstream.
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ea2aa8b7d upstream.
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45f87de57f upstream.
Commit 2457aec637 ("mm: non-atomically mark page accessed during page
cache allocation where possible") has added a separate parameter for
specifying gfp mask for radix tree allocations.
Not only this is less than optimal from the API point of view because it
is error prone, it is also buggy currently because
grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
the radix tree allocation wouldn't obey the restriction and might
recurse into filesystem and cause deadlocks. This is the case for most
filesystems unfortunately because only ext4 and gfs2 are using
AOP_FLAG_NOFS.
Let's simply remove radix_gfp_mask parameter because the allocation
context is same for both page cache and for the radix tree. Just make
sure that the radix tree gets only the sane subset of the mask (e.g. do
not pass __GFP_WRITE).
Long term it is more preferable to convert remaining users of
AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
interface even further.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3a8784454 upstream.
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.
This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).
This would cause either a panic, or corrupt memory.
Fixes CVE-2014-9529.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4aaa71873d upstream.
DMA mapped IO should be unmapped on the error path in probe() and
unconditionally on remove().
Fixes: 62936009f3 ([libata] Add 460EX on-chip SATA driver, sata_dwc_460ex)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38a1dfda8e upstream.
Commit 0dbc6078c0 ('x86, build, pci: Fix PCI_MSI build on !SMP')
introduced the dependency that X86_UP_APIC is only available when
PCI_MSI is false. This effectively prevents PCI_MSI support on 32bit
UP systems because it disables both APIC and IO-APIC. But APIC support
is architecturally required for PCI_MSI.
The intention of the patch was to enforce APIC support when PCI_MSI is
enabled, but failed to do so.
Remove the !PCI_MSI dependency from X86_UP_APIC and enforce
X86_UP_APIC when PCI_MSI support is enabled on 32bit UP systems.
[ tglx: Massaged changelog ]
Fixes 0dbc6078c0 'x86, build, pci: Fix PCI_MSI build on !SMP'
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: http://lkml.kernel.org/r/1421967529-9037-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3669ef9fa7 upstream.
The Witcher 2 did something like this to allocate a TLS segment index:
struct user_desc u_info;
bzero(&u_info, sizeof(u_info));
u_info.entry_number = (uint32_t)-1;
syscall(SYS_set_thread_area, &u_info);
Strictly speaking, this code was never correct. It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real. The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.
The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.
This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working. In
particular, using the code above to allocate two segments will
allocate the same segment both times.
According to FrostbittenKing on Github, this fixes The Witcher 2.
If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.
Fixes: 41bdc78544 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3747379ac upstream.
SYSENTER emulation is broken in several ways:
1. It misses the case of 16-bit code segments completely (CVE-2015-0239).
2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can
still be set without causing #GP).
3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in
legacy-mode.
4. There is some unneeded code.
Fix it.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c38d28ba8 upstream.
EXYNOS4_MCT_L_MASK is defined as 0xffffff00, so applying this bitmask
produces a number outside the range 0x00 to 0xff, which always results
in execution of the default switch statement.
Obviously this is wrong and git history shows that the bitmask inversion
was incorrectly set during a refactoring of the MCT code.
Fix this by putting the inversion at the correct position again.
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Reported-by: GP Orcullo <kinsamanka@gmail.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 378ff1a53b upstream.
It really needs to check that src is non-directory *and* use
{un,}lock_two_nodirectories(). As it is, it's trivial to cause
double-lock (ioctl(fd, CIFS_IOC_COPYCHUNK_FILE, fd)) and if the
last argument is an fd of directory, we are asking for trouble
by violating the locking order - all directories go before all
non-directories. If the last argument is an fd of parent
directory, it has 50% odds of locking child before parent,
which will cause AB-BA deadlock if we race with unlink().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38bdf45f4a upstream.
On Armada XP, 375 and 38x the MBus window 13 has the remap capability,
like windows 0 to 7. However, the mvebu-mbus driver isn't currently
taking into account this special case, which means that when window 13
is actually used, the remap registers are left to 0, making the device
using this MBus window unavailable.
As a minimal fix for stable, don't use window 13. A full fix will
follow later.
Fixes: fddddb52a6 ("bus: introduce an Marvell EBU MBus driver")
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f1e8ee286 upstream.
The current hardware I/O coherency is known to cause problems with DMA
coherent buffers, as it still requires explicit I/O synchronization
barriers, which is not compatible with the semantics expected by the
Linux DMA coherent buffers API.
So, in order to have enough time to validate a new solution based on
automatic I/O synchronization barriers, this commit disables hardware
I/O coherency entirely. Future patches will re-enable it.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ecd0bde5b upstream.
Currently PWM functionality is broken on mx25 due to the wrong assignment of the
PWM "per" clock.
According to Documentation/devicetree/bindings/clock/imx25-clock.txt:
pwm_ipg_per 52
,so update the pwm "per" to use 'pwm_ipg_per' instead of 'per10' clock.
With this change PWM can work fine on mx25.
Reported-by: Carlos Soto <csotoalonso@gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e5aeb4367 upstream.
Verify that the frequency value from userspace is valid and makes sense.
Unverified values can cause overflows later on.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redunent cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ada1fc0e1 upstream.
An unvalidated user input is multiplied by a constant, which can result in
an undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: include trivial milisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b149e4174 upstream.
commit 55601c9f24 (arm: omap: intc: switch over
to linear irq domain) introduced a regression with
SDMA legacy driver because that driver strictly depends
on INTC's IRQs starting at NR_IRQs. Aparently
irq_domain_add_linear() won't guarantee that, since we see
a 7 IRQs difference when booting with and without the
commit cited above.
Until arch/arm/plat-omap/dma.c is properly fixed, we
must maintain OMAP2/3 using irq_domain_add_legacy().
A FIXME note was added so people know to delete that
code once that legacy DMA driver is fixed up.
Fixes: 55601c9f24 (arm: omap: intc: switch over to linear irq domain)
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Link: https://lkml.kernel.org/r/1420576688-10604-1-git-send-email-balbi@ti.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a59db67656 upstream.
Introduce a new variable to count the number of allocated migration
structures. The existing variable cache->nr_migrations became
overloaded. It was used to:
i) track of the number of migrations in flight for the purposes of
quiescing during suspend.
ii) to estimate the amount of background IO occuring.
Recent discard changes meant that REQ_DISCARD bios are processed with
a migration. Discards are not background IO so nr_migrations was not
incremented. However this could cause quiescing to complete early.
(i) is now handled with a new variable cache->nr_allocated_migrations.
cache->nr_migrations has been renamed cache->nr_io_migrations.
cleanup_migration() is now called free_io_migration(), since it
decrements that variable.
Also, remove the unused cache->next_migration variable that got replaced
with with prealloc_structs a while ago.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b1cc9f251 upstream.
If a DM table is reloaded with an inactive table when the device is not
suspended (normal procedure for LVM2), then there will be two dm-bufio
objects that can diverge. This can lead to a situation where the
inactive table uses bufio to read metadata at the same time the active
table writes metadata -- resulting in the inactive table having stale
metadata buffers once it is promoted to the active table slot.
Fix this by using reference counting and a global list of cache metadata
objects to ensure there is only one metadata object per metadata device.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cf11ee630 upstream.
The locking scheme inside the vb2 thread is unsafe when stopping the
thread. In particular kthread_stop was called *after* internal data
structures were cleaned up instead of doing that before. In addition,
internal vb2 functions were called after threadio->stop was set to
true and vb2_internal_streamoff was called. This is also not allowed.
All this led to a variety of race conditions and kernel warnings and/or
oopses.
Fixed by moving the kthread_stop call up before the cleanup takes
place, and by checking threadio->stop before calling internal vb2
queuing operations.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 721f3223f2 upstream.
Unconditionally attaching Si2161/Si2165 demod driver
breaks Hauppauge WinTV Starburst.
So create own card entry for this.
Add card name comments to the subsystem ids.
This fixes a regression introduced in 3.17 by
36efec48e2 ([media] cx23885: Add si2165 support for HVR-5500)
Signed-off-by: Matthias Schwarzott <zzam@gentoo.org>
Tested-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cdb08172b upstream.
Fixes a race condition in abort handling that was injected
when multiple interrupt support was added. When only a single
interrupt is present, the adapter guarantees it will send
responses for aborted commands prior to the response for the
abort command itself. With multiple interrupts, these responses
generally come back on different interrupts, so we need to
ensure the abort thread waits until the aborted command is
complete so we don't perform a double completion. This race
condition was being hit frequently in environments which
were triggering command timeouts, which was resulting in
a double completion causing a kernel oops.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3e59ee4e7 upstream.
Reports against the TL-WDN4800 card indicate that PCI bus reset of this
Atheros device cause system lock-ups and resets. I've also been able to
confirm this behavior on multiple systems. The device never returns from
reset and attempts to access config space of the device after reset result
in hangs. Blacklist bus reset for the device to avoid this issue.
[bhelgaas: This regression appeared in v3.14. Andreas bisected it to
425c1b223d ("PCI: Add Virtual Channel to save/restore support"), but we
don't understand the mechanism by which that commit affects the reset
path.]
[bhelgaas: changelog, references]
Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.org
Reported-by: Andreas Hartmann <andihartmann@freenet.de>
Tested-by: Andreas Hartmann <andihartmann@freenet.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f331a859e0 upstream.
Enable a mechanism for devices to quirk that they do not behave when
doing a PCI bus reset. We require a modest level of spec compliant
behavior in order to do a reset, for instance the device should come
out of reset without throwing errors and PCI config space should be
accessible after reset. This is too much to ask for some devices.
Link: http://lkml.kernel.org/r/20140923210318.498dacbd@dualc.maya.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f7e7aee2f upstream.
Add pci_bus_clip_resource(). If a PCI-PCI bridge window overlaps an
upstream bridge window but is not completely contained by it, this clips
the downstream window so it fits inside the upstream one.
No functional change (this adds the function but no callers).
[bhelgaas: changelog, split into separate patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8505e729a2 upstream.
Add pci_claim_bridge_resource() to claim a PCI-PCI bridge window. This is
like regular pci_claim_resource(), except that if we fail to claim the
window, we check to see if we can reduce the size of the window and try
again.
This is for scenarios like this:
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff]
pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]
pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
The 00:01.0 window is illegal: it starts before the host bridge window, so
we have to assume the [0xbdf00000-0xbfffffff] region is inaccessible. We
can make it legal by clipping it to [mem 0xc0000000-0xddefffff 64bit pref].
Previously we discarded the 00:01.0 window and tried to reassign that part
of the hierarchy from scratch. That is a problem because Linux doesn't
always assign things optimally. For example, in this case, BIOS put the
01:00.0 device in a prefetchable window below 4GB, but after 5b28541552,
Linux puts the prefetchable window above 4GB where the 32-bit 01:00.0
device can't use it.
Clipping the 00:01.0 window is less intrusive than completely reassigning
things and is sufficient to let us use most of the BIOS configuration. Of
course, it's possible that devices below 00:01.0 will no longer fit. If
that's the case, we'll have to reassign things. But that's a separate
problem.
[bhelgaas: changelog, split into separate patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f2f4dc456 upstream.
pci_setup_bridge_io(), pci_setup_bridge_mmio(), and
pci_setup_bridge_mmio_pref() program the windows of PCI-PCI bridges.
Previously they accepted a pointer to the pci_bus of the secondary bus,
then looked up the bridge leading to that bus. Pass the bridge directly,
which will make it more convenient for future callers.
No functional change.
[bhelgaas: changelog, split into separate patch]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 226e5ae9e5 upstream.
If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared
if the mutex debugging is enabled which introduces a race in our
mutex_is_locked_by() - i.e. we may inspect the old owner value before it
is acquired by the new task.
This is the root cause of this error:
# diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
# index 5cf6731..3ef3736 100644
# --- a/kernel/locking/mutex-debug.c
# +++ b/kernel/locking/mutex-debug.c
# @@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock)
# DEBUG_LOCKS_WARN_ON(lock->owner != current);
#
# DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
# - mutex_clear_owner(lock);
# }
#
# /*
# * __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
# * mutexes so that we can do it here after we've verified state.
# */
# + mutex_clear_owner(lock);
# atomic_set(&lock->count, 1);
# }
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29187a9eea upstream.
A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.
This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value. This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.
Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function. need_to_create_worker() tests the following
conditions.
pending work items && !nr_running && !nr_idle
The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.
If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items. If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.
This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.
We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers. Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.
Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce75145267 upstream.
It is possible for ata_sff_flush_pio_task() to set ap->hsm_task_state to
HSM_ST_IDLE in between the time __ata_sff_port_intr() checks for HSM_ST_IDLE
and before it calls ata_sff_hsm_move() causing ata_sff_hsm_move() to BUG().
This problem is hard to reproduce making this patch hard to verify, but this
fix will prevent the race.
I have not been able to reproduce the problem, but here is a crash dump from
a 2.6.32 kernel.
On examining the ata port's state, its hsm_task_state field has a value of HSM_ST_IDLE:
crash> struct ata_port.hsm_task_state ffff881c1121c000
hsm_task_state = 0
Normally, this should not be possible as ata_sff_hsm_move() was called from ata_sff_host_intr(),
which checks hsm_task_state and won't call ata_sff_hsm_move() if it has a HSM_ST_IDLE value.
PID: 11053 TASK: ffff8816e846cae0 CPU: 0 COMMAND: "sshd"
#0 [ffff88008ba03960] machine_kexec at ffffffff81038f3b
#1 [ffff88008ba039c0] crash_kexec at ffffffff810c5d92
#2 [ffff88008ba03a90] oops_end at ffffffff8152b510
#3 [ffff88008ba03ac0] die at ffffffff81010e0b
#4 [ffff88008ba03af0] do_trap at ffffffff8152ad74
#5 [ffff88008ba03b50] do_invalid_op at ffffffff8100cf95
#6 [ffff88008ba03bf0] invalid_op at ffffffff8100bf9b
[exception RIP: ata_sff_hsm_move+317]
RIP: ffffffff813a77ad RSP: ffff88008ba03ca0 RFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff881c1121dc60 RCX: 0000000000000000
RDX: ffff881c1121dd10 RSI: ffff881c1121dc60 RDI: ffff881c1121c000
RBP: ffff88008ba03d00 R8: 0000000000000000 R9: 000000000000002e
R10: 000000000001003f R11: 000000000000009b R12: ffff881c1121c000
R13: 0000000000000000 R14: 0000000000000050 R15: ffff881c1121dd78
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88008ba03d08] ata_sff_host_intr at ffffffff813a7fbd
#8 [ffff88008ba03d38] ata_sff_interrupt at ffffffff813a821e
#9 [ffff88008ba03d78] handle_IRQ_event at ffffffff810e6ec0
commit 72dd299d50 upstream.
Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
"Since commit 8a4aeec8d "libata/ahci: accommodate tag ordered
controllers" the access to the harddisk on the first SATA-port is
failing on its first access. The access to the harddisk on the
second port is working normal.
When reverting the above commit, access to both harddisks is working
fine again."
Maintain tag ordered submission as the default, but allow sata_sil24 to
continue with the old behavior.
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Ronny Hegewald <Ronny.Hegewald@online.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b166010f6a upstream.
sd_set_power_mode() in derived module drivers/mmc/host/rtsx_usb_sdmmc.c
acquires dev_mutex and then calls pm_runtime_get_sync() to make sure the
device is awake while initializing a newly inserted card. Once it is
called during suspending state and explicitly before rtsx_usb_suspend()
acquires the same dev_mutex, both routine deadlock and further hang the
driver because pm_runtime_get_sync() waits the pending PM operations.
Fix this by using an empty suspend method. mmc_core always turns the
LED off after a request is done and thus it is ok to remove the only
rtsx_usb_turn_off_led() here.
Fixes: 730876be25 ("mfd: Add realtek USB card reader driver")
Signed-off-by: Roger Tseng <rogerable@realtek.com>
[Lee: Removed newly unused variable]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f29ae369a4 upstream.
If we don't tell regmap-irq that our first status
register is at offset 1, it will try to read offset
zero, which is the chipid register.
Fixes: 44b4dc6 mfd: tps65218: Add driver for the TPS65218 PMIC
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 773328da24 upstream.
STATUS register can be modified by the HW, so we
should bypass cache because of that.
In the case of INT[12] registers, they are the ones
that actually clear the IRQ source at the time they
are read. If we rely on the cache for them, we will
never be able to clear the interrupt, which will cause
our IRQ line to be disabled due to IRQ throttling.
Fixes: 44b4dc6 mfd: tps65218: Add driver for the TPS65218 PMIC
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db93facfb0 upstream.
This patch is to fix two deadlock cases.
Deadlock 1:
CPU #1
pinctrl_register-> pinctrl_get ->
create_pinctrl
(Holding lock pinctrl_maps_mutex)
-> get_pinctrl_dev_from_devname
(Trying to acquire lock pinctrldev_list_mutex)
CPU #0
pinctrl_unregister
(Holding lock pinctrldev_list_mutex)
-> pinctrl_put ->> pinctrl_free ->
pinctrl_dt_free_maps -> pinctrl_unregister_map
(Trying to acquire lock pinctrl_maps_mutex)
Simply to say
CPU#1 is holding lock A and trying to acquire lock B,
CPU#0 is holding lock B and trying to acquire lock A.
Deadlock 2:
CPU #3
pinctrl_register-> pinctrl_get ->
create_pinctrl
(Holding lock pinctrl_maps_mutex)
-> get_pinctrl_dev_from_devname
(Trying to acquire lock pinctrldev_list_mutex)
CPU #2
pinctrl_unregister
(Holding lock pctldev->mutex)
-> pinctrl_put ->> pinctrl_free ->
pinctrl_dt_free_maps -> pinctrl_unregister_map
(Trying to acquire lock pinctrl_maps_mutex)
CPU #0
tegra_gpio_request
(Holding lock pinctrldev_list_mutex)
-> pinctrl_get_device_gpio_range
(Trying to acquire lock pctldev->mutex)
Simply to say
CPU#3 is holding lock A and trying to acquire lock D,
CPU#2 is holding lock B and trying to acquire lock A,
CPU#0 is holding lock D and trying to acquire lock B.
Signed-off-by: Jim Lin <jilin@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cfda7fbeb upstream.
During the CAN FD standardization process within the ISO it turned out that
the failure detection capability has to be improved.
The CAN in Automation organization (CiA) defined the already implemented CAN
FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937
Finally there will be three types of CAN FD controllers in the future:
1. ISO compliant (fixed)
2. non-ISO compliant (fixed, like the M_CAN IP v3.0.1 in m_can.c)
3. ISO/non-ISO CAN FD controllers (switchable, like the PEAK USB FD)
So the current M_CAN driver for the M_CAN IP v3.0.1 has to expose its non-ISO
implementation by setting the CAN_CTRLMODE_FD_NON_ISO ctrlmode at startup.
As this bit cannot be switched at configuration time CAN_CTRLMODE_FD_NON_ISO
must not be set in ctrlmode_supported of the current M_CAN driver.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b1087aa5e upstream.
When changing flags in the CAN drivers ctrlmode the provided new content has to
be checked whether the bits are allowed to be changed. The bits that are to be
changed are given as a bitfield in cm->mask. Therefore checking against
cm->flags is wrong as the content can hold any kind of values.
The iproute2 tool sets the bits in cm->mask and cm->flags depending on the
detected command line options. To be robust against bogus user space
applications additionally sanitize the provided flags with the provided mask.
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f1241ed1a upstream.
pps_{lock,unlock}() call intel_display_power_{get,put}() outside
pps_mutes to avoid deadlocks with the power_domain mutex. In theory
during aux transfers we should usually have the relevant power domain
references already held by some higher level code, so this should not
result in much overhead (exception being userspace i2c-dev access).
However thanks to the check_power_well() calls in
intel_display_power_{get/put}() we end up doing a few Punit reads for
each aux transfer. Obviously doing this for each byte transferred via
i2c-over-aux is not a good idea.
I can't think of a good way to keep check_power_well() while eliminating
the overhead, so let's just remove check_power_well() entirely.
Fixes a driver init time regression introduced by:
commit 773538e860
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Thu Sep 4 14:54:56 2014 +0300
drm/i915: Reset power sequencer pipe tracking when disp2d is off
Credit goes to Jani for figuring this out.
v2: Add the regression note in the commit message.
Cc: Egbert Eich <eich@suse.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86201
Tested-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Jani: s/intel_runtime_pm.c/intel_pm.c/g and wiggle for 3.18]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4624386080 upstream.
While looking at hch's recent conversion to drop the MSG_*_TAG
definitions, I noticed a long standing bug in vhost-scsi where
the VIRTIO_SCSI_S_* attribute definitions where incorrectly
being passed directly into target_submit_cmd_map_sgls().
This patch adds the missing virtio-scsi to TCM/SAM task attribute
conversion.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 046ba64285 upstream.
This patch drops the arbitrary maximum I/O size limit in sbc_parse_cdb(),
which currently for fabric_max_sectors is hardcoded to 8192 (4 MB for 512
byte sector devices), and for hw_max_sectors is a backend driver dependent
value.
This limit is problematic because Linux initiators have only recently
started to honor block limits MAXIMUM TRANSFER LENGTH, and other non-Linux
based initiators (eg: MSFT Fibre Channel) can also generate I/Os larger
than 4 MB in size.
Currently when this happens, the following message will appear on the
target resulting in I/Os being returned with non recoverable status:
SCSI OP 28h with too big sectors 16384 exceeds fabric_max_sectors: 8192
Instead, drop both [fabric,hw]_max_sector checks in sbc_parse_cdb(),
and convert the existing hw_max_sectors into a purely informational
attribute used to represent the granuality that backend driver and/or
subsystem code is splitting I/Os upon.
Also, update FILEIO with an explicit FD_MAX_BYTES check in fd_execute_rw()
to deal with the one special iovec limitiation case.
v2 changes:
- Drop hw_max_sectors check in sbc_parse_cdb()
Reported-by: Lance Gropper <lance.gropper@qosserver.com>
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06bed7d18c upstream.
This commit fixes a race whereby nlmclnt_init() first starts the lockd
daemon, and then calls nlm_bind_host() with the expectation that
nlmsvc_timeout has already been initialised. Unfortunately, there is no
no synchronisation between lockd() and lockd_up() to guarantee that this
is the case.
Fix is to move the initialisation of nlmsvc_timeout into lockd_create_svc
Fixes: 9a1b6bf818 ("LOCKD: Don't call utsname()->nodename...")
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org # 3.10.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a16c5f99a2 upstream.
scripts/Makefile.clean treats absolute path specially, but
$(objtree)/debian is no longer an absolute path since 7e1c0477 (kbuild:
Use relative path for $(objtree). Work around this by checking if the
path starts with $(objtree)/.
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Fixes: 7e1c0477 (kbuild: Use relative path for $(objtree)
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b485342bd7 upstream.
Commit a074335a37 ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.
Before:
$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 D sys_call_table
0000000000000000 D syscall_table_size
After:
$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev
0000000000000000 R sys_call_table
0000000000000000 D syscall_table_size
Fixes: a074335a37 ("x86, um: Mark system call tables readonly")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f911d73105 upstream.
futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5c8afe5be upstream.
"origPtr" is used as an offset into the bd->dbuf[] array. That array is
allocated in start_bunzip() and has "bd->dbufSize" number of elements so
the test here should be >= instead of >.
Later we check "origPtr" again before using it as an offset so I don't
know if this bug can be triggered in real life.
Fixes: bc22c17e12 ('bzip2/lzma: library support for gzip, bzip2 and lzma decompression')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2eacc608b3 upstream.
The ad7991/ad7995/ad7999 does not have a configuration register like the
other devices that can be written and read. The configuration is written as
part of the conversion sequence.
Fixes: 0f7ddcc1bf ("iio:adc:ad799x: Write default config on probe and reset alert status on probe")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Tested-by: Mike Looijmans <mike.looijmans@topic.nl>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69d2626f97 upstream.
64KiB is allocated for qspi dtb partition which is not
sufficient, so updating the partition table size to 512KiB
for device tree partition.
This also aligns the QSPI partition definitions between
kernel and U-Boot.
Fixes: dc2dd5b8 ("ARM: dts: dra7: Add qspi device")
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0ddb319db upstream.
The sh73a0 INTC can't mask interrupts properly most likely due to a
hardware bug. Set the .control_parent flag to delegate masking to the
parent interrupt controller, like was already done for irqpin1.
Without this, accessing the three-axis digital accelerometer ADXL345
on kzm9g through /dev/input/event1 causes an interrupt storm, which
requires a power-cycle to recover from.
This was inspired by a patch for arch/arm/boot/dts/sh73a0.dtsi from
Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 341eb5465f ("ARM: shmobile: INTC External IRQ pin driver on sh73a0")
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5adba7c2da upstream.
There's no card detection for the eMMC, so this patch adds the missing
broken-cd property. This patch also sets bus width as 8 to add
MMC_CAP_8_BIT_DATA in the Host capabilities.
Fixes: 3047086dfd ("ARM: dts: berlin: enable SD card reader and eMMC for the BG2Q DMP")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4cf0935a2 upstream.
Correct returning IRQ_HANDLED unconditionally in the irq handler.
Return IRQ_NONE for some interrupt which we do not expect to be
handled in this handler. This prevents kernel stalling with back
to back spurious interrupts.
Fixes: 2722e56de6 ("OMAP4: l3: Introduce l3-interconnect error handling driver")
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61b43d4e91 upstream.
On certain SoCs such as AM437x SoC, L3_noc error registers are
maintained in power domain such as per domain which looses context as part
of low power state such as RTC+DDR mode. On these platforms when we
mask interrupts which we cannot handle, the source of these interrupts
still remain on resume, however, the flag mux registers now contain
their reset value (unmasked) - this breaks the system with infinite
interrupts since we do not these interrupts to take place ever again.
To handle this: restore the masking of interrupts which we have
already recorded in the system as ones we cannot handle.
Fixes: 2100b595b7 ("bus: omap_l3_noc: ignore masked out unclearable targets")
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 999f934de1 upstream.
If the boot loader enables HYP mode on the boot CPU, the secondary CPU
also needs to call into the ROM to switch to HYP mode before booting.
The firmwares on the omap5 and dra7xx unfortunately do not take care
of this, so it has to be handled by the kernel.
This patch is based on "[PATCH 2/2] ARM: OMAP5: Add HYP mode entry support
for secondary CPUs" by Santosh Shilimkar <santosh.shilimkar@ti.com>,
except this version does not require a compile time CONFIG to control
if it should enable HYP mode or not, it simply does it based on the mode
of the boot CPU, so it works whether the CPU boots in SVC or HYP mode,
and should even work as a guest kernel inside kvm if qemu decides to
support emulating the omap5 or dra7xx.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 572b24e6d8 upstream.
The switch statement of the possible list of SYSCLK1 frequencies is
missing a 0 in 4 out of the 7 frequencies.
Fixes: fa6d79d276 ("ARM: OMAP: Add initialisation for the real-time counter")
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Reviewed-by: Lokesh Vutla <lokeshvutla@ti.com>
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81ef447950 upstream.
The post dividers do not work on i.MX6Q rev T0 1.0 so they must be fixed
to 1. As the table index was wrong, a divider a of 4 could still be
requested which implied the clock not to be set properly. This is the
root cause of the HDMI not working at high resolution on rev T0 1.0 of
the SoC.
Signed-off-by: Gary Bisson <bisson.gary@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a9f0604bd upstream.
GPIO2_5 is the reset GPIO for the USB3317 ULPI PHY. Instead of modelling it as
a regulator, the correct approach is to use the 'reset_gpios' property of the
"usb-nop-xceiv" node.
GPIO1_7 is the reset GPIO for the USB2517 USB hub. As we currently don't have
dt bindings to describe a HUB reset, let's keep using the regulator approach.
Rename the regulator to 'reg_hub_reset' to better describe its function and bind
it with the USB host1 port instead.
USB host support has been introduced by commit 9bf206a9d1 ("ARM: dts:
imx51-babbage: Add USB Host1 support"), which landed in 3.16 and it seems that
USB has only been functional due to previous bootloader initialization.
With this patch applied we can get USB host to work without relying on the
bootloader.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a87e9cbc3 upstream.
From Documentation/devicetree/bindings/clock/imx25-clock.txt:
cspi1_ipg 78
cspi2_ipg 79
cspi3_ipg 80
, so fix the SPI1 clocks accordingly to avoid a kernel hang when trying to
access SPI1.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40d1746d2e upstream.
CONFIG_GENERIC_CPUFREQ_CPU0 disappeared with commit bbcf071969
("cpufreq: cpu0: rename driver and internals to 'cpufreq_dt'")
Use the renamed CONFIG_CPUFREQ_DT generic driver. It looks like with
v3.18-rc1, commit bbcf071969 and fdc509b15e came in via
different trees causing the resultant v3.18-rc1 to be non-functional for
cpufreq as default supported with omap2plus_defconfig.
Fixes: fdc509b15e ("ARM: omap2plus_defconfig: Add cpufreq to defconfig")
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 148e9a711e upstream.
On some laptops, keyboard needs to be reset in order to successfully detect
touchpad (e.g., some Gigabyte laptop models with Elantech touchpads).
Without resettin keyboard touchpad pretends to be completely dead.
Based on the original patch by Mateusz Jończyk this version has been
expanded to include DMI based detection & application of the fix
automatically on the affected models of laptops. This has been confirmed to
fix problem by three users already on three different models of laptops.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81331
Signed-off-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Tested-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com>
Tested by: Zakariya Dehlawi <zdehlawi@gmail.com>
Tested-by: Guillaum Bouchard <guillaum.bouchard@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 889b77f7fd upstream.
Flooding the Kvaser CAN to USB dongle with multiple reads and
writes in very high frequency (*), closing the CAN channel while
all the transmissions are on (#), opening the device again (@),
then sending a small number of packets would make the driver
enter an almost infinite loop of:
[....]
[15959.853988] kvaser_usb 4-3:1.0 can0: cannot find free context
[15959.853990] kvaser_usb 4-3:1.0 can0: cannot find free context
[15959.853991] kvaser_usb 4-3:1.0 can0: cannot find free context
[15959.853993] kvaser_usb 4-3:1.0 can0: cannot find free context
[15959.853994] kvaser_usb 4-3:1.0 can0: cannot find free context
[15959.853995] kvaser_usb 4-3:1.0 can0: cannot find free context
[....]
_dragging the whole system down_ in the process due to the
excessive logging output.
Initially, this has caused random panics in the kernel due to a
buggy error recovery path. That got fixed in an earlier commit.(%)
This patch aims at solving the root cause. -->
16 tx URBs and contexts are allocated per CAN channel per USB
device. Such URBs are protected by:
a) A simple atomic counter, up to a value of MAX_TX_URBS (16)
b) A flag in each URB context, stating if it's free
c) The fact that ndo_start_xmit calls are themselves protected
by the networking layers higher above
After grabbing one of the tx URBs, if the driver noticed that all
of them are now taken, it stops the netif transmission queue.
Such queue is worken up again only if an acknowedgment was received
from the firmware on one of our earlier-sent frames.
Meanwhile, upon channel close (#), the driver sends a CMD_STOP_CHIP
to the firmware, effectively closing all further communication. In
the high traffic case, the atomic counter remains at MAX_TX_URBS,
and all the URB contexts remain marked as active. While opening
the channel again (@), it cannot send any further frames since no
more free tx URB contexts are available.
Reset all tx URB contexts upon CAN channel close.
(*) 50 parallel instances of `cangen0 -g 0 -ix`
(#) `ifconfig can0 down`
(@) `ifconfig can0 up`
(%) "can: kvaser_usb: Don't free packets when tight on URBs"
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b442723fce upstream.
Flooding the Kvaser CAN to USB dongle with multiple reads and
writes in high frequency caused seemingly-random panics in the
kernel.
On further inspection, it seems the driver erroneously freed the
to-be-transmitted packet upon getting tight on URBs and returning
NETDEV_TX_BUSY, leading to invalid memory writes and double frees
at a later point in time.
Note:
Finding no more URBs/transmit-contexts and returning NETDEV_TX_BUSY
is a driver bug in and out of itself: it means that our start/stop
queue flow control is broken.
This patch only fixes the (buggy) error handling code; the root
cause shall be fixed in a later commit.
Acked-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68693b8ea4 upstream.
since the split of host+gadget mode in commit 74c2e93600 ("usb: musb:
factor out hcd initalization") we leak the usb_hcd struct. We call now
musb_host_cleanup() which does basically usb_remove_hcd() and also sets
the hcd variable to NULL. Doing so makes the finall call to
musb_host_free() basically a nop and the usb_hcd remains around for ever
without anowner.
This patch drops that NULL assignment for that reason.
Fixes: 74c2e93600 ("usb: musb: factor out hcd initalization")
Cc: Daniel Mack <zonque@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d89252a99 upstream.
Commit c3ee9b76aa (EHCI: improved logic for isochronous scheduling)
introduced the idea of using ehci->last_iso_frame as the origin (or
base) for the circular calculations involved in modifying the
isochronous schedule. However, the new code it added used
ehci->last_iso_frame before the value was properly initialized. This
patch rectifies the mistake by moving the initialization lines earlier
in iso_stream_schedule().
This fixes Bugzilla #72891.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: c3ee9b76aa
Reported-by: Joe Bryant <tenminjoe@yahoo.com>
Tested-by: Joe Bryant <tenminjoe@yahoo.com>
Tested-by: Martin Long <martin@longhome.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32a4bf2e81 upstream.
Use tty kref to release the fake tty in usb_console_setup to avoid use
after free if the underlying serial driver has acquired a reference.
Note that using the tty destructor release_one_tty requires some more
state to be initialised.
Fixes: 4a90f09b20 ("tty: usb-serial krefs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d269d4434c upstream.
The USB console currently allocates a temporary fake tty which is used
to pass terminal settings to the underlying serial driver.
The tty struct is not fully initialised, something which can lead to a
lockdep warning (or worse) if a serial driver tries to acquire a
line-discipline reference:
usbserial: USB Serial support registered for pl2303
pl2303 1-2.1:1.0: pl2303 converter detected
usb 1-2.1: pl2303 converter now attached to ttyUSB0
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 68 Comm: udevd Tainted: G W 3.18.0-rc5 #10
[<c0016f04>] (unwind_backtrace) from [<c0013978>] (show_stack+0x20/0x24)
[<c0013978>] (show_stack) from [<c0449794>] (dump_stack+0x24/0x28)
[<c0449794>] (dump_stack) from [<c006f730>] (__lock_acquire+0x1e50/0x2004)
[<c006f730>] (__lock_acquire) from [<c0070128>] (lock_acquire+0xe4/0x18c)
[<c0070128>] (lock_acquire) from [<c027c6f8>] (ldsem_down_read_trylock+0x78/0x90)
[<c027c6f8>] (ldsem_down_read_trylock) from [<c027a1cc>] (tty_ldisc_ref+0x24/0x58)
[<c027a1cc>] (tty_ldisc_ref) from [<c0340760>] (usb_serial_handle_dcd_change+0x48/0xe8)
[<c0340760>] (usb_serial_handle_dcd_change) from [<bf000484>] (pl2303_read_int_callback+0x210/0x220 [pl2303])
[<bf000484>] (pl2303_read_int_callback [pl2303]) from [<c031624c>] (__usb_hcd_giveback_urb+0x80/0x140)
[<c031624c>] (__usb_hcd_giveback_urb) from [<c0316fc0>] (usb_giveback_urb_bh+0x98/0xd4)
[<c0316fc0>] (usb_giveback_urb_bh) from [<c0042e44>] (tasklet_hi_action+0x9c/0x108)
[<c0042e44>] (tasklet_hi_action) from [<c0042380>] (__do_softirq+0x148/0x42c)
[<c0042380>] (__do_softirq) from [<c00429cc>] (irq_exit+0xd8/0x114)
[<c00429cc>] (irq_exit) from [<c007ae58>] (__handle_domain_irq+0x84/0xdc)
[<c007ae58>] (__handle_domain_irq) from [<c000879c>] (omap_intc_handle_irq+0xd8/0xe0)
[<c000879c>] (omap_intc_handle_irq) from [<c0014544>] (__irq_svc+0x44/0x7c)
Exception stack(0xdf4e7f08 to 0xdf4e7f50)
7f00: debc0b80 df4e7f5c 00000000 00000000 debc0b80 be8da96c
7f20: 00000000 00000128 c000fc84 df4e6000 00000000 df4e7f94 00000004 df4e7f50
7f40: c038ebc0 c038d74c 600f0013 ffffffff
[<c0014544>] (__irq_svc) from [<c038d74c>] (___sys_sendmsg.part.29+0x0/0x2e0)
[<c038d74c>] (___sys_sendmsg.part.29) from [<c038ec08>] (SyS_sendmsg+0x18/0x1c)
[<c038ec08>] (SyS_sendmsg) from [<c000fa00>] (ret_fast_syscall+0x0/0x48)
console [ttyUSB0] enabled
Fixes: 36697529b5 ("tty: Replace ldisc locking with ldisc_sem")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fb694f96e upstream.
When unloading the module 'g_hid.ko', the urb request will be dequeued and the
completion routine will be excuted. If there is no urb packet, the urb request
will not be added to the endpoint queue and the completion routine pointer in
urb request is NULL.
Accessing to this NULL function pointer will cause the Oops issue reported
below.
Add the code to check if the urb request is in the endpoint queue
or not. If the urb request is not in the endpoint queue, a negative
error code will be returned.
Here is the Oops log:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = dedf0000
[00000000] *pgd=3ede5831, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] ARM
Modules linked in: g_hid(-) usb_f_hid libcomposite
CPU: 0 PID: 923 Comm: rmmod Not tainted 3.18.0+ #2
Hardware name: Atmel SAMA5 (Device Tree)
task: df6b1100 ti: dedf6000 task.ti: dedf6000
PC is at 0x0
LR is at usb_gadget_giveback_request+0xc/0x10
pc : [<00000000>] lr : [<c02ace88>] psr: 60000093
sp : dedf7eb0 ip : df572634 fp : 00000000
r10: 00000000 r9 : df52e210 r8 : 60000013
r7 : df6a9858 r6 : df52e210 r5 : df6a9858 r4 : df572600
r3 : 00000000 r2 : ffffff98 r1 : df572600 r0 : df6a9868
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 3edf0059 DAC: 00000015
Process rmmod (pid: 923, stack limit = 0xdedf6230)
Stack: (0xdedf7eb0 to 0xdedf8000)
7ea0: 00000000 c02adbbc df572580 deced608
7ec0: df572600 df6a9868 df572634 c02aed3c df577c00 c01b8608 00000000 df6be27c
7ee0: 00200200 00100100 bf0162f4 c000e544 dedf6000 00000000 00000000 bf010c00
7f00: bf0162cc bf00159c 00000000 df572980 df52e218 00000001 df5729b8 bf0031d0
[..]
[<c02ace88>] (usb_gadget_giveback_request) from [<c02adbbc>] (request_complete+0x64/0x88)
[<c02adbbc>] (request_complete) from [<c02aed3c>] (usba_ep_dequeue+0x70/0x128)
[<c02aed3c>] (usba_ep_dequeue) from [<bf010c00>] (hidg_unbind+0x50/0x7c [usb_f_hid])
[<bf010c00>] (hidg_unbind [usb_f_hid]) from [<bf00159c>] (remove_config.isra.6+0x98/0x9c [libcomposite])
[<bf00159c>] (remove_config.isra.6 [libcomposite]) from [<bf0031d0>] (__composite_unbind+0x34/0x98 [libcomposite])
[<bf0031d0>] (__composite_unbind [libcomposite]) from [<c02acee0>] (usb_gadget_remove_driver+0x50/0x78)
[<c02acee0>] (usb_gadget_remove_driver) from [<c02ad570>] (usb_gadget_unregister_driver+0x64/0x94)
[<c02ad570>] (usb_gadget_unregister_driver) from [<bf0160c0>] (hidg_cleanup+0x10/0x34 [g_hid])
[<bf0160c0>] (hidg_cleanup [g_hid]) from [<c0056748>] (SyS_delete_module+0x118/0x19c)
[<c0056748>] (SyS_delete_module) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
Code: bad PC value
Signed-off-by: Songjun Wu <songjun.wu@atmel.com>
[nicolas.ferre@atmel.com: reworked the commit message]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Fixes: 914a3f3b37 ("USB: add atmel_usba_udc driver")
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6785a10344 upstream.
When receive data, the RXRDY in status register set by hardware
after a new packet has been stored in the endpoint FIFO. When it
is copied from FIFO, this bit is cleared which make the FIFO can
be accessed again.
In the receive_data() function, this bit RXRDY has been cleared.
So, after the receive_data() function return, this bit should
not be cleared again, or else it may cause the accessing FIFO
corrupt, which will make the data loss.
Fixes: 914a3f3b37 (USB: add atmel_usba_udc driver)
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f40afdddeb upstream.
According to the datasheet, when transfer using DMA, the control
setting for IN packet only need END_BUF_EN, END_BUF_IE, CH_EN,
while for OUT packet, need more two bits END_TR_EN and END_TR_IE
to be configured.
Fixes: 914a3f3b37 (USB: add atmel_usba_udc driver)
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b44be2462d upstream.
Commit 3b74c73f8d switched over to memdup_user()
in ep_write() function and removed kfree (kbuf).
memdup_user() function allocates memory which is never freed.
Fixes: 3b74c73 (usb: gadget: inode: switch over to memdup_user())
Signed-off-by: Mario Schuknecht <mario.schuknecht@dresearch-fe.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5122236bb upstream.
Fix null-pointer dereference during probe if the interface-status
completion handler is called before the individual ports have been set
up.
Fixes: f79b2d0fe8 ("USB: keyspan: fix NULL-pointer dereferences and
memory leaks")
Reported-by: Richard <richjunk@pacbell.net>
Tested-by: Richard <richjunk@pacbell.net>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d80c0d1418 upstream.
As has been discussed in the thread starting with
https://lkml.kernel.org/g/549748e9.d+SiJzqu50f1r4lSAL043YSc@arcor.de
Sierra Wireless MC73xx devices with USB VID/PID 0x1199:0x68c0 require the
option_send_setup() code to be used on the USB interface for the AT port
to make unsolicited response codes work correctly. Move these devices from
the qcserial driver where they have been added by commit
70a3615fc0 ("usb: qcserial: add Sierra Wireless
MC73xx") to the option driver and add a MC73xx-specific blacklist
to ensure that
1. the sendsetup code is not used for the DIAG/DM and NMEA interfaces
2. the option driver does not attach to the QMI/network interfaces
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90441b4dbe upstream.
Fixing typo for MeshConnect IDs. The original PID (0x8875) is not in
production and is not needed. Instead it has been changed to the
official production PID (0x8857).
Signed-off-by: Preston Fick <pffick@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39e60635a0 upstream.
DWC3 gadget sets up a pool of 32 TRBs for each EP during initialization. This
means, the max TRBs that can be submitted for an EP is fixed to 32. Since the
request queue for an EP is a linked list, any number of requests can be queued
to it by the gadget layer. However, the dwc3 driver must not submit TRBs more
than the pool it has created for. This limit wasn't respected when SG was used
resulting in submitting more than the max TRBs, eventually leading to
non-transfer of the TRBs submitted over the max limit.
Root cause:
When SG is used, there are two loops iterating to prepare TRBs:
- Outer loop over the request_list
- Inner loop over the SG list
The code was missing break to get out of the outer loop.
Fixes: eeb720fb21 (usb: dwc3: gadget: add support for SG lists)
Signed-off-by: Amit Virdi <amit.virdi@st.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec512fb8e5 upstream.
When scatter gather (SG) is used, multiple TRBs are prepared from one DWC3
request (dwc3_request). So while preparing TRBs, the 'last' flag should be set
only when it is the last TRB being prepared from the last dwc3_request entry.
The current implementation uses list_is_last to check if the dwc3_request is the
last entry from the request_list. However, list_is_last returns false for the
last entry too. This is because, while preparing the first TRB from a request,
the function dwc3_prepare_one_trb modifies the request's next and prev pointers
while moving the URB to req_queued. Hence, list_is_last always returns false no
matter what.
The correct way is not to access the modified pointers of dwc3_request but to
use list_empty macro instead.
Fixes: e5ba5ec833 (usb: dwc3: gadget: fix scatter gather implementation)
Signed-off-by: Amit Virdi <amit.virdi@st.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56abcab833 upstream.
Commit 8dccddbc23 ("OHCI: final fix for NVIDIA problems (I hope)")
introduced into 3.1.9 broke boot on e.g. Freescale P2020DS development
board. The code path that was previously specific to NVIDIA controllers
had then become taken for all chips.
However, the M5237 installed on the board wedges solid when accessing
its base+OHCI_FMINTERVAL register, making it impossible to boot any
kernel newer than 3.1.8 on this particular and apparently other similar
machines.
Don't readl() and writel() base+OHCI_FMINTERVAL on PCI ID 10b9:5237.
The patch is suitable for the -next tree as well as all maintained
kernels up to 3.2 inclusive.
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0915e6feb3 upstream.
The gpio device attributes were never destroyed when the gpio was
unexported (or on export failures).
Use device_create_with_groups() to create the default device attributes
of the gpio class device. Note that this also fixes the
attribute-creation race with userspace for these attributes.
Remove contingent attributes in export error path and on unexport.
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 121b6a7995 upstream.
The gpio-chip device attributes were never destroyed when the device was
removed.
Fix by using device_create_with_groups() to create the device attributes
of the chip class device.
Note that this also fixes the attribute-creation race with userspace.
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6798acaa01 upstream.
Move direct and indirect calls to gpiochip_remove_pin_ranges outside of
spin lock as they can end up taking a mutex in pinctrl_remove_gpio_range.
Note that the pin ranges are already added outside of the lock.
Fixes: 9ef0d6f762 ("gpiolib: call pin removal in chip removal function")
Fixes: f23f1516b6 ("gpiolib: provide provision to register pin ranges")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00acc3dc24 upstream.
Fix memory leak and sleep-while-atomic in gpiochip_remove.
The memory leak was introduced by afa82fab5e ("gpio / ACPI: Move event
handling registration to gpiolib irqchip helpers") that moved the
release of acpi interrupt resources to gpiochip_irqchip_remove, but by
then the resources are no longer accessible as the acpi_gpio_chip has
already been freed by acpi_gpiochip_remove.
Note that this also fixes a few potential sleep-while-atomics, which has
been around since 1425052097 ("gpio: add IRQ chip helpers in gpiolib")
when the call to gpiochip_irqchip_remove while holding a spinlock was
added (a couple of irq-domain paths can end up grabbing mutexes).
Fixes: afa82fab5e ("gpio / ACPI: Move event handling registration to
gpiolib irqchip helpers")
Fixes: 1425052097 ("gpio: add IRQ chip helpers in gpiolib")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5539b3c938 upstream.
Memory allocated and references taken by of_gpiochip_add and
acpi_gpiochip_add were never released on errors in gpiochip_add (e.g.
failure to find free gpio range).
Fixes: 391c970c0d ("of/gpio: add default of_xlate function if device
has a node pointer")
Fixes: 664e3e5ac6 ("gpio / ACPI: register to ACPI events
automatically")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b8792bbdf upstream.
of_get_named_gpiod_flags fails with -EPROBE_DEFER in cases
where the gpio chip is available and the GPIO translation fails.
This causes drivers to be re-probed erroneusly, and hides the
real problem(i.e. the GPIO number being out of range).
Signed-off-by: Hans Holmberg <hans.holmberg@intel.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41f632fe17 upstream.
Remove bogus call to of_gpiochip_add (and of_gpio_chip remove in error
path) which is also called when adding the gpio chip.
This prevents adding the same pinctrl range twice.
Fixes: 3f8c50c9b1 ("OF: pinctrl: MIPS: lantiq: implement lantiq/xway
pinctrl support")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5138d5c562 upstream.
The gpio4 and gpio5 are in 0xf7fc0000 apb which is located in the SM domain.
This patch moves gpio4 and gpio5 to the correct location. This patch also
renames them as the following to match the names we internally used in
marvell:
gpio4 -> sm_gpio1
gpio5 -> sm_gpio0
porte -> portf
portf -> porte
This also matches what we did for BG2 and BG2CD's SM GPIO.
Fixes: cedf57fc4f ("ARM: dts: berlin: add the BG2Q GPIO nodes")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ca8c71742 upstream.
Just like all previous UAS capable Seagate disk enclosures, these need the
US_FL_NO_ATA_1X to not crash when udev probes them.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6fa3945c8 upstream.
Like the JMicron JMS567 enclosures with the JMS566 choke on report-opcodes,
so avoid it.
Tested-and-reported-by: Takeo Nakayama <javhera@gmx.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b13a65ef19 upstream.
H_RST bit in H_CSR register may be found lit before reset is started,
for example if preceding reset flow hasn't completed.
In that case asserting H_RST will be ignored, therefore we need to clean
H_RST bit to start a successful reset sequence.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fc0703af3 upstream.
Currently, our trunking code will check for session trunking, but will
fail to detect client id trunking. This is a problem, because it means
that the client will fail to recognise that the two connections represent
shared state, even if they do not permit a shared session.
By removing the check for the server minor id, and only checking the
major id, we will end up doing the right thing in both cases: we close
down the new nfs_client and fall back to using the existing one.
Fixes: 05f4c350ee ("NFS: Discover NFSv4 server trunking when mounting")
Cc: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7485058eea upstream.
Using just the filter for checking for trampolines or regs is not enough
when updating the code against the records that represent all functions.
Both the filter hash and the notrace hash need to be checked.
To trigger this bug (using trace-cmd and perf):
# perf probe -a do_fork
# trace-cmd start -B foo -e probe
# trace-cmd record -p function_graph -n do_fork sleep 1
The trace-cmd record at the end clears the filter before it disables
function_graph tracing and then that causes the accounting of the
ftrace function records to become incorrect and causes ftrace to bug.
Link: http://lkml.kernel.org/r/20150114154329.358378039@goodmis.org
[ still need to switch old_hash_ops to old_ops_hash ]
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f86f83709 upstream.
As the set_ftrace_filter affects both the function tracer as well as the
function graph tracer, the ops that represent each have a shared
ftrace_ops_hash structure. This allows both to be updated when the filter
files are updated.
But if function graph is enabled and the global_ops (function tracing) ops
is not, then it is possible that the filter could be changed without the
update happening for the function graph ops. This will cause the changes
to not take place and may even cause a ftrace_bug to occur as it could mess
with the trampoline accounting.
The solution is to check if the ops uses the shared global_ops filter and
if the ops itself is not enabled, to check if there's another ops that is
enabled and also shares the global_ops filter. In that case, the
modification still needs to be executed.
Link: http://lkml.kernel.org/r/20150114154329.055980438@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 237d28db03 upstream.
If the function graph tracer traces a jprobe callback, the system will
crash. This can easily be demonstrated by compiling the jprobe
sample module that is in the kernel tree, loading it and running the
function graph tracer.
# modprobe jprobe_example.ko
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# ls
The first two commands end up in a nice crash after the first fork.
(do_fork has a jprobe attached to it, so "ls" just triggers that fork)
The problem is caused by the jprobe_return() that all jprobe callbacks
must end with. The way jprobes works is that the function a jprobe
is attached to has a breakpoint placed at the start of it (or it uses
ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
will copy the stack frame and change the ip address to return to the
jprobe handler instead of the function. The jprobe handler must end
with jprobe_return() which swaps the stack and does an int3 (breakpoint).
This breakpoint handler will then put back the saved stack frame,
simulate the instruction at the beginning of the function it added
a breakpoint to, and then continue on.
For function tracing to work, it hijakes the return address from the
stack frame, and replaces it with a hook function that will trace
the end of the call. This hook function will restore the return
address of the function call.
If the function tracer traces the jprobe handler, the hook function
for that handler will not be called, and its saved return address
will be used for the next function. This will result in a kernel crash.
To solve this, pause function tracing before the jprobe handler is called
and unpause it before it returns back to the function it probed.
Some other updates:
Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
code look a bit cleaner and easier to understand (various tries to fix
this bug required this change).
Note, if fentry is being used, jprobes will change the ip address before
the function graph tracer runs and it will not be able to trace the
function that the jprobe is probing.
Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c86ac2c50 upstream.
This patch fixes a NULL pointer dereference on led_dat->mode_val. Due to
this bug, a kernel oops can be observed at probe time on the LaCie 2Big
and 5Big v2 boards:
Unable to handle kernel NULL pointer dereference at virtual address 00000008
[...]
[<c03f244c>] (netxbig_led_probe) from [<c02c8c6c>] (platform_drv_probe+0x4c/0x9c)
[<c02c8c6c>] (platform_drv_probe) from [<c02c72d0>] (driver_probe_device+0x98/0x25c)
[<c02c72d0>] (driver_probe_device) from [<c02c7520>] (__driver_attach+0x8c/0x90)
[<c02c7520>] (__driver_attach) from [<c02c5c24>] (bus_for_each_dev+0x68/0x94)
[<c02c5c24>] (bus_for_each_dev) from [<c02c6408>] (bus_add_driver+0x124/0x1dc)
[<c02c6408>] (bus_add_driver) from [<c02c7ac0>] (driver_register+0x78/0xf8)
[<c02c7ac0>] (driver_register) from [<c000888c>] (do_one_initcall+0x80/0x1cc)
[<c000888c>] (do_one_initcall) from [<c0733618>] (kernel_init_freeable+0xe4/0x1b4)
[<c0733618>] (kernel_init_freeable) from [<c058db9c>] (kernel_init+0xc/0xec)
[<c058db9c>] (kernel_init) from [<c0009850>] (ret_from_fork+0x14/0x24)
[...]
This bug was introduced by commit 588a6a9928
("leds: netxbig: fix attribute-creation race").
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Acked-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Bryan Wu <cooloney@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25906052d9 upstream.
Since ALE table is a common resource for both the interfaces in Dual EMAC
mode and while bringing up the second interface in cpsw_ndo_set_rx_mode()
all the multicast entries added by the first interface is flushed out and
only second interface multicast addresses are added. Fixing this by
flushing multicast addresses based on dual EMAC port vlans which will not
affect the other emac port multicast addresses.
Fixes: d9ba8f9 (driver: net: ethernet: cpsw: dual emac interface implementation)
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 776d4e9f5c upstream.
Adds FCoE config option I40E_FCOE, so that FCoE can be enabled
as needed but otherwise have it disabled by default.
This also eliminate multiple FCoE config checks, instead now just
one config check for CONFIG_I40E_FCOE.
The I40E FCoE was added with 3.17 kernel and therefore this patch
shall be applied to stable 3.17 kernel also.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c93edc6393 upstream.
commit 5c90422439
"iwlwifi: mvm: don't allow diversity if BT Coex / TT forbid it"
broke Rx with 2 chains for diversity.
This had an impact on throughput where we're using only a single
stream (11a/b/g APs, single stream APs, static SMPS).
Fixes: 5c90422439 ("iwlwifi: mvm: don't allow diversity if BT Coex / TT forbid it")
Signed-off-by: Eyal Shapira <eyalx.shapira@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0145058c3d upstream.
This patch partially reverts commit 421520ba98
(only the arm64 part). There is no guarantee that the boot-loader places other
images like dtb in a different page than initrd start/end, especially when the
kernel is built with 64KB pages. When this happens, such pages must not be
freed. The free_reserved_area() already takes care of rounding up "start" and
rounding down "end" to avoid freeing partially used pages.
Reported-by: Peter Maydell <Peter.Maydell@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3cbc6123a9 upstream.
Host controllers lacking the required internal vmmc regulator may still
follow the spec with regard to the LSB of SDHCI_POWER_CONTROL. Set the
SDHCI_POWER_ON bit when vmmc is enabled to encourage the controller to
to drive CMD, DAT, SDCLK.
This fixes a regression observed on some Qualcomm and Nvidia boards
caused by 5222161 mmc: sdhci: Improve external VDD regulator support.
Fixes: 52221610dd (mmc: sdhci: Improve external VDD regulator support)
Signed-off-by: Tim Kryger <tim.kryger@gmail.com>
Tested-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bfe5fda8e7 upstream.
Patch c49f63530b ("powernv: Add OPAL tracepoints") has a spurious
store to the stack:
ld r12,opal_tracepoint_refcount@toc(r2); \
std r12,32(r1); \
The store was originally used to save the current tracepoint status
so the entry and the exit tracepoints were always balanced. In the
end I just created a separate path when tracepoints are enabled.
The offset on the stack used for this store is not valid for ABIv2
and it causes strange issues. I noticed it because OPAL console input
was broken.
Fixes: c49f63530b ("powernv: Add OPAL tracepoints")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c2e211f3c upstream.
Current vfio-pci just supports normal pci device, so vfio_pci_probe() will
return if the pci device is not a normal device. While current code makes a
mistake. PCI_HEADER_TYPE is the offset in configuration space of the device
type, but we use this value to mask the type value.
This patch fixs this by do the check directly on the pci_dev->hdr_type.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb9ff07886 upstream.
An error was returned if composing was not supported, instead of if
cropping was not supported.
A classic copy-and-paste bug. Found with v4l2-compliance.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac03086067 upstream.
The end timer is used for switching back from repeat code timings when
no repeat codes have been received for a certain amount of time. When
the protocol is changed, the end timer is deleted synchronously with
del_timer_sync(), however this takes place while holding the main spin
lock, and the timer handler also needs to acquire the spin lock.
This opens the possibility of a deadlock on an SMP system if the
protocol is changed just as the repeat timer is expiring. One CPU could
end up in img_ir_set_decoder() holding the lock and waiting for the end
timer to complete, while the other CPU is stuck in the timer handler
spinning on the lock held by the first CPU.
Lockdep also spots a possible lock inversion in the same code, since
img_ir_set_decoder() acquires the img-ir lock before the timer lock, but
the timer handler will try and acquire them the other way around:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.18.0-rc5+ #957 Not tainted
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
(((&hw->end_timer))){+.-...}, at: [<4006ae5c>] _call_timer_fn+0x0/0xfc
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&(&priv->lock)->rlock#2){-.....}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(((&hw->end_timer)));
local_irq_disable();
lock(&(&priv->lock)->rlock#2);
lock(((&hw->end_timer)));
<Interrupt>
lock(&(&priv->lock)->rlock#2);
*** DEADLOCK ***
This is fixed by releasing the main spin lock while performing the
del_timer_sync() call. The timer is prevented from restarting before the
lock is reacquired by a new "stopping" flag which img_ir_handle_data()
checks before updating the timer.
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
(((&hw->end_timer))){+.-...}, at: [<4006ae5c>] _call_timer_fn+0x0/0xfc
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&(&priv->lock)->rlock#2){-.....}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(((&hw->end_timer)));
local_irq_disable();
lock(&(&priv->lock)->rlock#2);
lock(((&hw->end_timer)));
<Interrupt>
lock(&(&priv->lock)->rlock#2);
*** DEADLOCK ***
This is fixed by releasing the main spin lock while performing the
del_timer_sync() call. The timer is prevented from restarting before the
lock is reacquired by a new "stopping" flag which img_ir_handle_data()
checks before updating the timer.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Sifan Naeem <sifan.naeem@imgtec.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea0de4ec54 upstream.
A problem was found on Polaris where if the unit it booted via the power
button on the infrared remote then the next button press on the remote
would return the key code used to power on the unit.
The sequence is:
- The polaris powered off but with the powerdown controller (PDC) block
still powered.
- Press power key on remote, IR block receives the key.
- Kernel starts, IR code is in IMG_IR_DATA_x but neither IMG_IR_RXDVAL
or IMG_IR_RXDVALD2 are set.
- Wait any amount of time.
- Press any key.
- IMG_IR_RXDVAL or IMG_IR_RXDVALD2 is set but IMG_IR_DATA_x is
unchanged since the powerup key data was never read.
This is worked around by always reading the IMG_IR_DATA_x in
img_ir_set_decoder(), rather than only when the IMG_IR_RXDVAL or
IMG_IR_RXDVALD2 bit is set.
Signed-off-by: Dylan Rajaratnam <dylan.rajaratnam@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 678fa12fb8 upstream.
The au0828 quirks table is currently not in sync with the au0828
media driver.
Syncronize it and put them on the same order as found at au0828
driver, as all the au0828 devices with analog TV need the
same quirks.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d1f00a20d upstream.
Add a macro to simplify au0828 quirk table. That makes easier
to check it against the USB IDs at drivers/media/usb/au0828/au0828-cards.c.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2279948735 upstream.
This patches fixes an ancient bug in the dvb_usb_af9005 driver, which
has been reported at least in the following threads:
https://lkml.org/lkml/2009/2/4/350https://lkml.org/lkml/2014/9/18/558
If the driver is compiled in without any IR support (neither
DVB_USB_AF9005_REMOTE nor custom symbols), the symbol_request calls in
af9005_usb_module_init() return pointers != NULL although the IR
symbols are not available.
This leads to the following oops:
...
[ 8.529751] usbcore: registered new interface driver dvb_usb_af9005
[ 8.531584] BUG: unable to handle kernel paging request at 02e00000
[ 8.533385] IP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d
[ 8.535613] *pde = 00000000
[ 8.536416] Oops: 0000 [#1] PREEMPT PREEMPT DEBUG_PAGEALLOCDEBUG_PAGEALLOC
[ 8.537863] CPU: 0 PID: 1 Comm: swapper Not tainted 3.15.0-rc6-00151-ga5c075c #1
[ 8.539827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 8.541519] task: 89c9a670 ti: 89c9c000 task.ti: 89c9c000
[ 8.541519] EIP: 0060:[<7d9d67c6>] EFLAGS: 00010206 CPU: 0
[ 8.541519] EIP is at af9005_usb_module_init+0x6b/0x9d
[ 8.541519] EAX: 02e00000 EBX: 00000000 ECX: 00000006 EDX: 00000000
[ 8.541519] ESI: 00000000 EDI: 7da33ec8 EBP: 89c9df30 ESP: 89c9df2c
[ 8.541519] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 8.541519] CR0: 8005003b CR2: 02e00000 CR3: 05a54000 CR4: 00000690
[ 8.541519] Stack:
[ 8.541519] 7d9d675b 89c9df90 7d992a49 7d7d5914 89c9df4c 7be3a800 7d08c58c 8a4c3968
[ 8.541519] 89c9df80 7be3a966 00000192 00000006 00000006 7d7d3ff4 8a4c397a 00000200
[ 8.541519] 7d6b1280 8a4c3979 00000006 000009a6 7da32db8 b13eec81 00000006 000009a6
[ 8.541519] Call Trace:
[ 8.541519] [<7d9d675b>] ? ttusb2_driver_init+0x16/0x16
[ 8.541519] [<7d992a49>] do_one_initcall+0x77/0x106
[ 8.541519] [<7be3a800>] ? parameqn+0x2/0x35
[ 8.541519] [<7be3a966>] ? parse_args+0x113/0x25c
[ 8.541519] [<7d992bc2>] kernel_init_freeable+0xea/0x167
[ 8.541519] [<7cf01070>] kernel_init+0x8/0xb8
[ 8.541519] [<7cf27ec0>] ret_from_kernel_thread+0x20/0x30
[ 8.541519] [<7cf01068>] ? rest_init+0x10c/0x10c
[ 8.541519] Code: 08 c2 c7 05 44 ed f9 7d 00 00 e0 02 c7 05 40 ed f9 7d 00 00 e0 02 c7 05 3c ed f9 7d 00 00 e0 02 75 1f b8 00 00 e0 02 85 c0 74 16 <a1> 00 00 e0 02 c7 05 54 84 8e 7d 00 00 e0 02 a3 58 84 8e 7d eb
[ 8.541519] EIP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d SS:ESP 0068:89c9df2c
[ 8.541519] CR2: 0000000002e00000
[ 8.541519] ---[ end trace 768b6faf51370fc7 ]---
The prefered fix would be to convert the whole IR code to use the kernel IR
infrastructure (which wasn't available at the time this driver had been created).
Until anyone who still has this old hardware steps up an does the conversion,
fix it by not calling the symbol_request calls if the driver is compiled in
without the default IR symbols (CONFIG_DVB_USB_AF9005_REMOTE).
Due to the IR related pointers beeing NULL by default, IR support will then be disabled.
The downside of this solution is, that it will no longer be possible to
compile custom IR symbols (not using CONFIG_DVB_USB_AF9005_REMOTE) in.
Please note that this patch has NOT been tested with all possible cases.
I don't have the hardware and could only verify that it fixes the reported
bug.
Reported-by: Fengguag Wu <fengguang.wu@intel.com>
Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Luca Olivetti <luca@ventoso.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30ea9c5218 upstream.
fb_deferred_io_fsync() returns the value of schedule_delayed_work() as
an error code, but schedule_delayed_work() does not return an error. It
returns true/false depending on whether the work was already queued.
Fix this by ignoring the return value of schedule_delayed_work().
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92b004d1aa upstream.
If the probe of an fb driver has been deferred due to missing
dependencies, and the probe is later ran when a module is loaded, the
fbdev framework will try to find a logo to use.
However, the logos are __initdata, and have already been freed. This
causes sometimes page faults, if the logo memory is not mapped,
sometimes other random crashes as the logo data is invalid, and
sometimes nothing, if the fbdev decides to reject the logo (e.g. the
random value depicting the logo's height is too big).
This patch adds a late_initcall function to mark the logos as freed. In
reality the logos are freed later, and fbdev probe may be ran between
this late_initcall and the freeing of the logos. In that case we will
miss drawing the logo, even if it would be possible.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ce67a38f7 upstream.
The CPSW IP implements pulse-signaled interrupts. Due to
that we must write a correct, pre-defined value to the
CPDMA_MACEOIVECTOR register so the controller generates
a pulse on the correct IRQ line to signal the End Of
Interrupt.
The way the driver is written today, all four IRQ lines
are requested using the same IRQ handler and, because of
that, we could fall into situations where a TX IRQ fires
but we tell the controller that we ended an RX IRQ (or
vice-versa). This situation triggers an IRQ storm on the
reserved IRQ 127 of INTC which will in turn call ack_bad_irq()
which will, then, print a ton of:
unexpected IRQ trap at vector 00
In order to fix the problem, we are moving all calls to
cpdma_ctlr_eoi() inside the IRQ handler and making sure
we *always* write the correct value to the CPDMA_MACEOIVECTOR
register. Note that the algorithm assumes that IRQ numbers and
value-to-be-written-to-EOI are proportional, meaning that a
write of value 0 would trigger an EOI pulse for the RX_THRESHOLD
Interrupt and that's the IRQ number sitting in the 0-th index
of our irqs_table array.
This, however, is safe at least for current implementations of
CPSW so we will refrain from making the check smarter (and, as
a side-effect, slower) until we actually have a platform where
IRQ lines are swapped.
This patch has been tested for several days with AM335x- and
AM437x-based platforms. AM57x was left out because there are
still pending patches to enable ethernet in mainline for that
platform. A read of the TRM confirms the statement on previous
paragraph.
Reported-by: Yegor Yefremov <yegorslists@googlemail.com>
Fixes: 510a1e7 (drivers: net: davinci_cpdma: acknowledge interrupt properly)
Signed-off-by: Felipe Balbi <balbi@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e86fb5e8ab upstream.
When ring buffer returns an error indicating retry, storvsc may not
return a proper error code to SCSI when bounce buffer is not used.
This has introduced I/O freeze on RAID running atop storvsc devices.
This patch fixes it by always returning a proper error code.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68ed7e1c3d upstream.
This is a partial revert of 2f2dafe (serial: serial_core.c: printk
replacement) which gets us booting again. The real problem seems to be
the _emit path in early boot. However, until we can root cause it, we
need at least to get boot working.
Fixes: 2f2dafe77d
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45db07382a upstream.
The __ldcw macro has a problem when its argument needs to be reloaded from
memory. The output memory operand and the input register operand both need to
be reloaded using a register in class R1_REGS when generating 64-bit code.
This fails because there's only a single register in the class. Instead, use a
memory clobber. This also makes the __ldcw macro a compiler memory barrier.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5164bece16 upstream.
In bio-based DM's clone_endio(), when target_type doesn't implement
.end_io (e.g. linear) r will be always be initialized 0. So if a
WRITE SAME bio fails WRITE SAME will not be disabled as intended.
Fix this by initializing r to error, rather than 0, in clone_endio().
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 7eee4ae2db ("dm: disable WRITE SAME if it fails")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7426977c8e upstream.
The comma after --no-includes makes coccinelle to not run the script:
/usr/bin/spatch -D report --very-quiet --no-show-diff --cocci-file ./scripts/coccinelle/misc/bugon.cocci --no-includes, --include-headers --patch . --dir drivers/media/platform/coda/ -I ./arch/x86/include -I arch/x86/include/generated -I include -I ./arch/x86/include/uapi -I arch/x86/include/generated/uapi -I ./include/uapi -I include/generated/uapi -I ./include/linux/kconfig.h
Usage: spatch.opt --sp-file <SP> <infile> [-o <outfile>] [--iso-file <iso>] [options]
Options are:
--sp-file <file> the semantic patch file
-o <file> the output file
--in-place do the modification on the file directly
--backup-suffix suffix to use when making a backup for inplace
...
At least with Fedora 20 coccinelle package:
coccinelle-1.0.0-0.rc20.1.fc21.x86_64
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Tested-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: 5be1df66 (Coccinelle: Script to replace if and BUG with BUG_ON)
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96ed6046d3 upstream.
On BG2Q, the sdhci2 host uses nfcecc for "io" clk and nfc for "core" clk.
The shdci2 can't work without this patch due to the "core" clk is gated.
Fixes: 0d859a6a9d ("ARM: dts: berlin: add the SDHCI nodes for the BG2Q")
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dca1a4b5ff upstream.
All slow clk users are not properly claiming it (get + prepare + enable)
before using it.
If all users properly claiming this clock release it, the clock is
disabled, but faulty users still depends on it, and the system hangs.
This fix prevents the slow clock from being disabled, and should solve the
hanging issue, but offending drivers should be patched to properly claim
this clock.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b71e8ecd57 upstream.
The "smemc" clock is removed on BG2Q SoCs. In fact, bit19 of clkenable
register is for nfc. Current code use bit19 for non-exist "smemc"
incorrectly, this prevents eMMC from working due to the sdhci's
"core" clk is still gated.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12551f0239 upstream.
The bit locations indicating the locking status of the plls on rk3066 are
shifted by one to the right when compared to the rk3188, bits [7:4] instead
of [8:5] on the rk3188, thus indicating the locking state of the wrong pll
or a completely different information in case of the gpll.
The recently introduced pll init code exposed that problem on some rk3066
boards when it tried to bring the boot-pll value in line with the value
from the rate table.
Fix this by defining separate pll definitions for rk3066 with the correct
locking indices.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Fixes: 2c14736c75 ("clk: rockchip: add clock driver for rk3188 and rk3066 clocks")
Tested-by: FUKAUMI Naoki <naobsd@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9880d4277f upstream.
Commit 0e5bdb3f9f (clk: rockchip: switch to using the new cpuclk type
for armclk) didn't take into account that the divider used on rk3288
are of the (n+1) type.
The rk3066 and rk3188 socs use more complex divider types making it
necessary for the list-elements to be the real register-values to write.
Therefore reduce divider values in the table accordingly so that they
really are the values that should be written to the registers and match
the dividers actually specified for the rk3288.
Reported-by: Sonny Rao <sonnyrao@chromium.org>
Fixes: 0e5bdb3f9f ("clk: rockchip: switch to using the new cpuclk type for armclk")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 176a107b86 upstream.
This reverts commit da788acb28.
That commit tried to fix the section mismatch warning by moving the
ppc_corenet_clk_driver struct to init section. This is definitely wrong
because the kernel would free the memories occupied by this struct
after boot while this driver is still registered in the driver core.
The kernel would panic when accessing this driver struct.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89f7e9de59 upstream.
Commit 6314b6796e (clk: Don't hold prepare_lock across debugfs
creation, 2014-09-04) forgot to update one place where we hold
the prepare_lock while creating debugfs directories. This means
we still have the chance of a deadlock that the commit was trying
to fix. Actually fix it by moving the debugfs creation outside
the prepare_lock.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fixes: 6314b6796e "clk: Don't hold prepare_lock across debugfs creation"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[mturquette@linaro.org: removed lockdep_assert]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 511833acfc upstream.
Commit ac61d19559 (scsi: set correct completion code in
scsi_send_eh_cmnd()) introduced a bug. It changed the stored return
value from a queuecommand call, but it didn't take into account that
the return value was used again later on. This patch fixes the bug by
changing the later usage.
There is a big comment in the middle of scsi_send_eh_cmnd() which
does a good job of explaining how the routine works. But it mentions
a "rtn = FAILURE" value that doesn't exist in the code. This patch
adjusts the code to match the comment (I assume the comment is right
and the code is wrong).
This fixes Bugzilla #88341.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Андрей Аладьев <aladjev.andrew@gmail.com>
Tested-by: Андрей Аладьев <aladjev.andrew@gmail.com>
Fixes: ac61d19559
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 120bb3e1e3 upstream.
This fixes random memory corruption triggered when all three of the
following are true:
* scsi-mq enabled
* T10 Protection Information (DIF) enabled
* SCSI host with sg_tablesize > SCSI_MAX_SG_SEGMENTS (128)
The symptoms of this bug are unpredictable memory corruption, BUG()s,
oopses, lockups, etc., any of which may appear to be completely
unrelated to the root cause.
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2311ce4d9c upstream.
This reverts commit 963ba22b90
("mpt3sas: Remove phys on topology change")
Reverting the previous mpt3sas drives patch changes,
since we will observe below issue
Issue:
Drives connected Enclosure/Expander will unregister with
SCSI Transport Layer, if any one remove and add expander
cable with in DMD (Device Missing Delay) time period or
even any one power-off and power-on the Enclosure with in
the DMD period.
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81a89c2d89 upstream.
This reverts commit 3520f9c779
("mpt2sas: Remove phys on topology change")
Reverting the previous mpt2sas drives patch changes,
since we will observe below issue
Issue:
Drives connected Enclosure/Expander will unregister with
SCSI Transport Layer, if any one remove and add expander
cable with in DMD (Device Missing Delay) time period or
even any one power-off and power-on the Enclosure with in
the DMD period.
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01f5a6261c upstream.
The VGA_2X_MODE bit apparently affects the display even when the VGA
plane is disabled. The bit will set by the BIOS when the panel width
is at least 1280 pixels. So by preserving the bit from the BIOS we
end up with corrupted display on machines with such high res panels.
I only have 1024x768 panels on my gen2 machines so never ran into
this problem.
The original reason for preserving the VGACNTR register was to make
my 830 survive S3 with acpi_sleep=s3_bios option. However after
further 830 fixes that option is no longer needed to make S3 work
and preserving VGACNTR doesn't seem to be necessary without it,
so we can just revert the entire patch.
This reverts
commit 69769f9a42
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Fri Aug 15 01:22:08 2014 +0300
drm/i915: Preserve VGACNTR bits from the BIOS
Cc: Bruno Prémont <bonbons@linux-vserver.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87171
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23a548ee65 upstream.
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and HCA PI offload capabilities.
If the HCA does not support PI offload or tpg attribute t10_pi is
not set, we fall to SW PI mode.
In order to do that, we move iscsit_get_sup_prot_ops after connection
tpg assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 302cc7c3ca upstream.
Fallback to software mode DIF if HCA does not support
PI (without crashing obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check
device PI capability on a per-IO basis (isert_prot_cmd
inline static) to determine if we need to handle protection
information.
Trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert]
Call Trace:
[<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130
[<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert]
[<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0
[<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert]
[<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod]
[<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod]
[<ffffffff8106ece2>] process_one_work+0x182/0x3b0
[<ffffffff8106fda0>] worker_thread+0x120/0x3c0
[<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107594e>] kthread+0xce/0xf0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 570db170f3 upstream.
This patch converts to allocate PI contexts dynamically in order
avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.
Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.
We can do it in a lazy fashion so the initial burst will have a
transient slow down, but very fast all IOs will allocate a PI
context.
Squashed:
iser-target: Centralize PI context handling code
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b02efbfc9a upstream.
In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.
So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.
The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).
We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca6c1d82d1 upstream.
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.
Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.
Squashed:
iser-target: Move cma_id setup to a function
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19e2090fb2 upstream.
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.
In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.
Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().
Squashed:
iser-target: Decrement completion context active_qps in error flow
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2371e5da8c upstream.
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.
This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.
In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.
Squashed:
iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 128e9cc845 upstream.
ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.
Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.
- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.
Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).
Squashed:
iser-target: Don't deffer disconnected handler to a work
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 954f23722b upstream.
Since commit 0fc4ea701f ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.
Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).
Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.
Squashed:
iscsi-target: Add call to wait_conn in establishment error flow
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bf6ca7515 upstream.
This patch changes iscsit_do_tx_data() to fail on short writes
when kernel_sendmsg() returns a value different than requested
transfer length, returning -EPIPE and thus causing a connection
reset to occur.
This avoids a potential bug in the original code where a short
write would result in kernel_sendmsg() being called again with
the original iovec base + length.
In practice this has not been an issue because iscsit_do_tx_data()
is only used for transferring 48 byte headers + 4 byte digests,
along with seldom used control payloads from NOPIN + TEXT_RSP +
REJECT with less than 32k of data.
So following Al's audit of iovec consumers, go ahead and fail
the connection on short writes for now, and remove the bogus
logic ahead of his proper upstream fix.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c291ee6221 upstream.
Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.
CPU0 CPU1
show_interrupts()
desc = irq_to_desc(X);
free_desc(desc)
remove_from_radix_tree();
kfree(desc);
raw_spinlock_irq(&desc->lock);
/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.
The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.
For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.
Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.
Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.
Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.
Fixes: 1f5a5b87f7 "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ec9d4d231 upstream.
Fix a regression was introduced in commit 6df5a128f0 ("IB/iser:
Suppress scsi command send completions").
The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.
Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.
Fixes: 6df5a128f0 ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 506787a2c7 upstream.
tcm_loop has the I_T nexus associated with the HBA. This causes
commands to become misdirected if the HBA has more than one
target portal group; any command is then being sent to the
first target portal group instead of the correct one.
The nexus needs to be associated with the target portal group
instead.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3875f15207 upstream.
scripts/headers_install.sh will transform __packed to
__attribute__((packed)), so the #ifndef is not necessary.
(and, in fact, it's problematic, because we'll end up with the header
containing:
#ifndef __attribute__((packed))
#define __attribu...
and so forth.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5fd9733a3 upstream.
commit 4dbd27711c "tick: export nohz tick idle symbols for module
use" was merged via the thermal tree without an explicit ack from the
relevant maintainers.
The exports are abused by the intel powerclamp driver which implements
a fake idle state from a sched FIFO task. This causes all kinds of
wreckage in the NOHZ core code which rightfully assumes that
tick_nohz_idle_enter/exit() are only called from the idle task itself.
Recent changes in the NOHZ core lead to a failure of the powerclamp
driver and now people try to hack completely broken and backwards
workarounds into the NOHZ core code. This is completely unacceptable
and just papers over the real problem. There are way more subtle
issues lurking around the corner.
The real solution is to fix the powerclamp driver by rewriting it with
a sane concept, but that's beyond the scope of this.
So the only solution for now is to remove the calls into the core NOHZ
code from the powerclamp trainwreck along with the exports.
Fixes: d6d71ee4a1 "PM: Introduce Intel PowerClamp Driver"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
Cc: LKP <lkp@01.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9538cf4f9 upstream.
These drivers use 9100-byte receive buffers, thus allocating an skb requires
an O(3) memory allocation. Under heavy memory loads and fragmentation, such
a request can fail. Previous versions of the driver have dropped the packet
and reused the old buffer; however, the new version introduced a bug in that
it released the old buffer before trying to allocate a new one. The previous
method is implemented here. The skb is unmapped before any attempt is made to
allocate another.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08f6f14777 upstream.
The VHT supported channel width field is a two bit integer, not a
bitfield. cfg80211_chandef_usable() was interpreting it incorrectly and
ended up rejecting 160 MHz channel width if the driver indicated support
for both 160 and 80+80 MHz channels.
Fixes: 3d9d1d6656 ("nl80211/cfg80211: support VHT channel configuration")
(however, no real drivers had 160 MHz support it until 3.16)
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34f05f543f upstream.
In the already-set and intersect case of a driver-hint, the previous
wiphy regdomain was not freed before being reset with a copy of the
cfg80211 regdomain.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f5c4d631a upstream.
Streams do not work reliabe on Fresco Logic FL1000G xhci controllers,
trying to use them results in errors like this:
21:37:33 kernel: xhci_hcd 0000:04:00.0: ERROR Transfer event for disabled endpoint or incorrect stream ring
21:37:33 kernel: xhci_hcd 0000:04:00.0: @00000000368b3570 9067b000 00000000 05000000 01078001
21:37:33 kernel: xhci_hcd 0000:04:00.0: ERROR Transfer event for disabled endpoint or incorrect stream ring
21:37:33 kernel: xhci_hcd 0000:04:00.0: @00000000368b3580 9067b400 00000000 05000000 01038001
As always I've ordered a pci-e addon card with a Fresco Logic controller for
myself to see if I can come up with a better fix then the big hammer, in
the mean time this will make uas devices work again (in usb-storage mode)
for FL1000G users.
Reported-by: Marcin Zajączkowski <mszpak@wp.pl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f161ead70f upstream.
Solves xhci error cases with debug messages:
xhci_hcd 0000:00:14.0: Setup ERROR: setup context command for slot 1.
usb 1-6: hub failed to enable device, error -22
xhci will give a context state error if we try to set a slot in default
state to the same default state with a special address device command.
Turns out this happends in several cases:
- retry reading the device rescriptor in hub_port_init()
- usb_reset_device() is called for a slot in default state
- in resume path, usb_port_resume() calls hub_port_init()
The default state is usually reached from most states with a reset device
command without any context state errors, but using the address device
command with BSA bit set (block set address) only works from the enabled
state and will otherwise cause context error.
solve this by checking if we are already in the default state before issuing
a address device BSA=1 command.
Fixes: 48fc7dbd52 ("usb: xhci: change enumeration scheme to 'new scheme'")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b123429e6a upstream.
If we need to force detach a context (e.g. due to EEH or simply force
unbinding the driver) we should prevent the userspace contexts from
being able to access the Problem State Area MMIO region further, which
they may have mapped with mmap().
This patch unmaps any mapped MMIO regions when detaching a userspace
context.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a98e6e9f4e upstream.
In the event that something goes wrong in the hardware and it is unable
to complete a process element comment we would end up polling forever,
effectively making the associated process unkillable.
This patch adds a timeout to the process element command code path, so
that we will give up if the hardware does not respond in a reasonable
time.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee41d11d53 upstream.
We had a known sleep while atomic bug if a CXL device was forcefully
unbound while it was in use. This could occur as a result of EEH, or
manually induced with something like this while the device was in use:
echo 0000:01:00.0 > /sys/bus/pci/drivers/cxl-pci/unbind
The issue was that in this code path we iterated over each context and
forcefully detached it with the contexts_lock spin lock held, however
the detach also needed to take the spu_mutex, and call schedule.
This patch changes the contexts_lock to a mutex so that we are not in
atomic context while doing the detach, thereby avoiding the sleep while
atomic.
Also delete the related TODO comment, which suggested an alternate
solution which turned out to not be workable.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d47559ee8 upstream.
The flip stall detector kicks in when pending>=INTEL_FLIP_COMPLETE. That
means if we first call intel_prepare_page_flip() but don't call
intel_finish_page_flip(), the next stall check will erroneosly think
the page flip was somehow stuck.
With enough debug spew emitted from the interrupt handler my 830 hangs
when this happens. My theory is that the previous vblank interrupt gets
sufficiently delayed that the handler will see the pending bit set in
IIR, but ISR still has the bit set as well (ie. the flip was processed
by CS but didn't complete yet). In this case the handler will proceed
to call intel_check_page_flip() immediately after
intel_prepare_page_flip(). It then tries to print a backtrace for the
stuck flip WARN, which apparetly results in way too much debug spew
delaying interrupt processing further. That then seems to cause an
endless loop in the interrupt handler, and the machine is dead until
the watchdog kicks in and reboots. At least limiting the number of
iterations of the loop in the interrupt handler also prevented the
hang.
So it seems better to not call intel_prepare_page_flip() without
immediately calling intel_finish_page_flip(). The IIR/ISR trickery
avoids races here so this is a perfectly safe thing to do.
v2: Fix typo in commit message (checkpatch)
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88381
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85888
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c55018347 upstream.
There exists a current workaround to prevent a hang on context switch
should the ring go to sleep in the middle of the restore,
WaProgramMiArbOnOffAroundMiSetContext (applicable to all gen7+). In
spite of disabling arbitration (which prevents the ring from powering
down during the critical section) we were still hitting hangs that had
the hallmarks of the known erratum. That is we are still seeing hangs
"on the last instruction in the context restore". By comparing -nightly
(broken) with requests (working), we were able to deduce that it was the
semaphore LRI cross-talk that reproduced the original failure. The key
was that requests implemented deferred semaphore signalling, and
disabling that, i.e. emitting the semaphore signal to every other ring
after every batch restored the frequent hang. Explicitly disabling PSMI
sleep on the RCS ring was insufficient, all the rings had to be awake to
prevent the hangs. Fortunately, we can reduce the wakelock to the
MI_SET_CONTEXT operation itself, and so should be able to limit the extra
power implications.
Since the MI_ARB_ON_OFF workaround is listed for all gen7 and above
products, we should apply this extra hammer for all of the same
platforms despite so far that we have only been able to reproduce the
hang on certain ivb and hsw models. The last question is whether we want
to always use the extra hammer or only when we know semaphores are in
operation. At the moment, we only use LRI on non-RCS rings for
semaphores, but that may change in the future with the possibility of
reintroducing this bug under subtle conditions.
v2: Make it explicit that the PSMI LRI are an extension to the original
workaround for the other rings.
v3: Bikeshedding variable names and whitespacing
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80660
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83677
Cc: Simon Farnsworth <simon@farnz.org.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Peter Frühberger <fritsch@xbmc.org>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7d6f7d708 upstream.
Otherwise the MST resume paths can hit DPMS paths
which hit state checker paths, which hit WARN_ON,
because the state checker is inconsistent with the
hw.
This fixes a bunch of WARN_ON's on resume after
undocking.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b38705981 upstream.
In all likelihood we will do a few hundred errnoneous register
operations if we do a single invalid register access whilst the device
is suspended. As each instance causes a WARN, this floods the system
logs and can make the system unresponsive.
The warning was first introduced in
commit b2ec142cb0
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Feb 21 13:52:25 2014 -0300
drm/i915: call assert_device_not_suspended at gen6_force_wake_work
and despite the claims the WARN is still encountered in the wild today.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d472fcc837 upstream.
The problem here is that SNA pins batchbuffers to etch out a bit more
performance. Iirc it started out as a w/a for i830M (which we've
implemented in the kernel since a long time already). The problem is
that the pin ioctl wasn't added in
commit d23db88c3a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri May 23 08:48:08 2014 +0200
drm/i915: Prevent negative relocation deltas from wrapping
Fix this by simply disallowing pinning from userspace so that the
kernel is in full control of batch placement again. Especially since
distros are moving towards running X as non-root, so most users won't
even be able to see any benefits.
UMS support is dead now, but we need this minimal patch for
backporting. Follow-up patch will remove the pin ioctl code
completely.
Note to backporters: You must have both
commit b45305fce5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Dec 17 16:21:27 2012 +0100
drm/i915: Implement workaround for broken CS tlb on i830/845
which laned in 3.8 and
commit c4d69da167
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 8 14:25:41 2014 +0100
drm/i915: Evict CS TLBs between batches
which is also marked cc: stable. Otherwise this could introduce a
regression by disabling the userspace w/a without the kernel w/a being
fully functional on i830/45.
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554#c116
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 410cce2a6b upstream.
The check was already in place in the dp mode_valid check, but
radeon_dp_get_dp_link_clock() never returned the high clock
mode_valid was checking for because that function clipped the
clock based on the hw capabilities. Add an explicit check
in the mode_valid function.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=87172
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02ae7af53a upstream.
Enabling bapm seems to cause clocking problems on some
KV configurations. Disable it by default for now.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbedf1c3fc upstream.
Enable all three in the driver. Early documentation
indicated the 3rd one was used for something else, but
that is not the case.
v2: handle disable as well
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bb62c95a7 upstream.
Always need to set bit 0 of RLC_CGTT_MGCG_OVERRIDE
to avoid unreliable doorbell updates in some cases.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0391359ddf upstream.
When we unplug a dp mst branch we unreference the entire tree from
the root towards the leaves. Which is ok, since that's the way the
pointers and so also the refcounts go.
But when we drop the reference we must make sure that we remove the
branches/ports from the lists/pointers before dropping the reference.
Otherwise the get_validated functions will still return it instead
of returning NULL (which indicates a potentially on-going unplug).
The mst branch destroy gets this right for ports: First it deletes
the port from the ports list, then it unrefs. But the ports destroy
function gets it wrong: First it unrefs, then it drops the ref. Which
means a zombie mst branch can still be validate with get_validated_mstb_ref
when it shouldn't.
Fix this.
This should address a backtrace Dave dug out somewhere on unplug:
[<ffffffffa00cc262>] drm_dp_mst_get_validated_mstb_ref_locked+0x92/0xa0 [drm_kms_helper]
[<ffffffffa00cc211>] drm_dp_mst_get_validated_mstb_ref_locked+0x41/0xa0 [drm_kms_helper]
[<ffffffffa00cc2aa>] drm_dp_get_validated_mstb_ref+0x3a/0x60 [drm_kms_helper]
[<ffffffffa00cc2fb>] drm_dp_payload_send_msg.isra.14+0x2b/0x100 [drm_kms_helper]
[<ffffffffa00cc547>] drm_dp_update_payload_part1+0x177/0x360 [drm_kms_helper]
[<ffffffffa015c52e>] intel_mst_disable_dp+0x3e/0x80 [i915]
[<ffffffffa013d60b>] haswell_crtc_disable+0x1cb/0x340 [i915]
[<ffffffffa0136739>] intel_crtc_control+0x49/0x100 [i915]
[<ffffffffa0136857>] intel_crtc_update_dpms+0x67/0x80 [i915]
[<ffffffffa013fa59>] intel_connector_dpms+0x59/0x70 [i915]
[<ffffffffa015c752>] intel_dp_destroy_mst_connector+0x32/0xc0 [i915]
[<ffffffffa00cb44b>] drm_dp_destroy_port+0x6b/0xa0 [drm_kms_helper]
[<ffffffffa00cb588>] drm_dp_destroy_mst_branch_device+0x108/0x130 [drm_kms_helper]
[<ffffffffa00cb3cd>] drm_dp_port_teardown_pdt+0x3d/0x50 [drm_kms_helper]
[<ffffffffa00cdb79>] drm_dp_mst_handle_up_req+0x499/0x540 [drm_kms_helper]
[<ffffffff810d9ead>] ? trace_hardirqs_on_caller+0x15d/0x200 [<ffffffffa00cdc73>]
drm_dp_mst_hpd_irq+0x53/0xa00 [drm_kms_helper] [<ffffffffa00c7dfb>]
? drm_dp_dpcd_read+0x1b/0x20 [drm_kms_helper] [<ffffffffa0153ed8>]
? intel_dp_dpcd_read_wake+0x38/0x70 [i915] [<ffffffffa015a225>]
intel_dp_check_mst_status+0xb5/0x250 [i915] [<ffffffffa015ac71>]
intel_dp_hpd_pulse+0x181/0x210 [i915] [<ffffffffa01104f6>]
i915_digport_work_func+0x96/0x120 [i915]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19a93f042f upstream.
At least on two MST devices I've tested with, when
they are link training downstream, they are totally
unable to handle aux ch msgs, so they defer like nuts.
I tried 16, it wasn't enough, 32 seems better.
This fixes one Dell 4k monitor and one of the
MST hubs.
v1.1: fixup comment (Tom).
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2809c7db8 upstream.
On MST systems the monitors don't appear when we set the fb up,
but plymouth opens the drm device and holds it open while they
come up, when plymouth finishes and lastclose gets called we
don't do the delayed fb probe, so the monitor never appears on the
console.
Fix this by moving the delayed checking into the mode restore.
v2: Daniel suggested that ->delayed_hotplug is set under
the mode_config mutex, so we should check it under that as
well, while we are in the area.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 881fdaa5e4 upstream.
Andrew Morton wrote:
> On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> > Andrew Morton wrote:
> > > Poor ttm guys - this is a bit of a trap we set for them.
> >
> > Commit a91576d791 ("drm/ttm: Pass GFP flags in order to avoid deadlock.")
> > changed to use sc->gfp_mask rather than GFP_KERNEL.
> >
> > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> > - GFP_KERNEL);
> > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> >
> > But this bug is caused by sc->gfp_mask containing some flags which are not
> > in GFP_KERNEL, right? Then, I think
> >
> > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
> >
> > would hide this bug.
> >
> > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)
>
> Well no - ttm_page_pool_free() should stop calling kmalloc altogether.
> Just do
>
> struct page *pages_to_free[16];
>
> and rework the code to free 16 pages at a time. Easy.
Well, ttm code wants to process 512 pages at a time for performance.
Memory footprint increased by 512 * sizeof(struct page *) buffer is
only 4096 bytes. What about using static buffer like below?
----------
>From d3cb5393c9c8099d6b37e769f78c31af1541fe8c Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Thu, 13 Nov 2014 22:21:54 +0900
Subject: drm/ttm: Avoid memory allocation from shrinker functions.
Commit a91576d791 ("drm/ttm: Pass GFP flags in order to avoid
deadlock.") caused BUG_ON() due to sc->gfp_mask containing flags
which are not in GFP_KERNEL.
https://bugzilla.kernel.org/show_bug.cgi?id=87891
Changing from sc->gfp_mask to (sc->gfp_mask & GFP_KERNEL) would
avoid the BUG_ON(), but avoiding memory allocation from shrinker
function is better and reliable fix.
Shrinker function is already serialized by global lock, and
clean up function is called after shrinker function is unregistered.
Thus, we can use static buffer when called from shrinker function
and clean up function.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89669e7a7f upstream.
The commit "vmwgfx: Rework fence event action" introduced a number of bugs
that are fixed with this commit:
a) A forgotten return stateemnt.
b) An if statement with identical branches.
Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e338c4c2b6 upstream.
The function vmw_master_check() might return -ERESTARTSYS if there is a
signal pending, indicating that the IOCTL should be rerun, potentially from
user-space. At that point we shouldn't print out an error message since that
is not an error condition. In short, avoid bloating the kernel log when a
process refuses to die on SIGTERM.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f563a6a46 upstream.
Kernel side fence objects are used when unbinding resources and may thus be
created as part of a memory reclaim operation. This might trigger recursive
memory reclaims and result in the kernel running out of stack space.
So a simple way out is to avoid accounting of these fence objects.
In principle this is OK since while user-space can trigger the creation of
such objects, it can't really hold on to them. However, their lifetime is
quite long, so some form of accounting should perhaps be implemented in the
future.
Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3
with low system memory settings.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b0d11b4278 ]
This patch is fixing a race condition that may cause setting
count_pending to -1, which results in unwanted big bulk of arp messages
(in case of "notify peers").
Consider following scenario:
count_pending == 2
CPU0 CPU1
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers
atomic_add (adding 1 to count_pending)
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 1)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to 0)
schedule_delayed_work
team_notify_peers_work
atomic_dec_and_test (dec count_pending to -1)
Fix this race by using atomic_dec_if_positive - that will prevent
count_pending running under 0.
Fixes: fc423ff00d ("team: add peer notification")
Fixes: 492b200efd ("team: add support for sending multicast rejoins")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7a05dc64e2 ]
Commit d75b1ade56 ("net: less interrupt masking in NAPI") uncovered
wrong alx_poll() behavior.
A NAPI poll() handler is supposed to return exactly the budget when/if
napi_complete() has not been called.
It is also supposed to return number of frames that were received, so
that netdev_budget can have a meaning.
Also, in case of TX pressure, we still have to dequeue received
packets : alx_clean_rx_irq() has to be called even if
alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade56 ("net: less interrupt masking in NAPI")
Reported-by: Oded Gabbay <oded.gabbay@amd.com>
Bisected-by: Oded Gabbay <oded.gabbay@amd.com>
Tested-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 07ff890dae ]
Since e9ce7cb6b1 ("xen-netback: Factor queue-specific data into queue struct"),
the transimt shaper timeout is always set to 0. The value the user sets via
xenbus is never propagated to the transmit shaper.
This patch fixes the issue.
Cc: Anthony Liguori <aliguori@amazon.com>
Signed-off-by: Imre Palik <imrep@amazon.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 843925f33f ]
Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
In fact the problem was completely unrelated to IPsec. The bug is
also reproducible if you just disable TSO/GSO.
The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.
This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue. Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.
Fixes: 1485348d24 ("tcp: Apply device TSO segment limit earlier")
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a51e0df4c1 ]
Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the
old MTT's page shift, which could result in using an incorrect offset.
Fix the initialization to be after we calculate the new MTT offset.
In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This
is necessary in order to mark the MTT as invalid and avoid freeing it later.
Fixes: e630664 ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5f35227ea3 ]
GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.
This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.
By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.
CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c26d34bbc ]
When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:
[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
These are reliably reproduced in a network topology of:
container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet. Note that the warning is
associated with the container eth0.
The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.
The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb. __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.
This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b8fb4e0648 ]
skb_scrub_packet() is called when a packet switches between a context
such as between underlay and overlay, between namespaces, or between
L3 subnets.
While we already scrub the packet mark, connection tracking entry,
and cached destination, the security mark/context is left intact.
It seems wrong to inherit the security context of a packet when going
from overlay to underlay or across forwarding paths.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 796f2da81b ]
When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.
Fixes: 58e998c6d2 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2dc49d1680 ]
When xfrm6_policy_check() is used, _decode_session6() is called after some
intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
prepared after the call of xfrm6_policy_check().
Before this patch, scenarii with IPv6 + TCP + IPsec Transport are broken.
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Reported-by: Huaibin Wang <huaibin.wang@6wind.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 492f5add4b ]
iowrite32() will byteswap it's argument on big endian archs.
iowrite32be() will byteswap on little endian archs.
Since we don't want to do this unnecessary byteswap on the fast path,
doorbell is stored in the NIC's native endianness. Using the right
iowrite() according to the arch endianness.
CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: David Laight <david.laight@aculab.com>
Fixes: 6a4e812 ("net/mlx4_en: Avoid calling bswap in tx fast path")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0d16449195 ]
Gateway having bandwidth_down equal to zero are not accepted
at all and so never added to the Gateway list.
For this reason checking the bandwidth_down member in
batadv_gw_out_of_range() is useless.
This is probably a copy/paste error and this check was supposed
to be "!gw_node" only. Moreover, the way the check is written
now may also lead to a NULL dereference.
Fix this by rewriting the if-condition properly.
Introduced by 414254e342
("batman-adv: tvlv - gateway download/upload bandwidth container")
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0402e444cd ]
The fragmentation code was replaced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge") by an implementation which
can handle up to 16 fragments of a packet. The packet is prepared for the split
in fragments by the function batadv_frag_send_packet and the actual split is
done by batadv_frag_create.
Both functions calculate the size of a fragment themself. But their calculation
differs because batadv_frag_send_packet also subtracts ETH_HLEN. Therefore,
the check in batadv_frag_send_packet "can a full fragment can be created?" may
return true even when batadv_frag_create cannot create a full fragment.
The function batadv_frag_create doesn't check the size of the skb before
splitting it and therefore might try to create a larger fragment than the
remaining buffer. This creates an integer underflow and an invalid len is given
to skb_split.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5b6698b0e4 ]
The fragmentation code was replaced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge"). The new code provided a
mostly unused parameter skb for the merging function. It is used inside the
function to calculate the additionally needed skb tailroom. But instead of
increasing its own tailroom, it is only increasing the tailroom of the first
queued skb. This is not correct in some situations because the first queued
entry can be a different one than the parameter.
An observed problem was:
1. packet with size 104, total_size 1464, fragno 1 was received
- packet is queued
2. packet with size 1400, total_size 1464, fragno 0 was received
- packet is queued at the end of the list
3. enough data was received and can be given to the merge function
(1464 == (1400 - 20) + (104 - 20))
- merge functions gets 1400 byte large packet as skb argument
4. merge function gets first entry in queue (104 byte)
- stored as skb_out
5. merge function calculates the required extra tail as total_size - skb->len
- pskb_expand_head tail of skb_out with 64 bytes
6. merge function tries to squeeze the extra 1380 bytes from the second queued
skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out
Instead calculate the extra required tail bytes for skb_out also using skb_out
instead of using the parameter skb. The skb parameter is only used to get the
total_size from the last received packet. This is also the total_size used to
decide that all fragments were received.
Reported-by: Philipp Psurek <philipp.psurek@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 05b0aa5793 ]
During driver load in tg3_init_one, if the driver detects DMA activity before
intializing the chip tg3_halt is called. As part of tg3_halt interrupts are
disabled using routine tg3_disable_ints. This routine was using mailbox value
which was not initialized (default value is 0). As a result driver was writing
0x00000001 to pci config space register 0, which is the vendor id / device id.
This driver bug was exposed because of the commit a7877b17a667 (PCI: Check only
the Vendor ID to identify Configuration Request Retry). Also this issue is only
seen in older generation chipsets like 5722 because config space write to offset
0 from driver is possible. The newer generation chips ignore writes to offset 0.
Also without commit a7877b17a667, for these older chips when a GRC reset is
issued the Bootcode would reprogram the vendor id/device id, which is the reason
this bug was masked earlier.
Fixed by initializing the interrupt mailbox registers before calling tg3_halt.
Please queue for -stable.
Reported-by: Nils Holland <nholland@tisys.org>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d08acd2d3 ]
Resolve conflicts between glibc definition of IPV6 socket options
and those defined in Linux headers. Looks like earlier efforts to
solve this did not cover all the definitions.
It resolves warnings during iproute2 build.
Please consider for stable as well.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit af6dabc9c7 ]
Commit cecda693a9 ("net: keep original skb
which only needs header checking during software GSO") keeps the original
skb for packets that only needs header check, but it doesn't drop the
packet if software segmentation or header check were failed.
Fixes cecda693a9 ("net: keep original skb which only needs header checking during software GSO")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstram commit 26c0e10258 ]
Commit bc96f648df (xen-netback: make
feature-rx-notify mandatory) incorrectly assumed that there were no
frontends in use that did not support this feature. But the frontend
driver in MiniOS does not and since this is used by (qemu) stubdoms,
these stopped working.
Netback sort of works as-is in this mode except:
- If there are no Rx requests and the internal Rx queue fills, only
the drain timeout will wake the thread. The default drain timeout
of 10 s would give unacceptable pauses.
- If an Rx stall was detected and the internal Rx queue is drained,
then the Rx thread would never wake.
Handle these two cases (when feature-rx-notify is disabled) by:
- Reducing the drain timeout to 30 ms.
- Disabling Rx stall detection.
Reported-by: John <jw@nuclearfallout.net>
Tested-by: John <jw@nuclearfallout.net>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 12069401d8 ]
Currently, searching for a socket to add a reference to is not
synchronized with deletion of sockets. This can result in use
after free if there is another operation that is removing a
socket at the same time. Solving this requires both holding the
appropriate lock and checking the refcount to ensure that it
has not already hit zero.
Inspired by a related (but not exactly the same) issue in the
VXLAN driver.
Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7ed767f731 ]
Sockets aren't currently removed from the the global list when
they are destroyed. In addition, offload handlers need to be cleaned
up as well.
Fixes: 0b5e8b8e ("net: Add Geneve tunneling protocol driver")
CC: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a18e6a186f ]
Each mmap Netlink frame contains a status field which indicates
whether the frame is unused, reserved, contains data or needs to
be skipped. Both loads and stores may not be reordeded and must
complete before the status field is changed and another CPU might
pick up the frame for use. Use an smp_mb() to cover needs of both
types of callers to netlink_set_status(), callers which have been
reading data frame from the frame, and callers which have been
filling or releasing and thus writing to the frame.
- Example code path requiring a smp_rmb():
memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
- Example code path requiring a smp_wmb():
hdr->nm_uid = from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
hdr->nm_gid = from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
netlink_frame_flush_dcache(hdr);
netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4682a03586 ]
Checking the file f_count and the nlk->mapped count is not completely
sufficient to prevent the mmap'd area contents from changing from
under us during netlink mmap sendmsg() operations.
Be careful to sample the header's length field only once, because this
could change from under us as well.
Fixes: 5fd96123ee ("netlink: implement memory mapped sendmsg()")
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c3f2511fea ]
This commit contains 2 fixes for the 128B CQE/EQE stride feaure.
Wei found that mlx4_QUERY_HCA function marked the wrong capability
in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled.
Also added small fix in initial CQE ownership bit assignment, when CQE
is size is not default 32B.
Fixes: 77507aa24 (net/mlx4: Enable CQE/EQE stride support)
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a0033a947 ]
The NBMA GRE tunnels temporarily push GRE header that contain the
per-packet NBMA destination on the skb via header ops early in xmit
path. It is the later pulled before the real GRE header is constructed.
The inner mac was thus set differently in nbma case: the GRE header
has been pushed by neighbor layer, and mac header points to beginning
of the temporary gre header (set by dev_queue_xmit).
Now that the offloads expect mac header to point to the gre payload,
fix the xmit patch to:
- pull first the temporary gre header away
- and reset mac header to point to gre payload
This fixes tso to work again with nbma tunnels.
Fixes: 14051f0452 ("gre: Use inner mac length when computing tunnel length")
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 690eac53da upstream.
Commit fee7e49d45 ("mm: propagate error from stack expansion even for
guard page") made sure that we return the error properly for stack
growth conditions. It also theorized that counting the guard page
towards the stack limit might break something, but also said "Let's see
if anybody notices".
Somebody did notice. Apparently android-x86 sets the stack limit very
close to the limit indeed, and including the guard page in the rlimit
check causes the android 'zygote' process problems.
So this adds the (fairly trivial) code to make the stack rlimit check be
against the actual real stack size, rather than the size of the vma that
includes the guard page.
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Jay Foad <jay.foad@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fee7e49d45 upstream.
Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.
This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.
And since /proc/maps explicitly ignores the guard page (commit
d7824370e2: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.
This is the minimal patch: it just propagates the error. It also
effectively makes the guard page part of the stack limit, which in turn
measn that the actual real stack is one page less than the stack limit.
Let's see if anybody notices. We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.
Reported-and-tested-by: Jay Foad <jay.foad@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e5e366172 upstream.
Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
stuck in a busy loop with nothing left to balance, but
kswapd_try_to_sleep() failing to sleep. Their analysis found the cause
to be a combination of several factors:
1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
2. The process has been killed (by OOM in this case), but has not yet been
scheduled to remove itself from the waitqueue and die.
3. kswapd checks for throttled processes in prepare_kswapd_sleep():
if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
wake_up(&pgdat->pfmemalloc_wait);
return false; // kswapd will not go to sleep
}
However, for a process that was already killed, wake_up() does not remove
the process from the waitqueue, since try_to_wake_up() checks its state
first and returns false when the process is no longer waiting.
4. kswapd is running on the same CPU as the only CPU that the process is
allowed to run on (through cpus_allowed, or possibly single-cpu system).
5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
encounters no voluntary preemption points and repeatedly fails
prepare_kswapd_sleep(), blocking the process from running and removing
itself from the waitqueue, which would let kswapd sleep.
So, the source of the problem is that we prevent kswapd from going to
sleep until there are processes waiting on the pfmemalloc_wait queue,
and a process waiting on a queue is guaranteed to be removed from the
queue only when it gets scheduled. This was done to make sure that no
process is left sleeping on pfmemalloc_wait when kswapd itself goes to
sleep.
However, it isn't necessary to postpone kswapd sleep until the
pfmemalloc_wait queue actually empties. To prevent processes from being
left sleeping, it's actually enough to guarantee that all processes
waiting on pfmemalloc_wait queue have been woken up by the time we put
kswapd to sleep.
This patch therefore fixes this issue by substituting 'wake_up' with
'wake_up_all' and removing 'return false' in the code snippet from
prepare_kswapd_sleep() above. Note that if any process puts itself in
the queue after this waitqueue_active() check, or after the wake up
itself, it means that the process will also wake up kswapd - and since
we are under prepare_to_wait(), the wake up won't be missed. Also we
update the comment prepare_kswapd_sleep() to hopefully more clearly
describe the races it is preventing.
Fixes: 5515061d22 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d6d7f9828 upstream.
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers() __delete_from_page_cache()
if (TestSetPageDirty(page))
page->mapping = NULL
if (PageDirty())
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
if (page->mapping)
account_page_dirtied(page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes. Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed. Remove it.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3245d6acab upstream.
wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and
both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
change in between, gcc needs to reload p->exit_state after
security_task_wait(). In this case ->notask_error will be wrongly
cleared and do_wait() can hang forever if it was the last eligible
child.
Many thanks to Arne who carefully investigated the problem.
Note: this bug is very old but it was pure theoretical until commit
b3ab03160d ("wait: completely ignore the EXIT_DEAD tasks"). Before
this commit "-O2" was probably enough to guarantee that compiler won't
read ->exit_state twice.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Arne Goedeke <el@laramies.com>
Tested-by: Arne Goedeke <el@laramies.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2836766a9d upstream.
Sleep in atomic context happened on Trats2 board after inserting or
removing SD card because mmc_gpio_get_cd() was called under spin lock.
Fix this by moving card detection earlier, before acquiring spin lock.
The mmc_gpio_get_cd() call does not have to be protected by spin lock
because it does not access any sdhci internal data.
The sdhci_do_get_cd() call access host flags (SDHCI_DEVICE_DEAD). After
moving it out side of spin lock it could theoretically race with driver
removal but still there is no actual protection against manual card
eject.
Dmesg after inserting SD card:
[ 41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511
[ 41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1
[ 41.677580] INFO: lockdep is turned off.
[ 41.681486] irq event stamp: 61972
[ 41.684872] hardirqs last enabled at (61971): [<c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c
[ 41.693118] hardirqs last disabled at (61972): [<c04907ac>] _raw_spin_lock_irq+0x18/0x54
[ 41.701190] softirqs last enabled at (61648): [<c0026fd4>] __do_softirq+0x234/0x2c8
[ 41.708914] softirqs last disabled at (61631): [<c00273a0>] irq_exit+0xd0/0x114
[ 41.716206] Preemption disabled at:[< (null)>] (null)
[ 41.721500]
[ 41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G W 3.18.0-rc5-next-20141121 #883
[ 41.732111] Workqueue: kmmcd mmc_rescan
[ 41.735945] [<c0014d2c>] (unwind_backtrace) from [<c0011c80>] (show_stack+0x10/0x14)
[ 41.743661] [<c0011c80>] (show_stack) from [<c0489d14>] (dump_stack+0x70/0xbc)
[ 41.750867] [<c0489d14>] (dump_stack) from [<c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30)
[ 41.759628] [<c0228b74>] (gpiod_get_raw_value_cansleep) from [<c03646e8>] (mmc_gpio_get_cd+0x38/0x58)
[ 41.768821] [<c03646e8>] (mmc_gpio_get_cd) from [<c036d378>] (sdhci_request+0x50/0x1a4)
[ 41.776808] [<c036d378>] (sdhci_request) from [<c0357934>] (mmc_start_request+0x138/0x268)
[ 41.785051] [<c0357934>] (mmc_start_request) from [<c0357cc8>] (mmc_wait_for_req+0x58/0x1a0)
[ 41.793469] [<c0357cc8>] (mmc_wait_for_req) from [<c0357e68>] (mmc_wait_for_cmd+0x58/0x78)
[ 41.801714] [<c0357e68>] (mmc_wait_for_cmd) from [<c0361c00>] (mmc_io_rw_direct_host+0x98/0x124)
[ 41.810480] [<c0361c00>] (mmc_io_rw_direct_host) from [<c03620f8>] (sdio_reset+0x2c/0x64)
[ 41.818641] [<c03620f8>] (sdio_reset) from [<c035a3d8>] (mmc_rescan+0x254/0x2e4)
[ 41.826028] [<c035a3d8>] (mmc_rescan) from [<c003a0e0>] (process_one_work+0x180/0x3f4)
[ 41.833920] [<c003a0e0>] (process_one_work) from [<c003a3bc>] (worker_thread+0x34/0x4b0)
[ 41.841991] [<c003a3bc>] (worker_thread) from [<c003fed8>] (kthread+0xe4/0x104)
[ 41.849285] [<c003fed8>] (kthread) from [<c000f268>] (ret_from_fork+0x14/0x2c)
[ 42.038276] mmc0: new high speed SDHC card at address 1234
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 94144a465d ("mmc: sdhci: add get_cd() implementation")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1222d8fe57 upstream.
Invalid buck4 configuration for linear mapping of voltage in S2MPS14
regulators caused boot failure on Gear 2 (dw_mmc-exynos):
[ 3.569137] EXT4-fs (mmcblk0p15): mounted filesystem with ordered data mode. Opts: (null)
[ 3.571716] VFS: Mounted root (ext4 filesystem) readonly on device 179:15.
[ 3.629842] mmcblk0: error -110 sending status command, retrying
[ 3.630244] mmcblk0: error -110 sending status command, retrying
[ 3.636292] mmcblk0: error -110 sending status command, aborting
Buck4 voltage regulator has different minimal voltage value than other
bucks. Commit merging multiple regulator description macros caused to
use linear_min_sel from buck[1235] regulators as value for buck4. This
lead to lower voltage of buck4 than required.
Output of the buck4 is used internally as power source for
LDO{3,4,7,11,19,20,21,23}. On Gear 2 board LDO11 is used as MMC
regulator (V_EMMC_1.8V).
Fixes: 5a867cf288 ("regulator: s2mps11: Optimize the regulator description macro")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2036eaa740 upstream.
nouveau userspace back at 1.0.1 used to call the X server
DRIOpenDRMMaster interface even for DRI2 (doh!), this attempts
to map the sarea and fails if it can't.
Since 884c6dabb0 from Daniel,
this fails, but only ancient drivers would see it.
Revert the nouveau bits of that fix.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff4c0d5213 upstream.
On !SMP systems spinlocks do not exist. Thus checking of they
are active will always fail.
Use
assert_spin_locked(lock);
instead of
BUG_ON(!spin_is_locked(lock));
to not BUG() on all UP systems.
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 015760563e upstream.
SH-MSIOF driver is enabled autosuspend API of spi framework.
But autosuspend framework doesn't work during initializing.
So runtime PM lock is added in SH-MSIOF driver initializing.
Fixes: e2a0ba547b (spi: sh-msiof: Convert to spi core auto_runtime_pm framework)
Signed-off-by: Hisashi Nakamura <hisashi.nakamura.ak@renesas.com>
Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fc81d8742 upstream.
We allow PMU driver to change the cpu on which the event
should be installed to. This happened in patch:
e2d37cd213 ("perf: Allow the PMU driver to choose the CPU on which to install events")
This patch also forces all the group members to follow
the currently opened events cpu if the group happened
to be moved.
This and the change of event->cpu in perf_install_in_context()
function introduced in:
0cda4c0231 ("perf: Introduce perf_pmu_migrate_context()")
forces group members to change their event->cpu,
if the currently-opened-event's PMU changed the cpu
and there is a group move.
Above behaviour causes problem for breakpoint events,
which uses event->cpu to touch cpu specific data for
breakpoints accounting. By changing event->cpu, some
breakpoints slots were wrongly accounted for given
cpu.
Vinces's perf fuzzer hit this issue and caused following
WARN on my setup:
WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
Can't find any breakpoint slot
[...]
This patch changes the group moving code to keep the event's
original cpu.
Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af91568e76 upstream.
The uncore_collect_events functions assumes that event group
might contain only uncore events which is wrong, because it
might contain any type of events.
This bug leads to uncore framework touching 'not' uncore events,
which could end up all sorts of bugs.
One was triggered by Vince's perf fuzzer, when the uncore code
touched breakpoint event private event space as if it was uncore
event and caused BUG:
BUG: unable to handle kernel paging request at ffffffff82822068
IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
...
The code in uncore_assign_events() function was looking for
event->hw.idx data while the event was initialized as a
breakpoint with different members in event->hw union.
This patch forces uncore_collect_events() to collect only uncore
events.
Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f8960541b upstream.
Commit 1d52c78afb (Btrfs: try not to ENOSPC on log replay) added a
check to skip delayed inode updates during log replay because it
confuses the enospc code. But the delayed processing will end up
ignoring delayed refs from log replay because the inode itself wasn't
put through the delayed code.
This can end up triggering a warning at commit time:
WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
Which is repeated for each commit because we never process the delayed
inode ref update.
The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay. The caller will do the ref
deletion immediately and everything will work properly.
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b1e95b2fa upstream.
The "by8" counter mode optimization is broken for 128 bit keys with
input data longer than 128 bytes. It uses the wrong key material for
en- and decryption.
The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
in case we're handling more than 128 bytes of input data -- they won't
get reloaded after the initial load. They must therefore be (a) loaded
on the first iteration and (b) be preserved for the latter ones. The
implementation for 128 bit keys does not comply with (a) nor (b).
Fix this by bringing the implementation back to its original source
and correctly load the key registers and preserve their values by
*not* re-using the registers for other purposes.
Kudos to James for reporting the issue and providing a test case
showing the discrepancies.
Reported-by: James Yonan <james@openvpn.net>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b8c960cf6 upstream.
This patch fixes this allyesconfig target build error with older
binutils.
LD arch/x86/crypto/built-in.o
ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory
Signed-off-by: Vinson Lee <vlee@twitter.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e63ea48b4 upstream.
The early ioremap support introduced by patch bf4b558eba
("arm64: add early_ioremap support") failed to add a call to
early_ioremap_reset() at an appropriate time. Without this call,
invocations of early_ioremap etc. that are done too late will go
unnoticed and may cause corruption.
This is exactly what happened when the first user of this feature
was added in patch f84d02755f ("arm64: add EFI runtime services").
The early mapping of the EFI memory map is unmapped during an early
initcall, at which time the early ioremap support is long gone.
Fix by adding the missing call to early_ioremap_reset() to
setup_arch(), and move the offending early_memunmap() to right after
the point where the early mapping of the EFI memory map is last used.
Fixes: f84d02755f ("arm64: add EFI runtime services")
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f43c27188a upstream.
On arm64 the TTBR0_EL1 register is set to either the reserved TTBR0
page tables on boot or to the active_mm mappings belonging to user space
processes, it must never be set to swapper_pg_dir page tables mappings.
When a CPU is booted its active_mm is set to init_mm even though its
TTBR0_EL1 points at the reserved TTBR0 page mappings. This implies
that when __cpu_suspend is triggered the active_mm can point at
init_mm even if the current TTBR0_EL1 register contains the reserved
TTBR0_EL1 mappings.
Therefore, the mm save and restore executed in __cpu_suspend might
turn out to be erroneous in that, if the current->active_mm corresponds
to init_mm, on resume from low power it ends up restoring in the
TTBR0_EL1 the init_mm mappings that are global and can cause speculation
of TLB entries which end up being propagated to user space.
This patch fixes the issue by checking the active_mm pointer before
restoring the TTBR0 mappings. If the current active_mm == &init_mm,
the code sets the TTBR0_EL1 to the reserved TTBR0 mapping instead of
switching back to the active_mm, which is the expected behaviour
corresponding to the TTBR0_EL1 settings when __cpu_suspend was entered.
Fixes: 95322526ef ("arm64: kernel: cpu_{suspend/resume} implementation")
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d27eb7931c upstream.
Protocol v7 uses the middle / right button bits on clickpads to communicate
"location" information of a 3th touch (and possible 4th) touch on
clickpads.
Specifically when 3 touches are down, if one of the 3 touches is in the
left / right button area, this will get reported in the middle / right
button bits and the touchpad will still send a TWO type packet rather then
a MULTI type packet, so when this happens we must add the finger reported
in the button area to the finger count.
Likewise we must also add fingers reported this way to the finger count
when we get MULTI packets.
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=86338
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7091c443dd upstream.
The v7 proto differentiates between a primary touch (with high precision)
and a secondary touch (with lower precision). Normally when 2 fingers are
down and one is lifted the still present touch becomes the primary touch,
but some traces have shown that this does not happen always.
This commit deals with this by making alps_get_mt_count() not stop at the
first empty mt slot, and if a touch is present in mt[1] and not mt[0]
moving the data to mt[0] (for input_mt_assign_slots).
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=86338
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b23811535 upstream.
NEW packets are send to indicate a discontinuity in the finger coordinate
reporting. Specifically a finger may have moved from slot 0 to 1 or vice
versa. INPUT_MT_TRACK takes care of this for us.
NEW packets have 3 problems:
1) They do not contain middle / right button info (on non clickpads)
this can be worked around by preserving the old button state
2) They do not contain an accurate fingercount, and they are
typically send when the number of fingers changes. We cannot use
the old finger count as that may mismatch with the amount of
touch coordinates we've available in the NEW packet
3) Their x data for the second touch is inaccurate leading to
a possible jump of the x coordinate by 16 units when the first
non NEW packet comes in
Since problems 2 & 3 cannot be worked around, just ignore them.
BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=86338
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b1f3e1699 upstream.
If an ACPI device object whose _STA returns 0 (not present and not
functional) has _PR0 or _PS0, its power_manageable flag will be set
and acpi_bus_init_power() will return 0 for it. Consequently, if
such a device object is passed to the ACPI device PM functions, they
will attempt to carry out the requested operation on the device,
although they should not do that for devices that are not present.
To fix that problem make acpi_bus_init_power() return an error code
for devices that are not present which will cause power_manageable to
be cleared for them as appropriate in acpi_bus_get_power_flags().
However, the lists of power resources should not be freed for the
device in that case, so modify acpi_bus_get_power_flags() to keep
those lists even if acpi_bus_init_power() returns an error.
Accordingly, when deciding whether or not the lists of power
resources need to be freed, acpi_free_power_resources_lists()
should check the power.flags.power_resources flag instead of
flags.power_manageable, so make that change too.
Furthermore, if acpi_bus_attach() sees that flags.initialized is
unset for the given device, it should reset the power management
settings of the device and re-initialize them from scratch instead
of relying on the previous settings (the device may have appeared
after being not present previously, for example), so make it use
the 'valid' flag of the D0 power state as the initial value of
flags.power_manageable for it and call acpi_bus_init_power() to
discover its current power state.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49a068f82a upstream.
A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.
This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary. You're more
likely to hit this case on large directories.
Other xdr_truncate_encode callers are probably also affected.
Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Fixes: 3e19ce762b "rpc: xdr_truncate_encode"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9008d83fe9 upstream.
Commit 705814b5ea ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to
re-use it for OMAP5")
Moved logic generic for OMAP5+ as part of the init routine by
introducing omap4_pm_init. However, the patch left the powerdomain
initial setup, an unused omap4430 es1.0 check and a spurious log
"Power Management for TI OMAP4." in the original code.
Remove the duplicate code which is already present in omap4_pm_init from
omap4_init_static_deps.
As part of this change, also move the u-boot version print out of the
static dependency function to the omap4_pm_init function.
Fixes: 705814b5ea ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e794de514 upstream.
The PWM block is required for system clock source so it must be always
enabled. This patch fixes boot issues on SMDK6410 which did not have
the node enabled explicitly for other purposes.
Fixes: eeb93d02 ("clocksource: of: Respect device tree node status")
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be6688350a upstream.
OMAP wdt driver supports only ti,omap3-wdt compatible. In DRA7 dt
wdt compatible property is defined as ti,omap4-wdt by mistake instead of
ti,omap3-wdt. Correcting the typo.
Fixes: 6e58b8f1da ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board")
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d312cd12e upstream.
CONFIG_GENERIC_CPUFREQ_CPU0 disappeared with commit bbcf071969
("cpufreq: cpu0: rename driver and internals to 'cpufreq_dt'") and some
defconfigs are still using it instead of the new one.
Use the renamed CONFIG_CPUFREQ_DT generic driver.
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d73f825e6e upstream.
The lcd0 node for am437x-sk-evm.dts contains bad LCD timings, and while
they seem to work with a quick test, doing for example blank/unblank
will give you a black display.
This patch updates the timings to the 'typical' values from the LCD spec
sheet.
Also, the compatible string is completely bogus, as
"osddisplays,osd057T0559-34ts" is _not_ a 480x272 panel. The panel on
the board is a newhaven one. Update the compatible string to reflect
this. Note that this hasn't caused any issues, as the "panel-dpi"
matches the driver.
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58230c2c44 upstream.
Caused by a copy & paste error. Note that even with
this bug AM437x SK display still works because GPIO
mux mode is always enabled. It's still wrong to mux
somebody else's pin.
Luckily ball D25 (offset 0x238 - gpio5_8) on AM437x
isn't used for anything.
While at that, also replace a pullup with a pulldown
as that gpio should be normally low, not high.
Acked-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 269ad8015a upstream.
The dl_runtime_exceeded() function is supposed to ckeck if
a SCHED_DEADLINE task must be throttled, by checking if its
current runtime is <= 0. However, it also checks if the
scheduling deadline has been missed (the current time is
larger than the current scheduling deadline), further
decreasing the runtime if this happens.
This "double accounting" is wrong:
- In case of partitioned scheduling (or single CPU), this
happens if task_tick_dl() has been called later than expected
(due to small HZ values). In this case, the current runtime is
also negative, and replenish_dl_entity() can take care of the
deadline miss by recharging the current runtime to a value smaller
than dl_runtime
- In case of global scheduling on multiple CPUs, scheduling
deadlines can be missed even if the task did not consume more
runtime than expected, hence penalizing the task is wrong
This patch fix this problem by throttling a SCHED_DEADLINE task
only when its runtime becomes negative, and not modifying the runtime
Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a503c3be9 upstream.
According to global EDF, tasks should be migrated between runqueues
without checking if their scheduling deadlines and runtimes are valid.
However, SCHED_DEADLINE currently performs such a check:
a migration happens doing:
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
activate_task(later_rq, next_task, 0);
which ends up calling dequeue_task_dl(), setting the new CPU, and then
calling enqueue_task_dl().
enqueue_task_dl() then calls enqueue_dl_entity(), which calls
update_dl_entity(), which can modify scheduling deadline and runtime,
breaking global EDF scheduling.
As a result, some of the properties of global EDF are not respected:
for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
two cores can have unbounded response times for the third task even
if 30/80+40/80+120/170 = 1.5809 < 2
This can be fixed by invoking update_dl_entity() only in case of
wakeup, or if this is a new SCHED_DEADLINE task.
Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b990789a4 upstream.
The change from \d+ to .+ inside __aligned() means that the following
structure:
struct test {
u8 a __aligned(2);
u8 b __aligned(2);
};
essentially gets modified to
struct test {
u8 a;
};
for purposes of kernel-doc, thus dropping a struct member, which in
turns causes warnings and invalid kernel-doc generation.
Fix this by replacing the catch-all (".") with anything that's not a
semicolon ("[^;]").
Fixes: 9dc30918b2 ("scripts/kernel-doc: handle struct member __aligned without numbers")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 705304a863 upstream.
Same story as in commit 41080b5a24 ("nfsd race fixes: ext2") (similar
ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
of insert_inode_locked() and a bug of a check for dead inodes needs to
be fixed.
If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
If nilfs_iget() is called before nilfs_new_inode() calls
insert_inode_locked4(), it will create an in-core inode and read its
data from the on-disk inode. But, nilfs_iget() will find i_nlink equals
zero and fail at nilfs_read_inode_common(), which will lead it to call
iget_failed() and cleanly fail.
However, this sanity check doesn't work as expected for reused on-disk
inodes because they leave a non-zero value in i_mode field and it
hinders the test of i_nlink. This patch also fixes the issue by
removing the test on i_mode that nilfs2 doesn't need.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 775a9134f4 upstream.
3430LDP has NAND flash with 32 bytes OOB size which is sufficient to hold
BCH8 codes but the small page check introduced in
commit b491da7233 ("mtd: nand: omap: clean-up ecc layout for BCH ecc schemes")
considers anything below 64 bytes unsuitable for BCH4/8/16. There is another
bug in that code where it doesn't skip the check for OMAP_ECC_HAM1_CODE_SW.
Get rid of that small page check code as it is insufficient and redundant
because we are checking for OOB available bytes vs ecc layout before calling
nand_scan_tail().
Fixes: b491da7233 ("mtd: nand: omap: clean-up ecc layout for BCH ecc schemes")
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 834b686552 upstream.
As stated in a5b7616c5, "mtd: m25p80,spi-nor: Fix module aliases for
m25p80", m25p_ids[] in m25p80.c needs to be kept in sync with
spi_nor_ids[] in spi-nor.c. The change here corrects a misalignment.
(We were missing m25px80 and we had a duplicate w25q128.)
Signed-off-by: Alison Chaiken <alison_chaiken@mentor.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68f2981503 upstream.
The torture test should quit once it actually induces an error in the
flash. This step was accidentally removed during refactoring.
Without this fix, the torturetest just continues infinitely, or until
the maximum cycle count is reached. e.g.:
...
[ 7619.218171] mtd_test: error -5 while erasing EB 100
[ 7619.297981] mtd_test: error -5 while erasing EB 100
[ 7619.377953] mtd_test: error -5 while erasing EB 100
[ 7619.457998] mtd_test: error -5 while erasing EB 100
[ 7619.537990] mtd_test: error -5 while erasing EB 100
...
Fixes: 6cf78358c9 ("mtd: mtd_torturetest: use mtd_test helpers")
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 021b77bee2 upstream.
Probably this code was syncing a lot more often then intended because
the do_sync variable wasn't set to zero.
Fixes: c62988ec09 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4df463678 upstream.
If the firmware has declared more than 8 video output devices, and the
one that control the internal panel's backlight is listed after the
first 8 output devices, the _DOD will not include it due to the current
i915 operation region implementation. As a result, we will not create a
backlight device for it while we should. Solve this problem by special
case the firmware that has 8+ output devices in that if we see such a
firmware, we do not test if the device is in _DOD list. The creation of
the backlight device will also enable the firmware to emit events on
backlight hotkey press when the acpi_osi= cmdline option is specified on
those affected ASUS laptops.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70241
Reported-and-tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reported-and-tested-by: Dmitry Tunin <hanipouspilot@gmail.com>
Reported-and-tested-by: Jimbo <jaime.91@hotmail.es>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94ae1db226 upstream.
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.
Change the code to take the reference after the check and only if there
is no conflict.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf7491f1be upstream.
Fix a bug where nfsd4_encode_components_esc() incorrectly calculates the
length of server array in fs_location4--note that it is a count of the
number of array elements, not a length in bytes.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 082d4bd72a (nfsd4: "backfill" using write_bytes_to_xdr_buf)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a64e56976 upstream.
Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as
an additional string encoding.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: e7a0444aef "nfsd: add IPv6 addr escaping to fs_location hosts"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef17af2a81 upstream.
Bugs similar to the one in acbbe6fbb2 (kcmp: fix standard comparison
bug) are in rich supply.
In this variant, the problem is that struct xdr_netobj::len has type
unsigned int, so the expression o1->len - o2->len _also_ has type
unsigned int; it has completely well-defined semantics, and the result
is some non-negative integer, which is always representable in a long
long. But this means that if the conditional triggers, we are
guaranteed to return a positive value from compare_blob.
In this case it could be fixed by
- res = o1->len - o2->len;
+ res = (long long)o1->len - (long long)o2->len;
but I'd rather eliminate the usually broken 'return a - b;' idiom.
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31d4ea1a09 upstream.
An attempt to fix fcopy on i586 (bc5a5b0 Drivers: hv: util: Properly pack the data
for file copy functionality) led to a regression on x86_64 (and actually didn't fix
i586 breakage). Fcopy messages from Hyper-V host come in the following format:
struct do_fcopy_hdr | 36 bytes
0000 | 4 bytes
offset | 8 bytes
size | 4 bytes
data | 6144 bytes
On x86_64 struct hv_do_fcopy matched this format without ' __attribute__((packed))'
and on i586 adding ' __attribute__((packed))' to it doesn't change anything. Keep
the structure packed and add padding to match re reality. Tested both i586 and x86_64
on Hyper-V Server 2012 R2.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04a258c162 upstream.
When build with Debug the following crash is sometimes observed:
Call Trace:
[<ffffffff812b9600>] string+0x40/0x100
[<ffffffff812bb038>] vsnprintf+0x218/0x5e0
[<ffffffff810baf7d>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff812bb4c1>] vscnprintf+0x11/0x30
[<ffffffff8107a2f0>] vprintk+0xd0/0x5c0
[<ffffffffa0051ea0>] ? vmbus_process_rescind_offer+0x0/0x110 [hv_vmbus]
[<ffffffff8155c71c>] printk+0x41/0x45
[<ffffffffa004ebac>] vmbus_device_unregister+0x2c/0x40 [hv_vmbus]
[<ffffffffa0051ecb>] vmbus_process_rescind_offer+0x2b/0x110 [hv_vmbus]
...
This happens due to the following race: between 'if (channel->device_obj)' check
in vmbus_process_rescind_offer() and pr_debug() in vmbus_device_unregister() the
device can disappear. Fix the issue by taking an additional reference to the
device before proceeding to vmbus_device_unregister().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8bfbe2de76 upstream.
Commit 19e2ad6a09 ("n_tty: Remove overflow
tests from receive_buf() path") moved the increment of read_head into
the arguments list of read_buf_addr(). Function calls represent a
sequence point in C. Therefore read_head is incremented before the
character c is placed in the buffer. Since the circular read buffer is
a lock-less design since commit 6d76bd2618
("n_tty: Make N_TTY ldisc receive path lockless"), this creates a race
condition that leads to communication errors.
This patch modifies the code to increment read_head _after_ the data
is placed in the buffer and thus fixes the race for non-SMP machines.
To fix the problem for SMP machines, memory barriers must be added in
a separate patch.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff009ab6d4 upstream.
Replace PAGE_KERNEL with PAGE_KERNEL_EXEC to allow copy_to_user_page
invalidate icache for pages mapped with kmap.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ff383a4c3 upstream.
This patch adds waiting until transmit buffer and shifter will be empty
before clock disabling.
Without this fix it's possible to have clock disabled while data was
not transmited yet, which causes unproper state of TX line and problems
in following data transfers.
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aee4e5f3d3 upstream.
When recording the state of a task for the sched_switch tracepoint a check of
task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
because, technically, a task being preempted is really in the TASK_RUNNING
state, and that is what should be recorded when tracing a sched_switch,
even if the task put itself into another state (it hasn't scheduled out
in that state yet).
But with the change to use per_cpu preempt counts, the
task_thread_info(p)->preempt_count is no longer used, and instead
task_preempt_count(p) is used.
The problem is that this does not use the current preempt count but a stale
one from a previous sched_switch. The task_preempt_count(p) uses
saved_preempt_count and not preempt_count(). But for tracing sched_switch,
if p is current, we really want preempt_count().
I hit this bug when I was tracing sleep and the call from do_nanosleep()
scheduled out in the "RUNNING" state.
sleep-4290 [000] 537272.259992: sched_switch: sleep:4290 [120] R ==> swapper/0:0 [120]
sleep-4290 [000] 537272.260015: kernel_stack: <stack trace>
=> __schedule (ffffffff8150864a)
=> schedule (ffffffff815089f8)
=> do_nanosleep (ffffffff8150b76c)
=> hrtimer_nanosleep (ffffffff8108d66b)
=> SyS_nanosleep (ffffffff8108d750)
=> return_to_handler (ffffffff8150e8e5)
=> tracesys_phase2 (ffffffff8150c844)
After a bit of hair pulling, I found that the state was really
TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
set and caused the sched_switch tracepoint to show it as RUNNING.
Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 0102874755 "sched: Create more preempt_count accessors"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c6ac78eb3 upstream.
After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
tests inode->i_state locklessly to see whether it already has all the
necessary I_DIRTY bits set. The comment above the barrier doesn't
contain any useful information - memory barriers can't ensure "changes
are seen by all cpus" by itself.
And it sure enough was broken. Please consider the following
scenario.
CPU 0 CPU 1
-------------------------------------------------------------------------------
enters __writeback_single_inode()
grabs inode->i_lock
tests PAGECACHE_TAG_DIRTY which is clear
enters __set_page_dirty()
grabs mapping->tree_lock
sets PAGECACHE_TAG_DIRTY
releases mapping->tree_lock
leaves __set_page_dirty()
enters __mark_inode_dirty()
smp_mb()
sees I_DIRTY_PAGES set
leaves __mark_inode_dirty()
clears I_DIRTY_PAGES
releases inode->i_lock
Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem
to lead to an immediately critical problem because requeue_inode()
later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
deciding whether the inode needs to be requeued for IO and there are
enough unintentional memory barriers inbetween, so while the inode
ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
IO list.
The lack of explicit barrier may also theoretically affect the other
I_DIRTY bits which deal with metadata dirtiness. There is no
guarantee that a strong enough barrier exists between
I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
inode. Filesystem inode writeout path likely has enough stuff which
can behave as full barrier but it's theoretically possible that the
writeout may not see all the updates from ->dirty_inode().
Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note
that I_DIRTY_PAGES needs a special treatment as it always needs to be
cleared to be interlocked with the lockless test on
__mark_inode_dirty() side. It's cleared unconditionally and
reinstated after smp_mb() if the mapping still has dirty pages.
Also add comments explaining how and why the barriers are paired.
Lightly tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9581f97a68 upstream.
A connection timeout affects all volumes of a resource!
Under the following conditions:
A resource with multiple volumes
AND
ko-count >=1
AND
a write request triggers the timeout (ko-count * timeout)
DRBD's internal state gets confused. That in turn may
lead to very miss leading follow up failures. E.g.
"BUG: scheduling while atomic"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5fabcb4c33 upstream.
We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition()
with a user passed in partno value. If we pass in 0x7fffffff, the
new target in disk_expand_part_tbl() overflows the 'int' and we
access beyond the end of ptbl->part[] and even write to it when we
do the rcu_assign_pointer() to assign the new partition.
Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 007487f1fd upstream.
Currently we enable Exynos devices in the multi v7 defconfig, however, when
testing on my ODROID-U3, I noticed that USB was not working. Enabling this
option causes USB to work, which enables networking support as well since the
ODROID-U3 has networking on the USB bus.
[arnd] Support for odroid-u3 was added in 3.10, so it would be nice to
backport this fix at least that far.
Signed-off-by: Steev Klimaszewski <steev@gentoo.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7181d005e upstream.
Added new device layout "DEVICE_HWI" and also added the USB VID/PID for the
HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e)
Signed-off-by: Martin Hauke <mardnh@gmx.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11432050f0 upstream.
This patch fixes an issue that the NULL pointer dereference happens
when we uses g_audio driver. Since the g_audio driver will call
usb_ep_disable() in afunc_set_alt() before it calls usb_ep_enable(),
the uep->pipe of renesas usbhs driver will be NULL. So, this patch
adds a condition to avoid the oops.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: 2f98382dc (usb: renesas_usbhs: Add Renesas USBHS Gadget)
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 108cef3aa4 upstream.
It is critical that fetch_block() and handle_stripe_dirtying()
are consistent in their analysis of what needs to be loaded.
Otherwise raid5 can wait forever for a block that won't be loaded.
Currently when writing to a RAID5 that is resyncing, to a location
beyond the resync offset, handle_stripe_dirtying chooses a
reconstruct-write cycle, but fetch_block() assumes a
read-modify-write, and a lockup can happen.
So treat that case just like RAID6, just as we do in
handle_stripe_dirtying. RAID6 always does reconstruct-write.
This bug was introduced when the behaviour of handle_stripe_dirtying
was changed in 3.7, so the patch is suitable for any kernel since,
though it will need careful merging for some versions.
Fixes: a7854487cd
Reported-by: Henry Cai <henryplusplus@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c507de88f6 upstream.
stac_store_hints() does utterly wrong for masking the values for
gpio_dir and gpio_data, likely due to copy&paste errors. Fortunately,
this feature is used very rarely, so the impact must be really small.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49cdd5b641 upstream.
Commit 897c329bc ("ALSA: usb: caiaq: check for cdev->n_streams > 1")
introduced a safety check to protect against bogus data provided by
devices. However, the n_streams variable is already divided by
CHANNELS_PER_STREAM, so the correct check is 'n_streams > 0'.
Fix this to un-break support for stereo devices.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92cb46584e upstream.
Although the 't->length' is a big-endian value, it's used without any
conversion. This means that the driver always uses 'length' parameter.
Fixes: 555e8a8f7f14("ALSA: fireworks: Add command/response functionality into hwdep interface")
Reported-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69eba10e60 upstream.
In olden times the snd_hda_param_read() function always set "*start_id"
but in 2007 we introduced a new return and it causes uninitialized data
bugs in a couple of the callers: print_codec_info() and
hdmi_parse_codec().
Fixes: e8a7f136f5 ('[ALSA] hda-intel - Improve HD-audio codec probing robustness')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d70a1b9893 upstream.
The Arcam rPAC seems to have the same problem - whenever anything
(alsamixer, udevd, 3.9+ kernel from 60af3d037e, ..) attempts to
access mixer / control interface of the card, the firmware "locks up"
the entire device, resulting in
SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error
from alsa-lib.
Other operating systems can somehow read the mixer (there seems to be
playback volume/mute), but any manipulation is ignored by the device
(which has hardware volume controls).
Signed-off-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e2596e81a upstream.
In (6468276 i2c: designware: make SCL and SDA falling time
configurable) new device tree properties were added for setting the
falling time of SDA and SCL. The device tree bindings doc had a typo
in it: it forgot the "-ns" suffix for both properies in the prose of
the bindings.
I assume this is a typo because:
* The source code includes the "-ns"
* The example in the bindings includes the "-ns".
Fix the typo.
Signed-off-by: Doug Anderson <dianders@chromium.org>
Fixes: 6468276b22 ("i2c: designware: make SCL and SDA falling time configurable")
Acked-by: Romain Baeriswyl <romain.baeriswyl@alitech.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf35d6e047 upstream.
`genwqe_user_vmap()` calls `get_user_pages_fast()` and if the return
value is less than the number of pages requested, it frees the pages and
returns an error (`-EFAULT`). However, it fails to consider a negative
error return value from `get_user_pages_fast()`. In that case, the test
`if (rc < m->nr_pages)` will be false (due to promotion of `rc` to a
large `unsigned int`) and the code will continue on to call
`genwqe_map_pages()` with an invalid list of page pointers. Fix it by
bailing out if `get_user_pages_fast()` returns a negative error value.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ddf0b1b11 upstream.
In Linux 3.18 and below, GCC hoists the lsl instructions in the
pvclock code all the way to the beginning of __vdso_clock_gettime,
slowing the non-paravirt case significantly. For unknown reasons,
presumably related to the removal of a branch, the performance issue
is gone as of
e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu
but I don't trust GCC enough to expect the problem to stay fixed.
There should be no correctness issue, because the __getcpu calls in
__vdso_vlock_gettime were never necessary in the first place.
Note to stable maintainers: In 3.18 and below, depending on
configuration, gcc 4.9.2 generates code like this:
9c3: 44 0f 03 e8 lsl %ax,%r13d
9c7: 45 89 eb mov %r13d,%r11d
9ca: 0f 03 d8 lsl %ax,%ebx
This patch won't apply as is to any released kernel, but I'll send a
trivial backported version if needed.
[
Backported by Andy Lutomirski. Should apply to all affected
versions. This fixes a functionality bug as well as a performance
bug: buggy kernels can infinite loop in __vdso_clock_gettime on
affected compilers. See, for exammple:
https://bugzilla.redhat.com/show_bug.cgi?id=1178975
]
Fixes: 51c19b4f59 x86: vdso: pvclock gettime support
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 394f56fe48 upstream.
The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack. To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit. Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.
The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD. The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.
This fixes the implementation. The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits. (Disclaimer: I
don't know what the paxtest code is actually calculating.)
It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning. Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.
In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.
As an easy test, doing this:
for i in `seq 10000`
do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d
used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:
7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000
Note the suspicious fe000 endings. With the fix, I get a much more
palatable 76 repeated addresses.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a629df7ead upstream.
Since most virtual machines raise this message once, it is a bit annoying.
Make it KERN_DEBUG severity.
Fixes: 7a2e8aaf0f
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1365039d0c upstream.
ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte
lock together with ACCESS_ONCE for the intial read.
union ipte_control {
unsigned long val;
struct {
unsigned long k : 1;
unsigned long kh : 31;
unsigned long kg : 32;
};
};
[...]
static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
{
union ipte_control old, new, *ic;
ic = &vcpu->kvm->arch.sca->ipte_control;
do {
new = old = ACCESS_ONCE(*ic);
new.kh--;
if (!new.kh)
new.k = 0;
} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
if (!new.kh)
wake_up(&vcpu->kvm->arch.ipte_wq);
}
The new value, is loaded twice from memory with gcc 4.7.2 of
fedora 18, despite the ACCESS_ONCE:
--->
l %r4,0(%r3) <--- load first 32 bit of lock (k and kh) in r4
alfi %r4,2147483647 <--- add -1 to r4
llgtr %r4,%r4 <--- zero out the sign bit of r4
lg %r1,0(%r3) <--- load all 64 bit of lock into new
lgr %r2,%r1 <--- load the same into old
risbg %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of
new
llihf %r4,2147483647
ngrk %r4,%r1,%r4
jne aa0 <ipte_unlock+0xf8>
nihh %r1,32767
lgr %r4,%r2
csg %r4,%r1,0(%r3)
cgr %r2,%r4
jne a70 <ipte_unlock+0xc8>
If the memory value changes between the first load (l) and the second
load (lg) we are broken. If that happens VCPU threads will hang
(unkillable) in handle_ipte_interlock.
Andreas Krebbel analyzed this and tracked it down to a compiler bug in
that version:
"while it is not that obvious the C99 standard basically forbids
duplicating the memory access also in that case. For an argumentation of
a similiar case please see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43
For the implementation-defined cases regarding volatile there are some
GCC-specific clarifications which can be found here:
https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles
I've tracked down the problem with a reduced testcase. The problem was
that during a tree level optimization (SRA - scalar replacement of
aggregates) the volatile marker is lost. And an RTL level optimizer (CSE
- common subexpression elimination) then propagated the memory read into
its second use introducing another access to the memory location. So
indeed Christian's suspicion that the union access has something to do
with it is correct (since it triggered the SRA optimization).
This issue has been reported and fixed in the GCC 4.8 development cycle:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145"
This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme
that should work for all supported compilers.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dca485f87 upstream.
some control register changes will flush some aspects of the CPU, e.g.
POP explicitely mentions that for CR9-CR11 "TLBs may be cleared".
Instead of trying to be clever and only flush on specific CRs, let
play safe and flush on all lctl(g) as future machines might define
new bits in CRs. Load control intercept should not happen that often.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b65d6e17fe upstream.
This feature is not supported inside KVM guests yet, because we do not emulate
MSR_IA32_XSS. Mask it out.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df1daba7d1 upstream.
Userspace is expecting non-compacted format for KVM_GET_XSAVE, but
struct xsave_struct might be using the compacted format. Convert
in order to preserve userspace ABI.
Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE
but the kernel will pass it to XRSTORS, and we need to convert back.
Fixes: f31a9f7c71
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Nadav Amit <namit@cs.technion.ac.il>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bacedada6 upstream.
New Genius MousePen i608X devices have a new id 0x501a instead of the
old 0x5011 so add a new #define with "_2" appended and change required
places.
The remaining two checkpatch warnings about line length
being over 80 characters are present in the original files too and this
patch was made in the same style (no line break).
Just adding a new id and changing the required places should make the
new device work without any issues according to the bug report in the
following url.
This patch was made according to and fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=67111
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da940db41d upstream.
Apple bluetooth wireless keyboard (sold in UK) has always reported zero
for battery strength no matter what condition the batteries are actually
in. With this patch applied (applying same quirk as other Apple
keyboards), the battery strength is now correctly reported.
Signed-off-by: Karl Relton <karllinuxtest.relton@ntlworld.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b44c53aeb upstream.
When a hid driver that uses i2c-hid as transport is unloaded, the hid core
will call i2c_hid_stop() which releases all the buffers associated with the
device. This includes also the command buffer.
Now, when the i2c-hid driver itself is unloaded it tries to power down the
device by sending it PWR_SLEEP command. Since the command buffer is already
released we get following crash:
[ 79.691459] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 79.691532] IP: [<ffffffffa05bc049>] __i2c_hid_command+0x49/0x310 [i2c_hid]
...
[ 79.693467] Call Trace:
[ 79.693494] [<ffffffff810424e1>] ? __unmask_ioapic+0x21/0x30
[ 79.693537] [<ffffffff81042855>] ? unmask_ioapic+0x25/0x40
[ 79.693581] [<ffffffffa05bc35b>] ? i2c_hid_set_power+0x4b/0xa0 [i2c_hid]
[ 79.693632] [<ffffffffa05bc3cf>] ? i2c_hid_runtime_resume+0x1f/0x30 [i2c_hid]
[ 79.693689] [<ffffffff814c08fb>] ? __rpm_callback+0x2b/0x70
[ 79.693733] [<ffffffff814c0961>] ? rpm_callback+0x21/0x90
[ 79.693776] [<ffffffff814c0dec>] ? rpm_resume+0x41c/0x600
[ 79.693820] [<ffffffff814c1e1c>] ? __pm_runtime_resume+0x4c/0x80
[ 79.693868] [<ffffffff814b8588>] ? __device_release_driver+0x28/0x100
[ 79.693917] [<ffffffff814b8d90>] ? driver_detach+0xa0/0xb0
[ 79.693959] [<ffffffff814b82cc>] ? bus_remove_driver+0x4c/0xb0
[ 79.694006] [<ffffffff810d1cfd>] ? SyS_delete_module+0x11d/0x1d0
[ 79.694054] [<ffffffff8165f107>] ? int_signal+0x12/0x17
[ 79.694095] [<ffffffff8165ee69>] ? system_call_fastpath+0x12/0x17
Fix this so that we only free buffers when the i2c-hid driver itself is
removed.
Fixes: 34f439e4af ("HID: i2c-hid: add runtime PM support")
Reported-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 606185b20c upstream.
This is a static checker fix. We write some binary settings to the
sysfs file. One of the settings is the "->startup_profile". There
isn't any checking to make sure it fits into the
pyra->profile_settings[] array in the profile_activated() function.
I added a check to pyra_sysfs_write_settings() in both places because
I wasn't positive that the other callers were correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1c7e29e8d upstream.
Before ->start() is called, bufsize size is set to HID_MIN_BUFFER_SIZE,
64 bytes. While processing the IRQ, we were asking to receive up to
wMaxInputLength bytes, which can be bigger than 64 bytes.
Later, when ->start is run, a proper bufsize will be calculated.
Given wMaxInputLength is said to be unreliable in other part of the
code, set to receive only what we can even if it results in truncated
reports.
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6296f4a8eb upstream.
Current driver uses a common buffer for reading reports either
synchronously in i2c_hid_get_raw_report() and asynchronously in
the interrupt handler.
There is race condition if an interrupt arrives immediately after
the report is received in i2c_hid_get_raw_report(); the common
buffer is modified by the interrupt handler with the new report
and then i2c_hid_get_raw_report() proceed using wrong data.
Fix it by using a separate buffers for synchronous reports.
Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol@invensense.com>
[Antonio Borneo: cleanup, rebase to v3.17, submit mainline]
Signed-off-by: Antonio Borneo <borneo.antonio@gmail.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dff6741688 upstream.
Since the conversion from USB to HID (in v3.17), some people reported a
freeze on boot with the wacom driver. Hans managed to get a stacktrace:
[ 240.272331] Call Trace:
[ 240.272338] [<ffffffff813de7b9>] ? usb_hcd_submit_urb+0xa9/0xb10
[ 240.272347] [<ffffffff81555579>] schedule+0x29/0x70
[ 240.272355] [<ffffffff815559e6>] schedule_preempt_disabled+0x16/0x20
[ 240.272363] [<ffffffff81557365>] __mutex_lock_slowpath+0xe5/0x230
[ 240.272372] [<ffffffff815574c7>] mutex_lock+0x17/0x30
[ 240.272380] [<ffffffffa063c1d2>] wacom_resume+0x22/0x50 [wacom]
[ 240.272396] [<ffffffffa01aea8a>] hid_resume_common+0xba/0x110 [usbhid]
[ 240.272404] [<ffffffff813e5890>] ? usb_runtime_suspend+0x80/0x80
[ 240.272417] [<ffffffffa01aeb1d>] hid_resume+0x3d/0x70 [usbhid]
[ 240.272425] [<ffffffff813e44a6>] usb_resume_interface.isra.6+0xb6/0x120
[ 240.272432] [<ffffffff813e4774>] usb_resume_both+0x74/0x140
[ 240.272439] [<ffffffff813e58aa>] usb_runtime_resume+0x1a/0x20
[ 240.272446] [<ffffffff813b1912>] __rpm_callback+0x32/0x70
[ 240.272453] [<ffffffff813b1976>] rpm_callback+0x26/0xa0
[ 240.272460] [<ffffffff813b2d71>] rpm_resume+0x4b1/0x690
[ 240.272468] [<ffffffff812ab992>] ? radix_tree_lookup_slot+0x22/0x50
[ 240.272475] [<ffffffff813b2c1a>] rpm_resume+0x35a/0x690
[ 240.272482] [<ffffffff8116e9c9>] ? zone_statistics+0x89/0xa0
[ 240.272489] [<ffffffff813b2f90>] __pm_runtime_resume+0x40/0x60
[ 240.272497] [<ffffffff813e4272>] usb_autopm_get_interface+0x22/0x60
[ 240.272509] [<ffffffffa01ae8d9>] usbhid_open+0x59/0xe0 [usbhid]
[ 240.272517] [<ffffffffa063ac85>] wacom_open+0x35/0x50 [wacom]
[ 240.272525] [<ffffffff813f37b9>] input_open_device+0x79/0xa0
[ 240.272534] [<ffffffffa048d1c1>] evdev_open+0x1b1/0x200 [evdev]
[ 240.272543] [<ffffffff811c899e>] chrdev_open+0xae/0x1f0
[ 240.272549] [<ffffffff811c88f0>] ? cdev_put+0x30/0x30
[ 240.272556] [<ffffffff811c17e2>] do_dentry_open+0x1d2/0x320
[ 240.272562] [<ffffffff811c1cd1>] finish_open+0x31/0x50
[ 240.272571] [<ffffffff811d2202>] do_last.isra.36+0x652/0xe50
[ 240.272579] [<ffffffff811d2ac7>] path_openat+0xc7/0x6f0
[ 240.272586] [<ffffffff811cf012>] ? final_putname+0x22/0x50
[ 240.272594] [<ffffffff811d42d2>] ? user_path_at_empty+0x72/0xd0
[ 240.272602] [<ffffffff811d43fd>] do_filp_open+0x4d/0xc0
[...]
So here, wacom_open is called, and then wacom_resume is called by the
PM system. However, wacom_open already took the lock when wacom_resume
tries to get it. Freeze.
A little bit of history shows that this already happened in the past
- commit f6cd378372 ("Input: wacom - fix runtime PM related deadlock"),
and the solution was to call first the PM function before taking the lock.
The lock was introduced in commit commit e722409445 ("Input: wacom -
implement suspend and autosuspend") when the autosuspend feature has
been added. Given that usbhid already takes care of this very same
locking between suspend/resume, I think we can simply kill the lock
in open/close.
The lock is now used also with LEDs, so we can not remove it completely.
Reported-by: Hans Spath <inbox-546@hans-spath.de>
Tested-by: Hans Spath <inbox-546@hans-spath.de>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06a41a99d1 upstream.
When a CPU is hotplugged, the current blk-mq spews a warning like:
kobject '(null)' (ffffe8ffffc8b5d8): tried to add an uninitialized object, something is seriously wrong.
CPU: 1 PID: 1386 Comm: systemd-udevd Not tainted 3.18.0-rc7-2.g088d59b-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_171129-lamiak 04/01/2014
0000000000000000 0000000000000002 ffffffff81605f07 ffffe8ffffc8b5d8
ffffffff8132c7a0 ffff88023341d370 0000000000000020 ffff8800bb05bd58
ffff8800bb05bd08 000000000000a0a0 000000003f441940 0000000000000007
Call Trace:
[<ffffffff81005306>] dump_trace+0x86/0x330
[<ffffffff81005644>] show_stack_log_lvl+0x94/0x170
[<ffffffff81006d21>] show_stack+0x21/0x50
[<ffffffff81605f07>] dump_stack+0x41/0x51
[<ffffffff8132c7a0>] kobject_add+0xa0/0xb0
[<ffffffff8130aee1>] blk_mq_register_hctx+0x91/0xb0
[<ffffffff8130b82e>] blk_mq_sysfs_register+0x3e/0x60
[<ffffffff81309298>] blk_mq_queue_reinit_notify+0xf8/0x190
[<ffffffff8107cfdc>] notifier_call_chain+0x4c/0x70
[<ffffffff8105fd23>] cpu_notify+0x23/0x50
[<ffffffff81060037>] _cpu_up+0x157/0x170
[<ffffffff810600d9>] cpu_up+0x89/0xb0
[<ffffffff815fa5b5>] cpu_subsys_online+0x35/0x80
[<ffffffff814323cd>] device_online+0x5d/0xa0
[<ffffffff81432485>] online_store+0x75/0x80
[<ffffffff81236a5a>] kernfs_fop_write+0xda/0x150
[<ffffffff811c5532>] vfs_write+0xb2/0x1f0
[<ffffffff811c5f42>] SyS_write+0x42/0xb0
[<ffffffff8160c4ed>] system_call_fastpath+0x16/0x1b
[<00007f0132fb24e0>] 0x7f0132fb24e0
This is indeed because of an uninitialized kobject for blk_mq_ctx.
The blk_mq_ctx kobjects are initialized in blk_mq_sysfs_init(), but it
goes loop over hctx_for_each_ctx(), i.e. it initializes only for
online CPUs. Thus, when a CPU is hotplugged, the ctx for the newly
onlined CPU is registered without initialization.
This patch fixes the issue by initializing the all ctx kobjects
belonging to each queue.
Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=908794
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c38d185d4a upstream.
What we need is the following two guarantees:
* Any thread that observes the effect of the test_and_set_bit() by
__bt_get_word() also observes the preceding addition of 'current'
to the appropriate wait list. This is guaranteed by the semantics
of the spin_unlock() operation performed by prepare_and_wait().
Hence the conversion of test_and_set_bit_lock() into
test_and_set_bit().
* The wait lists are examined by bt_clear() after the tag bit has
been cleared. clear_bit_unlock() guarantees that any thread that
observes that the bit has been cleared also observes the store
operations preceding clear_bit_unlock(). However,
clear_bit_unlock() does not prevent that the wait lists are examined
before that the tag bit is cleared. Hence the addition of a memory
barrier between clear_bit() and the wait list examination.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e98e9d7cf upstream.
If __bt_get_word() is called with last_tag != 0, if the first
find_next_zero_bit() fails, if after wrap-around the
test_and_set_bit() call fails and find_next_zero_bit() succeeds,
if the next test_and_set_bit() call fails and subsequently
find_next_zero_bit() does not find a zero bit, then another
wrap-around will occur. Avoid this by introducing an additional
local variable.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45a9c9d909 upstream.
blk-mq users are allowed to free the memory request_queue.tag_set
points at after blk_cleanup_queue() has finished but before
blk_release_queue() has started. This can happen e.g. in the SCSI
core. The SCSI core namely embeds the tag_set structure in a SCSI
host structure. The SCSI host structure is freed by
scsi_host_dev_release(). This function is called after
blk_cleanup_queue() finished but can be called before
blk_release_queue().
This means that it is not safe to access request_queue.tag_set from
inside blk_release_queue(). Hence remove the blk_sync_queue() call
from blk_release_queue(). This call is not necessary - outstanding
requests must have finished before blk_release_queue() is
called. Additionally, move the blk_mq_free_queue() call from
blk_release_queue() to blk_cleanup_queue() to avoid that struct
request_queue.tag_set gets accessed after it has been freed.
This patch avoids that the following kernel oops can be triggered
when deleting a SCSI host for which scsi-mq was enabled:
Call Trace:
[<ffffffff8109a7c4>] lock_acquire+0xc4/0x270
[<ffffffff814ce111>] mutex_lock_nested+0x61/0x380
[<ffffffff812575f0>] blk_mq_free_queue+0x30/0x180
[<ffffffff8124d654>] blk_release_queue+0x84/0xd0
[<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
[<ffffffff8126c140>] kobject_put+0x30/0x70
[<ffffffff81245895>] blk_put_queue+0x15/0x20
[<ffffffff8125c409>] disk_release+0x99/0xd0
[<ffffffff8133d056>] device_release+0x36/0xb0
[<ffffffff8126c29b>] kobject_cleanup+0x7b/0x1a0
[<ffffffff8126c140>] kobject_put+0x30/0x70
[<ffffffff8125a78a>] put_disk+0x1a/0x20
[<ffffffff811d4cb5>] __blkdev_put+0x135/0x1b0
[<ffffffff811d56a0>] blkdev_put+0x50/0x160
[<ffffffff81199eb4>] kill_block_super+0x44/0x70
[<ffffffff8119a2a4>] deactivate_locked_super+0x44/0x60
[<ffffffff8119a87e>] deactivate_super+0x4e/0x70
[<ffffffff811b9833>] cleanup_mnt+0x43/0x90
[<ffffffff811b98d2>] __cleanup_mnt+0x12/0x20
[<ffffffff8107252c>] task_work_run+0xac/0xe0
[<ffffffff81002c01>] do_notify_resume+0x61/0xa0
[<ffffffff814d2c58>] int_signal+0x12/0x17
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Elliott <elliott@hp.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a33c1ba291 upstream.
We currently use num_possible_cpus(), but that breaks on sparc64 where
the CPU ID space is discontig. Use nr_cpu_ids as the highest CPU ID
instead, so we don't end up reading from invalid memory.
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62c22167dd upstream.
Since commit 1196c2f a domain is only destroyed in the
notifier path if it is hot-unplugged. This caused a
domain leakage in iommu_attach_device when a driver was
unbound from the device and bound to VFIO. In this case the
device is attached to a new domain and unlinked from the old
domain. At this point nothing points to the old domain
anymore and its memory is leaked.
Fix this by explicitly freeing the old domain in
iommu_attach_domain.
Fixes: 1196c2f (iommu/vt-d: Fix dmar_domain leak in iommu_attach_device)
Tested-by: Jerry Hoemann <jerry.hoemann@hp.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc4f14aa17 upstream.
There's an off-by-one bug in function __domain_mapping(), which may
trigger the BUG_ON(nr_pages < lvl_pages) when
(nr_pages + 1) & superpage_mask == 0
The issue was introduced by commit 9051aa0268 "intel-iommu: Combine
domain_pfn_mapping() and domain_sg_mapping()", which sets sg_res to
"nr_pages + 1" to avoid some of the 'sg_res==0' code paths.
It's safe to remove extra "+1" because sg_res is only used to calculate
page size now.
Reported-And-Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa5ad3b6eb upstream.
If the erase worker is unable to erase a PEB it will
free the ubi_wl_entry itself.
The failing ubi_wl_entry must not free()'d again after
do_sync_erase() returns.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f38aed975c upstream.
The logic of vfree()'ing vol->upd_buf is tied to vol->updating.
In ubi_start_update() vol->updating is set long before vmalloc()'ing
vol->upd_buf. If we encounter a write failure in ubi_start_update()
before vmalloc() the UBI device release function will try to vfree()
vol->upd_buf because vol->updating is set.
Fix this by allocating vol->upd_buf directly after setting vol->updating.
Fixes:
[ 31.559338] UBI warning: vol_cdev_release: update of volume 2 not finished, volume is damaged
[ 31.559340] ------------[ cut here ]------------
[ 31.559343] WARNING: CPU: 1 PID: 2747 at mm/vmalloc.c:1446 __vunmap+0xe3/0x110()
[ 31.559344] Trying to vfree() nonexistent vm area (ffffc90001f2b000)
[ 31.559345] Modules linked in:
[ 31.565620] 0000000000000bba ffff88002a0cbdb0 ffffffff818f0497 ffff88003b9ba148
[ 31.566347] ffff88002a0cbde0 ffffffff8156f515 ffff88003b9ba148 0000000000000bba
[ 31.567073] 0000000000000000 0000000000000000 ffff88002a0cbe88 ffffffff8156c10a
[ 31.567793] Call Trace:
[ 31.568034] [<ffffffff818f0497>] dump_stack+0x4e/0x7a
[ 31.568510] [<ffffffff8156f515>] ubi_io_write_vid_hdr+0x155/0x160
[ 31.569084] [<ffffffff8156c10a>] ubi_eba_write_leb+0x23a/0x870
[ 31.569628] [<ffffffff81569b36>] vol_cdev_write+0x226/0x380
[ 31.570155] [<ffffffff81179265>] vfs_write+0xb5/0x1f0
[ 31.570627] [<ffffffff81179f8a>] SyS_pwrite64+0x6a/0xa0
[ 31.571123] [<ffffffff818fde12>] system_call_fastpath+0x16/0x1b
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 027bc8b082 upstream.
On some ARMs the memory can be mapped pgprot_noncached() and still
be working for atomic operations. As pointed out by Colin Cross
<ccross@android.com>, in some cases you do want to use
pgprot_noncached() if the SoC supports it to see a debug printk
just before a write hanging the system.
On ARMs, the atomic operations on strongly ordered memory are
implementation defined. So let's provide an optional kernel parameter
for configuring pgprot_noncached(), and use pgprot_writecombine() by
default.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rob Herring <robherring2@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ae9cb8193 upstream.
Currently trying to use pstore on at least ARMs can hang as we're
mapping the peristent RAM with pgprot_noncached().
On ARMs, pgprot_noncached() will actually make the memory strongly
ordered, and as the atomic operations pstore uses are implementation
defined for strongly ordered memory, they may not work. So basically
atomic operations have undefined behavior on ARM for device or strongly
ordered memory types.
Let's fix the issue by using write-combine variants for mappings. This
corresponds to normal, non-cacheable memory on ARM. For many other
architectures, this change does not change the mapping type as by
default we have:
#define pgprot_writecombine pgprot_noncached
The reason why pgprot_noncached() was originaly used for pstore
is because Colin Cross <ccross@android.com> had observed lost
debug prints right before a device hanging write operation on some
systems. For the platforms supporting pgprot_noncached(), we can
add a an optional configuration option to support that. But let's
get pstore working first before adding new features.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Anton Vorontsov <cbouatmailru@gmail.com>
Cc: Colin Cross <ccross@android.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: linux-kernel@vger.kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
[tony@atomide.com: updated description]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36e8164882 upstream.
Commit 6ac665c63d ("PCI: rewrite PCI BAR reading code") masked off
low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(),
which compares 'base == maxbase' to check for read-only BARs. The masking
of 'l' means that comparison will never be 'true', so the check for
read-only BARs no longer works.
Resolve this by also masking off the low-order bits of 'sz' before passing
it into pci_size() as 'maxbase'. With this change, pci_size() will once
again catch the problems that have been encountered to date:
- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is
disabled, this BAR is read-only and read as 0x00000008 [1]
- BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]
Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git))
Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a8fc95c87 upstream.
When connectable mode is enabled (page scan on) through some non-mgmt
method the HCI_CONNECTABLE flag will not be set. For backwards
compatibility with user space versions not using mgmt we should not
require HCI_CONNECTABLE to be set if HCI_MGMT is not set.
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8bfe8442ff upstream.
When controllers set the HCI_QUIRK_INVALID_BDADDR flag, it is required
by userspace to program a valid public Bluetooth device address into
the controller before it can be used.
After successful address configuration, the internal state changes and
the controller runs the complete initialization procedure. However one
small difference is that this is no longer the HCI_SETUP stage. The
HCI_SETUP stage is only valid during initial controller setup. In this
case the stack runs the initialization as part of the HCI_CONFIG stage.
The controller version information, default name and supported commands
are only stored during HCI_SETUP. While these information are static,
they are not read initially when HCI_QUIRK_INVALID_BDADDR is set. So
when running in HCI_CONFIG state, these information need to be updated
as well.
This especially impacts Bluetooth 4.1 and later controllers using
extended feature pages and second event mask page.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4d5504d5c upstream.
The internal representation of the LE white list needs to be cleared
when receiving a successful HCI_Reset command. A reset of the controller
is expected to start with an empty LE white list.
When the LE white list is not cleared on controller reset, the passive
background scanning might skip programming the remote devices. Only
changes to the LE white list are programmed when passive background
is started.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b1db38ca2 upstream.
These days we allow simultaneous LE scanning and advertising. Checking
for whether advertising is enabled or not is therefore not a reliable
way to determine whether directed advertising was used to trigger the
connection creation. The appropriate place to check (instead of the hdev
context) is the connection role that's stored in the hci_conn. This
patch fixes such a check in le_conn_timeout() which could otherwise lead
to incorrect HCI commands being sent.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 980ffc0a2c upstream.
The le_conn_timeout() may call hci_le_conn_failed() which in turn may
call hci_conn_del(). Trying to use the _sync variant for cancelling the
conn timeout from hci_conn_del() could therefore result in a deadlock.
This patch converts hci_conn_del() to use the non-sync variant so the
deadlock is not possible.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0c42cd7b2 upstream.
This patch reverts commit:
a7807d73 ("Bluetooth: 6lowpan: Avoid memory leak if memory allocation
fails")
which was wrong suggested by Alexander Aring. The function skb_unshare
run also kfree_skb on failure.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f34b6c72c3 upstream.
The 24x7 counters are continuously running and not updated on an
interrupt. So we record the event counts when stopping the event or
deleting it.
But to "read" a single counter in 24x7, we allocate a page and pass it
into the hypervisor (The HV returns the page full of counters from which
we extract the specific counter for this event).
We allocate a page using GFP_USER and when deleting the event, we end up
with the following warning because we are blocking in interrupt context.
[ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000
We could use GFP_ATOMIC but that could result in failures. Pre-allocate
a buffer so we don't have to allocate in interrupt context. Further as
Michael Ellerman suggested, use Per-CPU buffer so we only need to
allocate once per CPU.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8117ac6a6c upstream.
Currently, when going idle, we set the flag indicating that we are in
nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap
(or sleep or rvwinkle) instruction, all with the MMU on. This is bad
for two reasons: (a) the architecture specifies that those instructions
must be executed with the MMU off, and in fact with only the SF, HV, ME
and possibly RI bits set, and (b) this introduces a race, because as
soon as we set the flag, another thread can switch the MMU to a guest
context. If the race is lost, this thread will typically start looping
on relocation-on ISIs at 0xc...4400.
This fixes it by setting the MSR as required by the architecture before
setting the flag or executing the nap/sleep/rvwinkle instruction.
[ shreyas@linux.vnet.ibm.com: Edited to handle LE ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 682e77c861 upstream.
The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting
in partial invalidation of TLBs which is not right. This patch fixes
that by passing IS=0xc00 to invalidate whole TLB for successful recovery
from TLB and ERAT errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd32e2dcc9 upstream.
We have some code in udbg_uart_getc_poll() that tries to protect
against a NULL udbg_uart_in, but gets it all wrong.
Found with the LLVM static analyzer (scan-build).
Fixes: 309257484c ("powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Add some newlines for readability while we're here]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9972fc0b85 upstream.
Commit 6071c22e17 "ktest: Rewrite the config-bisect to actually work"
fixed the config-bisect to work nicely but in doing so it broke
make_min_config by changing the way assign_configs works.
The assign_configs function now adds the config to the hash even if
it is disabled, but changes the hash value to be that of the
line "# CONFIG_FOO is not set". Unfortunately, the make_min_config
test only checks to see if the config is removed. It now needs to
check if the config is in the hash and not set to be disabled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3475c3d034 upstream.
Flush the FIFOs when the stream is prepared for use. This avoids
an inadvertent swapping of the left/right channels if the FIFOs are
not empty at startup.
Signed-off-by: Andrew Jackson <Andrew.Jackson@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 681a195603 upstream.
When the codec is connected using i2c, it will only auto-increment
register addresses if msb (0x80) of the register address byte is set.
[Fixes cache sync if multiple adjacent registers are updated -- broonie]
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48826ee590 upstream.
Commit 5fe5b767dc ("ASoC: dapm: Do not pretend to support controls for non
mixer/mux widgets") revealed ill-defined control in a route between
"STENL Mux" and DACs in max98090.c:
max98090 i2c-193C9890:00: Control not supported for path STENL Mux -> [NULL] -> DACL
max98090 i2c-193C9890:00: ASoC: no dapm match for STENL Mux --> NULL --> DACL
max98090 i2c-193C9890:00: ASoC: Failed to add route STENL Mux -> NULL -> DACL
max98090 i2c-193C9890:00: Control not supported for path STENL Mux -> [NULL] -> DACR
max98090 i2c-193C9890:00: ASoC: no dapm match for STENL Mux --> NULL --> DACR
max98090 i2c-193C9890:00: ASoC: Failed to add route STENL Mux -> NULL -> DACR
Since there is no control between "STENL Mux" and DACs the control name must
be NULL not "NULL".
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50c0f21b42 upstream.
Make sure to check the version field of the firmware header to make sure to
not accidentally try to parse a firmware file with a different layout.
Trying to do so can result in loading invalid firmware code to the device.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 077661b6ed upstream.
The of_node_put() call in eukrea_tlv320_probe() may take an
uninitialized pointer, as compiler spotted out:
sound/soc/fsl/eukrea-tlv320.c:221:14: warning: 'ssi_np' may be used uninitialized in this function [-Wuninitialized]
This patch adds the proper NULL initializations as a fix.
(codec_np is also NULL initialized just for consistency.)
Fixes: 66f232908d ('ASoC: eukrea-tlv320: Add DT support')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e4982f6a5 upstream.
Like with ath9k, ath5k queues also need to be ordered by priority.
queue_info->tqi_subtype already contains the correct index, so use it
instead of relying on the order of ath5k_hw_setup_tx_queue calls.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a12a07e44 upstream.
Since the commit below, iwldvm sends the FLUSH command to
the firmware. All the devices that use iwldvm have a
firmware that expects the _v3 version of this command,
besides 5150.
5150's latest available firmware still expects a _v2 version
of the FLUSH command.
This means that since the commit below, we had a mismatch for
this specific device only.
This mismatch led to the NMI below:
Loaded firmware version: 8.24.2.2
Start IWL Error Log Dump:
Status: 0x0000004C, count: 5
0x00000004 | NMI_INTERRUPT_WDG
0x000006F4 | uPc
0x000005BA | branchlink1
0x000006F8 | branchlink2
0x000008C2 | interruptlink1
0x00005B02 | interruptlink2
0x00000002 | data1
0x07030000 | data2
0x00000068 | line
0x3E80510C | beacon time
0x728A0EF4 | tsf low
0x0000002A | tsf hi
0x00000000 | time gp1
0x01BDC977 | time gp2
0x00000000 | time gp3
0x00010818 | uCode version
0x00000000 | hw version
0x00484704 | board version
0x00000002 | hcmd
0x2FF23080 | isr0
0x0103E000 | isr1
0x0000001A | isr2
0x1443FCC3 | isr3
0x11800112 | isr4
0x00000068 | isr_pref
0x000000D4 | wait_event
0x00000000 | l2p_control
0x00000007 | l2p_duration
0x00103040 | l2p_mhvalid
0x00000007 | l2p_addr_match
0x00000000 | lmpm_pmg_sel
0x00000000 | timestamp
0x00000200 | flow_handler
This was reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=88961
Fixes: a0855054e5 ("iwlwifi: dvm: drop non VO frames when flushing")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c3fc8d26d upstream.
Need to pass the pointer within the swiotlb internal buffer to the
swiotlb library, that in the case of xen_unmap_single is dev_addr, not
paddr.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c884227eaa upstream.
On x86 truncation cannot occur because config XEN depends on X86_64 ||
(X86_32 && X86_PAE).
On ARM truncation can occur without CONFIG_ARM_LPAE, when the dma
operation involves foreign grants. However in that case the physical
address returned by xen_bus_to_phys is actually invalid (there is no mfn
to pfn tracking for foreign grants on ARM) and it is not used.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc50ddcd4c upstream.
This patchs fixes a misplaced call to memset() that fills the request
buffer with 0. The problem was with sending PCAN_USBPRO_REQ_FCT
requests, the content set by the caller was thus lost.
With this patch, the memory area is zeroed only when requesting info
from the device.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af35d0f1cc upstream.
This patch sets the correct reverse sequence order to the instructions
set to run, when any failure occurs during the initialization steps.
It also adds the missing unregistration call of the can device if the
failure appears after having been registered.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78063d81d3 upstream.
Hardware queues are ordered by priority. Use queue index 0 for BK, which
has lower priority than BE.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad8fdccf9c upstream.
The driver passes the desired hardware queue index for a WMM data queue
in qinfo->tqi_subtype. This was ignored in ath9k_hw_setuptxqueue, which
instead relied on the order in which the function is called.
Reported-by: Hubert Feurstein <h.feurstein@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53dc20b9a3 upstream.
In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of
new_dentry not old_dentry. We should get old_dir from old_dentry and
lookup old_dentry in old_dir in case another node remove the old dentry.
With this change, hard linking works again, when paths are relative with
at least one subdirectory. This is how the problem was reproducable:
# mkdir a
# mkdir b
# touch a/test
# ln a/test b/test
ln: failed to create hard link `b/test' => `a/test': No such file or directory
However when creating links in the same dir, it worked well.
Now the link gets created.
Fixes: 0e048316ff ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Szabo Aron - UBIT <aron@ubit.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Tested-by: Aron Szabo <aron@ubit.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 136f49b917 upstream.
For buffer write, page lock will be got in write_begin and released in
write_end, in ocfs2_write_end_nolock(), before it unlock the page in
ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
for the read lock of journal->j_trans_barrier. Holding page lock and
ask for journal->j_trans_barrier breaks the locking order.
This will cause a deadlock with journal commit threads, ocfs2cmt will
get write lock of journal->j_trans_barrier first, then it wakes up
kjournald2 to do the commit work, at last it waits until done. To
commit journal, kjournald2 needs flushing data first, it needs get the
cache page lock.
Since some ocfs2 cluster locks are holding by write process, this
deadlock may hung the whole cluster.
unlock pages before ocfs2_run_deallocs() can fix the locking order, also
put unlock before ocfs2_commit_trans() to make page lock is unlocked
before j_trans_barrier to preserve unlocking order.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5945b28803 upstream.
When Intersil ISL12057 support was added by commit 70e123373c ("rtc: Add
support for Intersil ISL12057 I2C RTC chip"), two masks for time registers
values imported from the device were either wrong or omitted, leading to
additional bits from those registers to impact read values:
- mask for hour register value when reading it in AM/PM mode. As
AM/PM mode is not the usual mode used by the driver, this error
would only have an impact on an externally configured RTC hour
later read by the driver.
- mask for month value. The lack of masking would provide an
erroneous value if century bit is set.
This patch fixes those two masks.
Fixes: 70e123373c ("rtc: Add support for Intersil ISL12057 I2C RTC chip")
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Peter Huewe <peter.huewe@infineon.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44c63a570a upstream.
This series fixes a few issues with the omap rtc-driver, cleans up a
bit, adds device abstraction, and finally adds support for the PMIC
control feature found in some revisions of this RTC IP block.
Ultimately, this allows for powering off the Beaglebone and waking it up
again on RTC alarms.
This patch (of 20):
Make sure not to reset the clock-source configuration when enabling the
32kHz clock mux.
Until the clock source can be configured through device tree we must not
overwrite settings made by the bootloader (e.g. clock-source
selection).
Fixes: cd914bba03 ("drivers/rtc/rtc-omap.c: add support for enabling 32khz clock")
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Benot Cousson <bcousson@baylibre.com>
Cc: Lokesh Vutla <lokeshvutla@ti.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Keerthy J <j-keerthy@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 678886bdc6 upstream.
When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.
This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
referred by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.
I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:
[PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
The corruption always happened when a transaction aborted and then fsck complained
like this:
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
read block failed check_tree_block
Couldn't open file system
In this case 94945280 corresponded to the root of a tree.
Using frace what I observed was the following sequence of steps happened:
1) transaction N started, fs_info->pinned_extents pointed to
fs_info->freed_extents[0];
2) node/eb 94945280 is created;
3) eb is persisted to disk;
4) transaction N commit starts, fs_info->pinned_extents now points to
fs_info->freed_extents[1], and transaction N completes;
5) transaction N + 1 starts;
6) eb is COWed, and btrfs_free_tree_block() called for this eb;
7) eb range (94945280 to 94945280 + 16Kb) is added to
fs_info->pinned_extents (fs_info->freed_extents[1]);
8) Something goes wrong in transaction N + 1, like hitting ENOSPC
for example, and the transaction is aborted, turning the fs into
readonly mode. The stack trace I got for example:
[112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
[112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
[112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
[112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
[112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
[112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
[112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
(...)
[112065.264493] ---[ end trace dd7903a975a31a08 ]---
[112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
[112065.264997] BTRFS info (device sdc): forced readonly
9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
turn calls btrfs_destroy_pinned_extent();
10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
marked as dirty in fs_info->freed_extents[], and for each one
it calls discard, if the fs was mounted with "-o discard", and
adds the range to the free space cache of the respective block
group;
11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
sees the free space entries and performs a discard;
12) After an umount and mount (or fsck), our eb's location on disk was full
of zeroes, and it should have been untouched, because it was marked as
dirty in the fs_info->pinned_extents tree, and therefore used by the
trees that the last committed superblock points to.
Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point since the fs is now in readonly mode and
we won't write free space caches to disk anymore (otherwise we would leak space)
nor any new superblock. By not adding the ranges to the free space caches, it
prevents other code paths from allocating that space and write to it as well,
therefore being safer and simpler.
This isn't a new problem, as it's been present since 2011 (git commit
acce952b02).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50d9aa99bd upstream.
Liu Bo pointed out that my previous fix would lose the generation update in the
scenario I described. It is actually much worse than that, we could lose the
entire extent if we lose power right after the transaction commits. Consider
the following
write extent 0-4k
log extent in log tree
commit transaction
< power fail happens here
ordered extent completes
We would lose the 0-4k extent because it hasn't updated the actual fs tree, and
the transaction commit will reset the log so it isn't replayed. If we lose
power before the transaction commit we are save, otherwise we are not.
Fix this by keeping track of all extents we logged in this transaction. Then
when we go to commit the transaction make sure we wait for all of those ordered
extents to complete before proceeding. This will make sure that if we lose
power after the transaction commit we still have our data. This also fixes the
problem of the improperly updated extent generation. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a28046956c upstream.
We use the modified list to keep track of which extents have been modified so we
know which ones are candidates for logging at fsync() time. Newly modified
extents are added to the list at modification time, around the same time the
ordered extent is created. We do this so that we don't have to wait for ordered
extents to complete before we know what we need to log. The problem is when
something like this happens
log extent 0-4k on inode 1
copy csum for 0-4k from ordered extent into log
sync log
commit transaction
log some other extent on inode 1
ordered extent for 0-4k completes and adds itself onto modified list again
log changed extents
see ordered extent for 0-4k has already been logged
at this point we assume the csum has been copied
sync log
crash
On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
which is the same one that we are replaying which also drops the csum, and then
we won't find the csum in the log for that bytenr. This of course causes us to
have errors about not having csums for certain ranges of our inode. So remove
the modified list manipulation in unpin_extent_cache, any modified extents
should have been added well before now, and we don't want them re-logged. This
fixes my test that I could reliably reproduce this problem with. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d95c1bec9 upstream.
The sizes that are obtained from space infos are in raw units and have
to be adjusted according to the raid factor. This was missing for
f_bavail and df reported doubled size for raid1.
Reported-by: Martin Steigerwald <Martin@lichtvoll.de>
Fixes: ba7b6e62f4 ("btrfs: adjust statfs calculations according to raid profiles")
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dba8cf128 upstream.
If we have two fsync()'s race on different subvols one will do all of its work
to get into the log_tree, wait on it's outstanding IO, and then allow the
log_tree to finish it's commit. The problem is we were just free'ing that
subvols logged extents instead of waiting on them, so whoever lost the race
wouldn't really have their data on disk. Fix this by waiting properly instead
of freeing the logged extents. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 942080643b upstream.
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 332b122d39 upstream.
The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:
e77a56d [PATCH] eCryptfs: Encrypted passthrough
This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Priya Bansal <p.bansal@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e237ec37ec upstream.
Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1d47b2629 upstream.
UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.
Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e159332b9a upstream.
Verify that inode size is sane when loading inode with data stored in
ICB. Otherwise we may get confused later when working with the inode and
inode size is too big.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e5cc9a40a upstream.
Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b101e2a3c upstream.
high_memory isn't direct mapped memory so retrieving it's physical address
isn't appropriate. But, it would be useful to check physical address of
highmem boundary so it's justfiable to get physical address from it. In
x86, there is a validation check if CONFIG_DEBUG_VIRTUAL and it triggers
following boot failure reported by Ingo.
...
BUG: Int 6: CR2 00f06f53
...
Call Trace:
dump_stack+0x41/0x52
early_idt_handler+0x6b/0x6b
cma_declare_contiguous+0x33/0x212
dma_contiguous_reserve_area+0x31/0x4e
dma_contiguous_reserve+0x11d/0x125
setup_arch+0x7b5/0xb63
start_kernel+0xb8/0x3e6
i386_start_kernel+0x79/0x7d
To fix boot regression, this patch implements workaround to avoid
validation check in x86 when retrieving physical address of high_memory.
__pa_nodebug() used by this patch is implemented only in x86 so there is
no choice but to use dirty #ifdef.
[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a682e9c28c upstream.
If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error
return value is then (in most cases) just overwritten before we return.
This can result in reporting success to userspace although error happened.
This bug was introduced by commit 2e54eb96e2 ("BKL: Remove BKL from
ncpfs"). Propagate the errors correctly.
Coverity id: 1226925.
Fixes: 2e54eb96e2 ("BKL: Remove BKL from ncpfs")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e77bdebff upstream.
If a request is backlogged, it's complete() handler will get called
twice: once with -EINPROGRESS, and once with the final error code.
af_alg's complete handler, unlike other users, does not handle the
-EINPROGRESS but instead always completes the completion that recvmsg()
is waiting on. This can lead to a return to user space while the
request is still pending in the driver. If userspace closes the sockets
before the requests are handled by the driver, this will lead to
use-after-frees (and potential crashes) in the kernel due to the tfm
having been freed.
The crashes can be easily reproduced (for example) by reducing the max
queue length in cryptod.c and running the following (from
http://www.chronox.de/libkcapi.html) on AES-NI capable hardware:
$ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \
-k 00000000000000000000000000000000 \
-p 00000000000000000000000000000000 >/dev/null & done
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 041d7b98ff upstream.
A regression was caused by commit 780a7654ce:
audit: Make testing for a valid loginuid explicit.
(which in turn attempted to fix a regression caused by e1760bd)
When audit_krule_to_data() fills in the rules to get a listing, there was a
missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.
This broke userspace by not returning the same information that was sent and
expected.
The rule:
auditctl -a exit,never -F auid=-1
gives:
auditctl -l
LIST_RULES: exit,never f24=0 syscall=all
when it should give:
LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all
Tag it so that it is reported the same way it was set. Create a new
private flags audit_krule field (pflags) to store it that won't interact with
the public one from the API.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3640dcfa4f upstream.
Commit f1dc4867 ("audit: anchor all pid references in the initial pid
namespace") introduced a find_vpid() call when adding/removing audit
rules with PID/PPID filters; unfortunately this is problematic as
find_vpid() only works if there is a task with the associated PID
alive on the system. The following commands demonstrate a simple
reproducer.
# auditctl -D
# auditctl -l
# autrace /bin/true
# auditctl -l
This patch resolves the problem by simply using the PID provided by
the user without any additional validation, e.g. no calls to check to
see if the task/PID exists.
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 54dc77d974 upstream.
Eric Paris explains: Since kauditd_send_multicast_skb() gets called in
audit_log_end(), which can come from any context (aka even a sleeping context)
GFP_KERNEL can't be used. Since the audit_buffer knows what context it should
use, pass that down and use that.
See: https://lkml.org/lkml/2014/12/16/542
BUG: sleeping function called from invalid context at mm/slab.c:2849
in_atomic(): 1, irqs_disabled(): 0, pid: 885, name: sulogin
2 locks held by sulogin/885:
#0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff91152e30>] prepare_bprm_creds+0x28/0x8b
#1: (tty_files_lock){+.+.+.}, at: [<ffffffff9123e787>] selinux_bprm_committing_creds+0x55/0x22b
CPU: 1 PID: 885 Comm: sulogin Not tainted 3.18.0-next-20141216 #30
Hardware name: Dell Inc. Latitude E6530/07Y85M, BIOS A15 06/20/2014
ffff880223744f10 ffff88022410f9b8 ffffffff916ba529 0000000000000375
ffff880223744f10 ffff88022410f9e8 ffffffff91063185 0000000000000006
0000000000000000 0000000000000000 0000000000000000 ffff88022410fa38
Call Trace:
[<ffffffff916ba529>] dump_stack+0x50/0xa8
[<ffffffff91063185>] ___might_sleep+0x1b6/0x1be
[<ffffffff910632a6>] __might_sleep+0x119/0x128
[<ffffffff91140720>] cache_alloc_debugcheck_before.isra.45+0x1d/0x1f
[<ffffffff91141d81>] kmem_cache_alloc+0x43/0x1c9
[<ffffffff914e148d>] __alloc_skb+0x42/0x1a3
[<ffffffff914e2b62>] skb_copy+0x3e/0xa3
[<ffffffff910c263e>] audit_log_end+0x83/0x100
[<ffffffff9123b8d3>] ? avc_audit_pre_callback+0x103/0x103
[<ffffffff91252a73>] common_lsm_audit+0x441/0x450
[<ffffffff9123c163>] slow_avc_audit+0x63/0x67
[<ffffffff9123c42c>] avc_has_perm+0xca/0xe3
[<ffffffff9123dc2d>] inode_has_perm+0x5a/0x65
[<ffffffff9123e7ca>] selinux_bprm_committing_creds+0x98/0x22b
[<ffffffff91239e64>] security_bprm_committing_creds+0xe/0x10
[<ffffffff911515e6>] install_exec_creds+0xe/0x79
[<ffffffff911974cf>] load_elf_binary+0xe36/0x10d7
[<ffffffff9115198e>] search_binary_handler+0x81/0x18c
[<ffffffff91153376>] do_execveat_common.isra.31+0x4e3/0x7b7
[<ffffffff91153669>] do_execve+0x1f/0x21
[<ffffffff91153967>] SyS_execve+0x25/0x29
[<ffffffff916c61a9>] stub_execve+0x69/0xa0
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db86da7cb7 upstream.
A security fix in caused the way the unprivileged remount tests were
using user namespaces to break. Tweak the way user namespaces are
being used so the test works again.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66d2f338ee upstream.
Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.
This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed. Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cc46516dd upstream.
- Expose the knob to user space through a proc file /proc/<pid>/setgroups
A value of "deny" means the setgroups system call is disabled in the
current processes user namespace and can not be enabled in the
future in this user namespace.
A value of "allow" means the segtoups system call is enabled.
- Descendant user namespaces inherit the value of setgroups from
their parents.
- A proc file is used (instead of a sysctl) as sysctls currently do
not allow checking the permissions at open time.
- Writing to the proc file is restricted to before the gid_map
for the user namespace is set.
This ensures that disabling setgroups at a user namespace
level will never remove the ability to call setgroups
from a process that already has that ability.
A process may opt in to the setgroups disable for itself by
creating, entering and configuring a user namespace or by calling
setns on an existing user namespace with setgroups disabled.
Processes without privileges already can not call setgroups so this
is a noop. Prodcess with privilege become processes without
privilege when entering a user namespace and as with any other path
to dropping privilege they would not have the ability to call
setgroups. So this remains within the bounds of what is possible
without a knob to disable setgroups permanently in a user namespace.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f95d7918bd upstream.
If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.
Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.
Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.
This is part of the fix for CVE-2014-8989.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80dd00a237 upstream.
setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid. Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.
I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.
This is part of a fix for CVE-2014-8989.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be7c6dba23 upstream.
As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.
For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c
Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups. Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.
For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.
This is part of a fix for CVE-2014-8989.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 273d2c67c3 upstream.
setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.
The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established. Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.
This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0542f17bf2 upstream.
The rule is simple. Don't allow anything that wouldn't be allowed
without unprivileged mappings.
It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.
This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.
The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it. So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated. Violated expectations about the
behavior of the OS is a long way to say a security issue.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ff4d90b4c upstream.
Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged. Add a common function so that
they may all share the same permission checking code.
This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.
A user namespace security fix will update this new helper, shortly.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2f5d4dc38 upstream.
Forced unmount affects not just the mount namespace but the underlying
superblock as well. Restrict forced unmount to the global root user
for now. Otherwise it becomes possible a user in a less privileged
mount namespace to force the shutdown of a superblock of a filesystem
in a more privileged mount namespace, allowing a DOS attack on root.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a44a19b47 upstream.
- MNT_NODEV should be irrelevant except when reading back mount flags,
no longer specify MNT_NODEV on remount.
- Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts.
- Add a test to verify that remount of a prexisting mount with the same flags
is allowed and does not change those flags.
- Cleanup up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used
when the code is built in an environment without them.
- Correct the test error messages when tests fail. There were not 5 tests
that tested MS_RELATIME.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e1866410f upstream.
Now that remount is properly enforcing the rule that you can't remove
nodev at least sandstorm.io is breaking when performing a remount.
It turns out that there is an easy intuitive solution implicitly
add nodev on remount when nodev was implicitly added on mount.
Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Tested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c297abfdf1 upstream.
While reviewing the code of umount_tree I realized that when we append
to a preexisting unmounted list we do not change pprev of the former
first item in the list.
Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
the former first item of the list will stomp unmounted.first leaving
it set to some random mount point which we are likely to free soon.
This isn't likely to hit, but if it does I don't know how anyone could
track it down.
[ This happened because we don't have all the same operations for
hlist's as we do for normal doubly-linked lists. In particular,
list_splice() is easy on our standard doubly-linked lists, while
hlist_splice() doesn't exist and needs both start/end entries of the
hlist. And commit 38129a13e6 incorrectly open-coded that missing
hlist_splice().
We should think about making these kinds of "mindless" conversions
easier to get right by adding the missing hlist helpers - Linus ]
Fixes: 38129a13e6 switch mnt_hash to hlist
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28a9bc6812 upstream.
When writing the code to allow per-station GTKs, I neglected to
take into account the management frame keys (index 4 and 5) when
freeing the station and only added code to free the first four
data frame keys.
Fix this by iterating the array of keys over the right length.
Fixes: e31b82136d ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d025933e29 upstream.
As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount"
stopped being incremented after the use-after-free fix. Furthermore, the
RX-LED will be triggered by every multicast frame (which wouldn't happen
before) which wouldn't allow the LED to rest at all.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the
patch.
Fixes: b8fff407a1 ("mac80211: fix use-after-free in defragmentation")
Signed-off-by: Andreas Müller <goo@stapelspeicher.org>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e6225a160 upstream.
Avoid a case where we would access uninitialized stack data if the AP
advertises HT support without 40MHz channel support.
Fixes: f3000e1b43 ("mac80211: fix broken use of VHT/20Mhz with some APs")
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2967e031d4 upstream.
Instead of keeping track of all those special cases where
VLAN interfaces have no bss_conf.chandef, just make sure
they have the same as the AP interface they belong to.
Among others, this fixes a crash getting a VLAN's channel
from userspace since a NULL channel is returned as a good
result (return value 0) for VLANs since the commit below.
Fixes: c12bc4885f ("mac80211: return the vif's chandef in ieee80211_cfg_get_channel()")
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[rewrite commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b26bdde5bb upstream.
When loading encrypted-keys module, if the last check of
aes_get_sizes() in init_encrypted() fails, the driver just returns an
error without unregistering its key type. This results in the stale
entry in the list. In addition to memory leaks, this leads to a kernel
crash when registering a new key type later.
This patch fixes the problem by swapping the calls of aes_get_sizes()
and register_key_type(), and releasing resources properly at the error
paths.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25cdb9c868 upstream.
I'm such a moron! The simple solution of saving the BSP patch
for use on resume was too simple (and wrong!), hint:
sizeof(struct microcode_intel).
What needs to be done instead is to fish out the microcode patch
we have stashed previously and apply that on the BSP in case the
late loader hasn't been utilized.
So do that instead.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141208110820.GB20057@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbae4ba8c4 upstream.
Normally, we do reapply microcode on resume. However, in the cases where
that microcode comes from the early loader and the late loader hasn't
been utilized yet, there's no easy way for us to go and apply the patch
applied during boot by the early loader.
Thus, reuse the patch stashed by the early loader for the BSP.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a18a0f6850 upstream.
Paravirtual guests are not expected to load microcode into processors
and therefore it is not necessary to initialize microcode loading
logic.
In fact, under certain circumstances initializing this logic may cause
the guest to crash. Specifically, 32-bit kernels use __pa_nodebug()
macro which does not work in Xen (the code path that leads to this macro
happens during resume when we call mc_bp_resume()->load_ucode_ap()
->check_loader_disabled_ap())
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef84b3bb9 upstream.
Hand down the cpu number instead, otherwise lockdep screams when doing
echo 1 > /sys/devices/system/cpu/microcode/reload.
BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e2024624e upstream.
We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fb2f4237b upstream.
It turns out that there's a lurking ABI issue. GCC, when
compiling this in a 32-bit program:
struct user_desc desc = {
.entry_number = idx,
.base_addr = base,
.limit = 0xfffff,
.seg_32bit = 1,
.contents = 0, /* Data, grow-up */
.read_exec_only = 0,
.limit_in_pages = 1,
.seg_not_present = 0,
.useable = 0,
};
will leave .lm uninitialized. This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.
Revert the .lm check in set_thread_area(). The value never did
anything in the first place.
Fixes: 0e58af4e1d ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4607572ef upstream.
Back when audio was enabled, the muxing of some MPP pins was causing
problems. However, since commit fea038ed55 ("ARM: mvebu: Add proper
pin muxing on the Armada 370 DB board"), those problematic MPP pins
have been assigned a proper muxing for the Ethernet interfaces. This
proper muxing is now conflicting with the hog pins muxing that had
been added as part of 249f382250 ("ARM: mvebu: add audio support to
Armada 370 DB").
Therefore, this commit simply removes the hog pins muxing, which
solves a warning a boot time due to the conflicting muxing
requirements.
Fixes: fea038ed55 ("ARM: mvebu: Add proper pin muxing on the Armada 370 DB board")
Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lkml.kernel.org/r/1414512524-24466-5-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e553554536 upstream.
Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada
38x and Armada XP requires a certain number of conditions:
- On Armada 370, the cache policy must be set to write-allocate.
- On Armada 375, 38x and XP, the cache policy must be set to
write-allocate, the pages must be mapped with the shareable
attribute, and the SMP bit must be set
Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions
are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none
of these conditions are met. With Armada 370, the situation is worse:
since the processor is single core, regardless of whether CONFIG_SMP
or !CONFIG_SMP is used, the cache policy will be set to write-back by
the kernel and not write-allocate.
Since solving this problem turns out to be quite complicated, and we
don't want to let users with a mainline kernel known to have
infrequent but existing data corruptions, this commit proposes to
simply disable hardware I/O coherency in situations where it is known
not to work.
And basically, the is_smp() function of the kernel tells us whether it
is OK to enable hardware I/O coherency or not, so this commit slightly
refactors the coherency_type() function to return
COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate
type of the coherency fabric in the other case.
Thanks to this, the I/O coherency fabric will no longer be used at all
in !CONFIG_SMP configurations. It will continue to be used in
CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x
(which are multiple cores processors), but will no longer be used on
Armada 370 (which is a single core processor).
In the process, it simplifies the implementation of the
coherency_type() function, and adds a missing call to of_node_put().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: e60304f8cb ("arm: mvebu: Add hardware I/O Coherency support")
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30cdef9710 upstream.
The ll_add_cpu_to_smp_group(), ll_enable_coherency() and
ll_disable_coherency() are used on Armada XP to control the coherency
fabric. However, they make the assumption that the coherency fabric is
always available, which is currently a correct assumption but will no
longer be true with a followup commit that disables the usage of the
coherency fabric when the conditions are not met to use it.
Therefore, this commit modifies those functions so that they check the
return value of ll_get_coherency_base(), and if the return value is 0,
they simply return without configuring anything in the coherency
fabric.
The ll_get_coherency_base() function is also modified to properly
return 0 when the function is called with the MMU disabled. In this
case, it normally returns the physical address of the coherency
fabric, but we now check if the virtual address is 0, and if that's
case, return a physical address of 0 to indicate that the coherency
fabric is not enabled.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Link: https://lkml.kernel.org/r/1415871540-20302-2-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4a680099a upstream.
Commit d127e9c ("ARM: tegra: make tegra_resume can work with current and later
chips") removed tegra_get_soc_id macro leaving used cpu register corrupted after
branching to v7_invalidate_l1() and as result causing execution of unintended
code on tegra20. Possibly it was expected that r6 would be SoC id func argument
since common cpu reset handler is setting r6 before branching to tegra_resume(),
but neither tegra20_lp1_reset() nor tegra30_lp1_reset() aren't setting r6
register before jumping to resume function. Fix it by re-adding macro.
Fixes: d127e9c (ARM: tegra: make tegra_resume can work with current and later chips)
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc6057ecb3 upstream.
When creating a dumb buffer object using the DRM_IOCTL_MODE_CREATE_DUMB
IOCTL, only the width, height, bpp and flags parameters are inputs. The
caller is not guaranteed to zero out or set handle, pitch and size, so
the driver must not treat these values as possible inputs.
Fixes a bug where running the Weston compositor on Tegra DRM would cause
an attempt to allocate a 3 GiB framebuffer to be allocated.
Fixes: de2ba664c3 ("gpu: host1x: drm: Add memory manager and fb")
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51c9fbb1b1 upstream.
Earlier implementation assumed last instruction is BPF_EXIT.
Since this is no longer a restriction in eBPF, we remove this
limitation.
Per Alexei Starovoitov [1]:
> classic BPF has a restriction that last insn is always BPF_RET.
> eBPF doesn't have BPF_RET instruction and this restriction.
> It has BPF_EXIT insn which can appear anywhere in the program
> one or more times and it doesn't have to be last insn.
[1] https://lkml.org/lkml/2014/11/27/2
Fixes: e54bcde3d6 ("arm64: eBPF JIT compiler")
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d57511d2d upstream.
Commit a469abd0f8 (ARM: elf: add new hwcap for identifying atomic
ldrd/strd instructions) introduces HWCAP_ELF for 32-bit ARM
applications. As LPAE is always present on arm64, report the
corresponding compat HWCAP to user space.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 17181fb7a0 upstream.
As long as struct thin_c is in the list, anyone can grab a reference of
it. Consequently, we must wait for the reference count to drop to zero
*after* we remove the structure from the list, not before.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c43fd26e4 upstream.
Discard bios and thin device deletion have the potential to release data
blocks. If the thin-pool is in out-of-data-space mode, and blocks were
released, transition the thin-pool back to full write mode.
The correct time to do this is just after the thin-pool metadata commit.
It cannot be done before the commit because the space maps will not
allow immediate reuse of the data blocks in case there's a rollback
following power failure.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45ec9bd0fd upstream.
When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard
function pointer was incorrectly being set to
process_prepared_discard_passdown rather than process_prepared_discard.
This incorrect function pointer meant the discard was being passed down,
but not effecting the mapping. As such any discard that was issued, in
an attempt to reclaim blocks, would not successfully free data space.
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1c6156fe4 upstream.
This function isn't right and it causes a static checker warning:
drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
error: potentially using uninitialized 'sb_data_size'.
It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.
Fixes: 3241b1d3e0 ('dm: add persistent data library')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f824a2af3d upstream.
We never bother caching a partial block that is at the back end of the
origin device. No cell ever gets locked, but the calling code was
assuming it was and trying to release it.
Now the code only releases if the cell has been set to a non NULL
value.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e32134a5a upstream.
If the incoming bio is a WRITE and completely covers a block then we
don't bother to do any copying for a promotion operation. Once this is
done the cache block and origin block will be different, so we need to
set it to 'dirty'.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f29a3147e2 upstream.
Overwrite causes the cache block and origin blocks to diverge, which
is only allowed in writeback mode.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a71d6ffe1 upstream.
Use memzero_explicit to cleanup sensitive data allocated on stack
to prevent the compiler from optimizing and removing memset() calls.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 445559cdcb upstream.
When dm-bufio sets out to use the bio built into a struct dm_buffer to
issue an IO, it needs to call bio_reset after it's done with the bio
so that we can free things attached to the bio such as the integrity
payload. Therefore, inject our own endio callback to take care of
the bio_reset after calling submit_io's end_io callback.
Test case:
1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300
2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device
3. Repeatedly read metadata and watch kmalloc-192 leak!
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08d4f77222 upstream.
This patch fixes kmemcheck warning in switch_names. The function
switch_names swaps inline names of two dentries. It swaps full arrays
d_iname, no matter how many bytes are really used by the strings. Reading
data beyond string ends results in kmemcheck warning.
We fix the bug by marking both arrays as fully initialized.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bd5a980de upstream.
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt
early so that it is safe to call .release in case nfs4_alloc_pages
fails.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: a47970ff78 ("NFSv4.1: Hold reference to layout hdr in layoutget")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9234f3190b upstream.
f2fs_write_begin() doesn't initialize the 'dn' variable if the inode has
inline data. However it uses its contents to decide whether it should
just zero out the page or load data to it. Thus if we are unlucky we can
zero out page contents instead of loading inline data into a page.
CC: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bd27ae4aa upstream.
If user specifies too low end sector for trimming, f2fs_trim_fs() will
use uninitialized value as a number of trimmed blocks and returns it to
userspace. Initialize number of trimmed blocks early to avoid the
problem.
Coverity-id: 1248809
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6c92b7e0a upstream.
The .eh_abort_handler needs to return SUCCESS, FAILED, or
FAST_IO_FAIL. So fixup all callers to adhere to this requirement.
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe08be3ec8 upstream.
The code reads the default voltage selector from its register. If the
bootloader disables the regulator, the default voltage selector will be
0 which results in faulty behaviour of this regulator driver.
This patch sets a default voltage selector for vddpu if it is not set in
the register.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 903101a839 upstream.
The commit, mmc: omap: clarify DDR timing mode between SD-UHS and eMMC,
switched omap_hsmmc to support MMC DDR mode instead of UHS DDR50 mode.
Add UHS DDR50 mode again and this time let's also keep the MMC DDR mode.
Fixes: 5438ad95a5 (mmc: omap: clarify DDR timing mode between SD-UHS and eMMC)
Reported-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66dfd10173 upstream.
Commit f1d2736c81 (mmc: dw_mmc: control card read threshold) added
dw_mci_ctrl_rd_thld() with an unconditional write to the CDTHRCTL
register at offset 0x100. However before version 240a, the FIFO region
started at 0x100, so the write messes with the FIFO and completely
breaks the driver.
If the version id < 240A, return early from dw_mci_ctl_rd_thld() so as
not to hit this problem.
Fixes: f1d2736c81 (mmc: dw_mmc: control card read threshold)
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a5fb99de4 upstream.
Some boards with TC6393XB chip require full state restore during system
resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000
tosa is one of them). Failing to do so would result in ohci Oops on
resume due to internal memory contentes being changed. Fail ohci suspend
on tc6393xb is full state restore is required.
Recommended workaround is to unbind tmio-ohci driver before suspend and
rebind it after resume.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b9b46d05f upstream.
Commit e7cd1d1eb1 ("mfd: twl4030-power: Add generic reset
configuration") accidentally removed the compatible flag for
"ti,twl4030-power" that should be there as documented in the
binding.
If "ti,twl4030-power" only the poweroff configuration is done
by the driver.
Fixes: e7cd1d1eb1 ("mfd: twl4030-power: Add generic reset configuration")
Reported-by: "Dr. H. Nikolaus Schaller" <hns@goldelico.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b46b8a718 upstream.
This is a bug fix for using physical arch timers when
the arch_timer_use_virtual boolean is false. It restores the
arch_counter_get_cntpct() function after removal in
0d651e4e "clocksource: arch_timer: use virtual counters"
We need this on certain ARMv7 systems which are architected like this:
* The firmware doesn't know and doesn't care about hypervisor mode and
we don't want to add the complexity of hypervisor there.
* The firmware isn't involved in SMP bringup or resume.
* The ARCH timer come up with an uninitialized offset between the
virtual and physical counters. Each core gets a different random
offset.
* The device boots in "Secure SVC" mode.
* Nothing has touched the reset value of CNTHCTL.PL1PCEN or
CNTHCTL.PL1PCTEN (both default to 1 at reset)
One example of such as system is RK3288 where it is much simpler to
use the physical counter since there's nobody managing the offset and
each time a core goes down and comes back up it will get reinitialized
to some other random value.
Fixes: 0d651e4e65 ("clocksource: arch_timer: use virtual counters")
Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29fa682546 upstream.
paravirt_enabled has the following effects:
- Disables the F00F bug workaround warning. There is no F00F bug
workaround any more because Linux's standard IDT handling already
works around the F00F bug, but the warning still exists. This
is only cosmetic, and, in any event, there is no such thing as
KVM on a CPU with the F00F bug.
- Disables 32-bit APM BIOS detection. On a KVM paravirt system,
there should be no APM BIOS anyway.
- Disables tboot. I think that the tboot code should check the
CPUID hypervisor bit directly if it matters.
- paravirt_enabled disables espfix32. espfix32 should *not* be
disabled under KVM paravirt.
The last point is the purpose of this patch. It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests. Fixes CVE-2014-8134.
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e58af4e1d upstream.
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.
For completeness, block attempts to set the L bit. (Prior to
this patch, the L bit would have been silently dropped.)
This is an ABI break. I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.
Note to stable maintainers: this is a hardening patch that fixes
no known bugs. Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f54e18f1b8 upstream.
Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.
Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f62f5eff3d upstream.
The same fixup to enable EAPD is needed for ASUS Z99He with AD1986A
codec like another ASUS machine.
Reported-and-tested-by: Dmitry V. Zimin <pfzim@mail.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca5358ef75 upstream.
... by not hitting rename_retry for reasons other than rename having
happened. In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill(). Skip the killed siblings instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7f19fc5e0b ]
For netlink, we shouldn't be using arch_fast_hash() as a hashing
discipline, but rather jhash() instead.
Since netlink sockets can be opened by any user, a local attacker
would be able to easily create collisions with the DPDK-derived
arch_fast_hash(), which trades off performance for security by
using crc32 CPU instructions on x86_64.
While it might have a legimite use case in other places, it should
be avoided in netlink context, though. As rhashtable's API is very
flexible, we could later on still decide on other hashing disciplines,
if legitimate.
Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 69204cf7eb ]
commit 46e5da40ae (net: qdisc: use rcu prefix and silence
sparse warnings) triggers a spurious warning:
net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage!
The code should be using the _bh variant of rcu_dereference.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11d3d2a16c ]
Commit 97a6d1bb2b (xen-netfront: Fix
handling packets on compound pages with skb_linearize) attempted to
fix a problem where an skb that would have required too many slots
would be dropped causing TCP connections to stall.
However, it filled in the first slot using the original buffer and not
the new one and would use the wrong offset and grant access to the
wrong page.
Netback would notice the malformed request and stop all traffic on the
VIF, reporting:
vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
vif vif-3-0 vif3.0: fatal error; disabling device
Reported-by: Anthony Wright <anthony@overnetdata.com>
Tested-by: Anthony Wright <anthony@overnetdata.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0f85feae6b ]
When I cooked commit c3658e8d0f ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)
Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff51 ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9772b54c55 ]
To accomodate for enough headroom for tunnels, use MAX_HEADER instead
of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs
of trinity an skb_under_panic() via SCTP output path (see reference).
I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere
in other protocols might be one possible cause for this.
In any case, it looks like accounting on chunks themself seems to look
good as the skb already passed the SCTP output path and did not hit
any skb_over_panic(). Given tunneling was enabled in his .config, the
headroom would have been expanded by MAX_HEADER in this case.
Reported-by: Robert Święcki <robert@swiecki.net>
Reference: https://lkml.org/lkml/2014/12/1/507
Fixes: 594ccc14df ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5f478b4103 ]
mvneta_tx() dereferences skb to get skb->len too late,
as hardware might have completed the transmit and TX completion
could have freed the skb from another cpu.
Fixes: 71f6d1b31f ("net: mvneta: replace Tx timer with a real interrupt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit aebea2ba0f ]
The mvneta driver sets the amount of Tx coalesce packets to 16 by
default. Normally that does not cause any trouble since the driver
uses a much larger Tx ring size (532 packets). But some sockets
might run with very small buffers, much smaller than the equivalent
of 16 packets. This is what ping is doing for example, by setting
SNDBUF to 324 bytes rounded up to 2kB by the kernel.
The problem is that there is no documented method to force a specific
packet to emit an interrupt (eg: the last of the ring) nor is it
possible to make the NIC emit an interrupt after a given delay.
In this case, it causes trouble, because when ping sends packets over
its raw socket, the few first packets leave the system, and the first
15 packets will be emitted without an IRQ being generated, so without
the skbs being freed. And since the socket's buffer is small, there's
no way to reach that amount of packets, and the ping ends up with
"send: no buffer available" after sending 6 packets. Running with 3
instances of ping in parallel is enough to hide the problem, because
with 6 packets per instance, that's 18 packets total, which is enough
to grant a Tx interrupt before all are sent.
The original driver in the LSP kernel worked around this design flaw
by using a software timer to clean up the Tx descriptors. This timer
was slow and caused terrible network performance on some Tx-bound
workloads (such as routing) but was enough to make tools like ping
work correctly.
Instead here, we simply set the packet counts before interrupt to 1.
This ensures that each packet sent will produce an interrupt. NAPI
takes care of coalescing interrupts since the interrupt is disabled
once generated.
No measurable performance impact nor CPU usage were observed on small
nor large packets, including when saturating the link on Tx, and this
fixes tools like ping which rely on too small a send buffer. If one
wants to increase this value for certain workloads where it is safe
to do so, "ethtool -C $dev tx-frames" will override this default
setting.
This fix needs to be applied to stable kernels starting with 3.10.
Tested-By: Maggie Mae Roxas <maggie.mae.roxas@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e46477a12 ]
Remove optimize_div() from BPF_MOD | BPF_K case
since we don't know the dividend and fix the
emit_mod() by reading the mod operation result from HI register
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Reviewed-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f2a01517f2 ]
Following patch fixes typo in the flow validation. This prevented
installation of ARP and IPv6 flows.
Fixes: 19e7a3df72 ("openvswitch: Fix NDP flow mask validation")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6fb2a75673 ]
Set the inner mac header to point to the GRE payload when
doing GRO. This is needed if we proceed to send the packet
through GRE GSO which now uses the inner mac header instead
of inner network header to determine the length of encapsulation
headers.
Fixes: 14051f0452 ("gre: Use inner mac length when computing tunnel length")
Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 00c83b01d5 ]
Currently, when trying to reuse a socket, vxlan_sock_add will grab
vn->sock_lock, locate a reusable socket, inc refcount and release
vn->sock_lock.
But vxlan_sock_release() will first decrement refcount, and then grab
that lock. refcnt operations are atomic but as currently we have
deferred works which hold vs->refcnt each, this might happen, leading to
a use after free (specially after vxlan_igmp_leave):
CPU 1 CPU 2
deferred work vxlan_sock_add
... ...
spin_lock(&vn->sock_lock)
vs = vxlan_find_sock();
vxlan_sock_release
dec vs->refcnt, reaches 0
spin_lock(&vn->sock_lock)
vxlan_sock_hold(vs), refcnt=1
spin_unlock(&vn->sock_lock)
hlist_del_rcu(&vs->hlist);
vxlan_notify_del_rx_port(vs)
spin_unlock(&vn->sock_lock)
So when we look for a reusable socket, we check if it wasn't freed
already before reusing it.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Fixes: 7c47cedf43 ("vxlan: move IGMP join/leave to work queue")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-16 09:39:04 -08:00
2260 changed files with 27068 additions and 12074 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.