commit c27927e372 upstream.
Updates to tp_reserve can race with reads of the field in
packet_set_ring. Avoid this by holding the socket lock during
updates in setsockopt PACKET_RESERVE.
This bug was discovered by syzkaller.
Fixes: 8913336a7e ("packet: add PACKET_RESERVE sockopt")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f991af3daa upstream.
The retry logic for netlink_attachskb() inside sys_mq_notify()
is nasty and vulnerable:
1) The sock refcnt is already released when retry is needed
2) The fd is controllable by user-space because we already
release the file refcnt
so we when retry but the fd has been just closed by user-space
during this small window, we end up calling netlink_detachskb()
on the error path which releases the sock again, later when
the user-space closes this socket a use-after-free could be
triggered.
Setting 'sock' to NULL here should be sufficient to fix it.
Reported-by: GeneBlue <geneblue.mail@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6399f1fae4 upstream.
In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.
This problem has been here since before the beginning of git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34bf129a7f upstream.
While working on another build error, I ran into several variations of
this dependency loop:
subsection "Kconfig recursive dependency limitations"
drivers/input/Kconfig:8: symbol INPUT is selected by VT
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/tty/Kconfig:12: symbol VT is selected by FB_STI
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:677: symbol FB_STI depends on FB
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/fbdev/Kconfig:5: symbol FB is selected by DRM_KMS_FB_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:72: symbol DRM_KMS_FB_HELPER is selected by DRM_KMS_CMA_HELPER
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/Kconfig:137: symbol DRM_KMS_CMA_HELPER is selected by DRM_HDLCD
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/arm/Kconfig:6: symbol DRM_HDLCD depends on OF
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/of/Kconfig:4: symbol OF is selected by X86_INTEL_CE
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:523: symbol X86_INTEL_CE depends on X86_IO_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:1011: symbol X86_IO_APIC depends on X86_LOCAL_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:1005: symbol X86_LOCAL_APIC depends on X86_UP_APIC
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
arch/x86/Kconfig:980: symbol X86_UP_APIC depends on PCI_MSI
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/pci/Kconfig:11: symbol PCI_MSI is selected by AMD_IOMMU
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/iommu/Kconfig:106: symbol AMD_IOMMU depends on IOMMU_SUPPORT
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/iommu/Kconfig:5: symbol IOMMU_SUPPORT is selected by DRM_ETNAVIV
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/etnaviv/Kconfig:2: symbol DRM_ETNAVIV depends on THERMAL
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/thermal/Kconfig:5: symbol THERMAL is selected by ACPI_VIDEO
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/acpi/Kconfig:183: symbol ACPI_VIDEO is selected by INPUT
This doesn't currently show up as I fixed the 'THERMAL' part of it,
but I noticed that the FB_STI dependency should not be there but
was introduced by slightly incorrect bug-fix patch that tried to
fix a link error.
Instead of selecting 'VT' to make us enter the drivers/video/console
directory at compile-time, it's sufficient to build the
drivers/video/console/sticore.c file by adding its directory
to when CONFIG_FB_STI is enabled. Alternatively, we could move the
sticore code to another directory that is always built when we
have at STI_CONSOLE or FB_STI enabled.
Fixes: 17085a9345 ("parisc: stifb: should depend on STI_CONSOLE")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9abc74a22d upstream.
This is broken since ever but sadly nobody noticed.
Recent versions of GDB set DR_CONTROL unconditionally and
UML dies due to a heap corruption. It turns out that
the PTRACE_POKEUSER was copy&pasted from i386 and assumes
that addresses are 4 bytes long.
Fix that by using 8 as address size in the calculation.
Reported-by: jie cao <cj3054@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 26c9cb668c upstream.
Mac requires the unicode flag to be set for cifs, even for the smb
echo request (which doesn't have strings).
Without this Mac rejects the periodic echo requests (when mounting
with cifs) that we use to check if server is down
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 564277ecee upstream.
January is month 1. There is no zero-th month. If someone passes a
zero month then it means we read from one space before the start of the
total_days_of_prev_months[] array.
We may as well also be strict about days as well.
Fixes: 1bd5bbcb65 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a9f11f963a upstream.
Be careful when comparing tcp_time_stamp to some u32 quantity,
otherwise result can be surprising.
Fixes: 7c106d7e78 ("[TCP]: TCP Low Priority congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b8481fa42 upstream.
Since that change also made the nfrag function not necessary
for exports, remove it.
Fixes: 89a23c8b52 ("ip6_tunnel: Fix missing tunnel encapsulation limit option")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89a23c8b52 upstream.
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and
IPV6_TLV_PADN options when an encapsulation limit is defined (the
default is a limit of 4). An MTU adjustment is done to account for
these options as well. However, the options are never present in the
generated packets.
The issue appears to be a subtlety between IPV6_DSTOPTS and
IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was
written, the encap limit options were included as IPV6_RTHDRDSTOPTS in
dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrags_opts was
(correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS
are to be used. This caused the options to no longer be included in v6
encapsulated packets.
The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions)
instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement.
Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.")
Fixes: 333fad5364: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ab89f0bdd6 upstream.
Running 32bit userspace on 64bit kernel results in MSG_CMSG_COMPAT being
defined as 0x80000000. This results in sendmsg failure if used from 32bit
userspace running on 64bit kernel. Fix this by accounting for MSG_CMSG_COMPAT
in flags check in hci_sock_sendmsg.
Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>
Signed-off-by: Marko Kiiskila <marko@runtime.io>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b8642af15 upstream.
Since commit 5093bb965a ("powerpc/QE: switch to the cpm_muram
implementation"), muram area is not part of immrbar mapping anymore
so immrbar_virt_to_phys() is not usable anymore.
Fixes: 5093bb965a ("powerpc/QE: switch to the cpm_muram implementation")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Li Yang <pku.leo@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c6ade20f5e upstream.
The WRITE SAME to TRIM translation rewrites the DATA OUT buffer. While
the SCSI code accomodates for this by passing a read-writable buffer
userspace applications don't cater for this behavior. In fact it can
be used to rewrite e.g. a readonly file through mmap and should be
considered as a security fix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2:
- Open-code blk_rq_is_passthrough()
- We don't distinguish which field is invaid so goto invalid_fld
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8561eae60f upstream.
The Infiniband spec defines "A multicast address is defined by a
MGID and a MLID" (section 10.5). Currently the MLID value is not
validated.
Add check to verify that the MLID value is in the correct address
range.
Fixes: 0c33aeedb2 ("[IB] Add checks to multicast attach and detach")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: use literal number instead of IB_MULTICAST_LID_BASE]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 20c7840a77 upstream.
A list of MGID/MLID pairs is built when doing a multicast attach. When
the multicast detach is called, the list is searched, and regardless of
the search outcome, the driver detach is called.
If an MGID/MLID pair is not on the list, driver detach should not be
called, and an error should be returned. Calling the driver without
removing an MGID/MLID pair from the list can leave the core and driver
out of sync.
Fixes: f4e401562c ("IB/uverbs: track multicast group membership for userspace QPs")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d6159640d upstream.
DWC3 driver uses of_usb_get_phy_mode() which is
implemented in drivers/usb/phy/of.c and in bare minimal
configuration it might not be pulled in kernel binary.
In case of ARC or ARM this could be easily reproduced with
"allnodefconfig" +CONFIG_USB=m +CONFIG_USB_DWC3=m.
On building all ends-up with:
---------------------->8------------------
Kernel: arch/arm/boot/Image is ready
Kernel: arch/arm/boot/zImage is ready
Building modules, stage 2.
MODPOST 5 modules
ERROR: "of_usb_get_phy_mode" [drivers/usb/dwc3/dwc3.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2
---------------------->8------------------
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: linux-snps-arc@lists.infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 53b56da83d upstream.
After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.
So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
CPU0 CPU1
ctnetlink_change_status __nf_conntrack_find_get
old = ct->status nf_ct_gc_expired
- nf_ct_kill
- test_and_set_bit(IPS_DYING_BIT
new = old | status; -
ct->status = new; <-- oops, _DYING_ is cleared!
Now using a series of atomic bit operation to solve the above issue.
Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly,
so make these two bits be unchangable too.
If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
but actually it is alloced by nf_conntrack_alloc.
If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer
deference, as the nfct_seqadj(ct) maybe NULL.
Last, add some comments to describe the logic change due to the
commit a963d710f3 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which makes me feel a little confusing.
Fixes: 76507f69c4 ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2:
- IPS_UNCHANGEABLE_MASK was not previously defined and ctnetlink_update_status()
is not needed
- enum ip_conntrack_status only assigns 13 bits
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d594aa0277 upstream.
The minimum size for a new stack (512 bytes) setup for arch/x86/boot components
when the bootloader does not setup/provide a stack for the early boot components
is not "enough".
The setup code executing as part of early kernel startup code, uses the stack
beyond 512 bytes and accidentally overwrites and corrupts part of the BSS
section. This is exposed mostly in the early video setup code, where
it was corrupting BSS variables like force_x, force_y, which in-turn affected
kernel parameters such as screen_info (screen_info.orig_video_cols) and
later caused an exception/panic in console_init().
Most recent boot loaders setup the stack for early boot components, so this
stack overwriting into BSS section issue has not been exposed.
Signed-off-by: Ashish Kalra <ashish@bluestacks.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170419152015.10011-1-ashishkalra@Ashishs-MacBook-Pro.local
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f140fc206 upstream.
Enabling vflip currently causes wrong colors.
It seems that (at least with the current sensor setup) REG04_VFLIP_IMG only
changes the vertical readout direction.
Because pixels are arranged RGRG... in odd lines and GBGB... in even lines,
either a one line shift or even/odd line swap is required, too, but
apparently this doesn't happen.
I finally figured out that this can be done manually by setting
REG04_VREF_EN.
Looking at hflip, it turns out that bit REG04_HREF_EN is set there
permanetly, but according to my tests has no effect on the pixel readout
order.
So my conclusion is that the current documentation of sensor register 0x04
is wrong (has changed after preliminary datasheet version 2.2).
I'm pretty sure that automatic vertical line shift/switch can be enabled,
too, but until anyone finds ot how this works, we have to stick with manual
switching.
Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 950e252cb4 upstream.
Otherwise the i2c transfer functions can read or write beyond the end of
stack or heap buffers.
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2:
- Use obuf instead of state->data
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 324ed533bf upstream.
We recently introduced some new error paths but the unlocks are missing.
Fixes: 0065a79a86 ('[media] dw2102: Don't use dynamic static allocation')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0065a79a86 upstream.
Dynamic static allocation is evil, as Kernel stack is too low, and
compilation complains about it on some archs:
drivers/media/usb/dvb-usb/dw2102.c:368:1: warning: 'dw2102_earda_i2c_transfer' uses dynamic stack allocation [enabled by default]
drivers/media/usb/dvb-usb/dw2102.c:449:1: warning: 'dw2104_i2c_transfer' uses dynamic stack allocation [enabled by default]
drivers/media/usb/dvb-usb/dw2102.c:512:1: warning: 'dw3101_i2c_transfer' uses dynamic stack allocation [enabled by default]
drivers/media/usb/dvb-usb/dw2102.c:621:1: warning: 's6x0_i2c_transfer' uses dynamic stack allocation [enabled by default]
Instead, let's enforce a limit for the buffer to be the max size of
a control URB payload data (64 bytes).
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a12b8ab8c5 upstream.
Otherwise ttusb2_i2c_xfer can read or write beyond the end of static and
heap buffers.
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff17999184 upstream.
The ttusb2_msg function uses on-stack variables to submit commands to
dvb_usb_generic. This eventually gets to the DMA api layer and will throw a
traceback if the debugging options are set.
This allocates the temporary buffer variables with kzalloc instead.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=734506
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea00353f36 upstream.
Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
crashes during suspend tests. Geert Uytterhoeven managed to reproduce the
issue on an M2-W Koelsch board (r8a7791):
It occurs when the PME scan runs, once per second. During PME scan, the
PCI host bridge (rcar-pci) registers are accessed while its module clock
has already been disabled, leading to the crash.
One reproducer is to configure s2ram to use "s2idle" instead of "deep"
suspend:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo s2idle > /sys/power/mem_sleep
# echo mem > /sys/power/state
Another reproducer is to write either "platform" or "processors" to
/sys/power/pm_test. It does not (or is less likely) to happen during full
system suspend ("core" or "none") because system suspend also disables
timers, and thus the workqueue handling PME scans no longer runs. Geert
believes the issue may still happen in the small window between disabling
module clocks and disabling timers:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test # Or "processors"
# echo mem > /sys/power/state
(Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
Rafael Wysocki agrees that PME scans should be suspended before the host
bridge registers become inaccessible. To that end, queue the task on a
workqueue that gets frozen before devices suspend.
Rafael notes however that as a result, some wakeup events may be missed if
they are delivered via PME from a device without working IRQ (which hence
must be polled) and occur after the workqueue has been frozen. If that
turns out to be an issue in practice, it may be possible to solve it by
calling pci_pme_list_scan() once directly from one of the host bridge's
pm_ops callbacks.
Stacktrace for posterity:
PM: Syncing filesystems ... [ 38.566237] done.
PM: Preparing system for sleep (mem)
Freezing user space processes ... [ 38.579813] (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: Suspending system (mem)
PM: suspend of devices complete after 152.456 msecs
PM: late suspend of devices complete after 2.809 msecs
PM: noirq suspend of devices complete after 29.863 msecs
suspend debug: Waiting for 5 second(s).
Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
pgd = c0003000
[00000000] *pgd=80000040004003, *pmd=00000000
Internal error: : 1211 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
4.9.0-rc1-koelsch-00011-g68db9bc814362e7f #3383
Hardware name: Generic R8A7791 (Flattened Device Tree)
Workqueue: events pci_pme_list_scan
task: eb56e140 task.stack: eb58e000
PC is at pci_generic_config_read+0x64/0x6c
LR is at rcar_pci_cfg_base+0x64/0x84
pc : [<c041d7b4>] lr : [<c04309a0>] psr: 600d0093
sp : eb58fe98 ip : c041d750 fp : 00000008
r10: c0e2283c r9 : 00000000 r8 : 600d0013
r7 : 00000008 r6 : eb58fed6 r5 : 00000002 r4 : eb58feb4
r3 : 00000000 r2 : 00000044 r1 : 00000008 r0 : 00000000
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5387d Table: 6a9f6c80 DAC: 55555555
Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
Stack: (0xeb58fe98 to 0xeb590000)
fe80: 00000002 00000044
fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
[<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>]
(pci_bus_read_config_word+0x58/0x80)
[<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>]
(pci_check_pme_status+0x34/0x78)
[<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54)
[<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4)
[<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>]
(process_one_work+0x1bc/0x308)
[<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0)
[<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc)
[<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
---[ end trace 667d43ba3aa9e589 ]---
Fixes: df17e62e5b ("PCI: Add support for polling PME state on suspended legacy PCI devices")
Reported-and-tested-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reported-and-tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Cc: Simon Horman <horms+renesas@verge.net.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cef4d02305 upstream.
The /proc/bus/pci mmap interface allows the user to specify whether they
want WC or not. Don't let them do so on non-prefetchable BARs.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3a92c319c4 upstream.
PCI exposes files like /proc/bus/pci/00/00.0 in procfs. These files
support operations like this:
ioctl(fd, PCIIOC_MMAP_IS_IO); # request I/O port space
ioctl(fd, PCIIOC_WRITE_COMBINE, 1); # request write-combining
mmap(fd, ...)
Write combining is useful on PCI memory space, but I don't think it makes
sense on PCI I/O port space.
We *could* change proc_bus_pci_ioctl() to make it impossible to set
mmap_state == pci_mmap_io and write_combine at the same time, but that
would break the following sequence, which is currently legal:
mmap(fd, ...) # default is I/O, non-combining
ioctl(fd, PCIIOC_WRITE_COMBINE, 1); # request write-combining
ioctl(fd, PCIIOC_MMAP_IS_MEM); # request memory space
mmap(fd, ...) # get write-combining mapping
Ignore the write-combining flag when mapping I/O port space.
This patch should have no functional effect, based on this analysis of all
implementations of pci_mmap_page_range():
- ia64 mips parisc sh unicore32 x86 do not support mapping of I/O port
space at all.
- arm cris microblaze mn10300 sparc xtensa support mapping of I/O port
space, but ignore the write_combine argument to pci_mmap_page_range().
- powerpc supports mapping of I/O port space and uses write_combine, and
it disables write combining for I/O port space in
__pci_mmap_set_pgprot().
This patch makes it possible to remove __pci_mmap_set_pgprot() from
powerpc, which simplifies that path.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee0fe833d9 upstream.
This code copies actual_length-128 bytes from the header, which will
underflow if the received buffer is too small.
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 821117dc21 upstream.
Return an error rather than memcpy()ing beyond the end of the buffer.
Internal callers use appropriate sizes, but digitv_i2c_xfer may not.
Signed-off-by: Alyssa Milburn <amilburn@zall.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6bccc7f426 upstream.
In the PCI_MMAP_PROCFS case when the address being passed by the user is a
'user visible' resource address based on the bus window, and not the actual
contents of the resource, that's what we need to be checking it against.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07a77929ba upstream.
The author meant to free the variable that was just allocated, instead
of the one that failed to be allocated, but made a simple typo. This
patch rectifies that.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 65f921647f upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: e0d3bafd02 ("V4L/DVB (10954): Add cx231xx USB driver")
Cc: Sri Deevi <Srinivasa.Deevi@conexant.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0cd273bb5e upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: e0d3bafd02 ("V4L/DVB (10954): Add cx231xx USB driver")
Cc: Sri Deevi <Srinivasa.Deevi@conexant.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 256d013a9b upstream.
There are numerous issues in error handling code of cx231xx initialization.
Double free (when cx231xx_init_dev() calls kfree(dev) via cx231xx_release_resources()
and then cx231xx_usb_probe() does the same) and memory leaks
(e.g. usb_get_dev() before (ifnum != 1) check in cx231xx_usb_probe())
are just a few of them.
The patch fixes the issues in cx231xx_usb_probe() and cx231xx_init_dev()
by moving usb_get_dev(interface_to_usbdev(interface)) below in code and
implementing proper error handling.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2:
- Keep using &= rather than clear_bit()
- Adjust filename, context
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eacb975b48 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: 2a9f8b5d25 ("V4L/DVB (5206): Usbvision: set alternate interface
modification")
Cc: Thierry MERLE <thierry.merle@free.fr>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aa58fedb8c upstream.
Make sure to check the number of endpoints to avoid accessing memory
beyond the endpoint array should a device lack the expected endpoints.
Note that, as far as I can tell, the gspca framework has already made
sure there is at least one endpoint in the current alternate setting so
there should be no risk for a NULL-pointer dereference here.
Fixes: b517af7228 ("V4L/DVB: gspca_konica: New gspca subdriver for
konica chipset using cams")
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Hans Verkuil <hansverk@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebeb36670e upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: 36bcce4306 ("ath9k_htc: Handle storage devices")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1bb9914e17 upstream.
Notifications may only be 8 bytes long. Accessing the 9th and
10th byte of unimplemented/unknown notifications may be insecure.
Also check the length of known notifications before accessing anything
behind the 8th byte.
Signed-off-by: Tobias Herzog <t-herzog@gmx.de>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f86a96be0 upstream.
There is race condition when two USB class drivers try to call
init_usb_class at the same time and leads to crash.
code path: probe->usb_register_dev->init_usb_class
To solve this, mutex locking has been added in init_usb_class() and
destroy_usb_class().
As pointed by Alan, removed "if (usb_class)" test from destroy_usb_class()
because usb_class can never be NULL there.
Signed-off-by: Ajay Kaher <ajay.kaher@samsung.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 03eb2a557e upstream.
Make sure to check for the required out endpoint to avoid dereferencing
a NULL-pointer in mce_request_packet should a malicious device lack such
an endpoint. Note that this path is hit during probe.
Fixes: 66e89522af ("V4L/DVB: IR: add mceusb IR receiver driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: using mce_dbg() instead of dev_dbg()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f5cccf4942 upstream.
While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
the following crash was observed.
Unable to handle kernel NULL pointer dereference at virtual address 00000218
pgd = ffffffc00165f000
[00000218] *pgd=000000000174f003, *pud=000000000174f003,
*pmd=0000000001750003, *pte=00e8000001751713
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
ppp_generic slhc tun
CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
Hardware name: Google Kevin (DT)
Workqueue: pm pm_runtime_work
task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
PC is at autosuspend_check+0x74/0x174
LR is at autosuspend_check+0x70/0x174
...
Call trace:
[<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
[<ffffffc000810500>] usb_runtime_idle+0x20/0x40
[<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
[<ffffffc000786af0>] rpm_idle+0x1e8/0x498
[<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40
Source:
(gdb) l *0xffffffc00080dcc0
0xffffffc00080dcc0 is in autosuspend_check
(drivers/usb/core/driver.c:1778).
1773 /* We don't need to check interfaces that are
1774 * disabled for runtime PM. Either they are unbound
1775 * or else their drivers don't support autosuspend
1776 * and so they are permanently active.
1777 */
1778 if (intf->dev.power.disable_depth)
1779 continue;
1780 if (atomic_read(&intf->dev.power.usage_count) > 0)
1781 return -EBUSY;
1782 w |= intf->needs_remote_wakeup;
Code analysis shows that intf is set to NULL in usb_disable_device() prior
to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
lock the usb device, and neither does any of the functions in the
traceback. This means that there is no protection against a race condition
where usb_disable_device() is removing dev->actconfig->interface[] pointers
while those are being accessed from autosuspend_check().
To solve the problem, synchronize and validate device state between
autosuspend_check() and usb_disconnect().
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca260ece6a upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Fixes: a1030e92c1 ("[PATCH] zd1211rw: Convert installer CDROM device into WLAN device")
Cc: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c9101766b upstream.
This patch fixes an issue that kernel panic happens when DMA is enabled
and we press enter key while the kernel booting on the serial console.
* An interrupt may occur after sci_request_irq().
* DMA transfer area is initialized by setup_timer() in sci_request_dma()
and used in interrupt.
If an interrupt occurred between sci_request_irq() and setup_timer() in
sci_request_dma(), DMA transfer area has not been initialized yet.
So, this patch changes the order of sci_request_irq() and
sci_request_dma().
Fixes: 73a19e4c03 ("serial: sh-sci: Add DMA support.")
Signed-off-by: Takatoshi Akiyama <takatoshi.akiyama.kj@ps.hitachi-solutions.com>
[Shimoda changes the commit log]
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aea57edf80 upstream.
This device is available under different marketing names:
WLM-20U2 - Wireless USB Dongle for Toshiba TVs
GN-1080 - Wireless LAN Module for Toshiba MFPs.
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0088d27b78 upstream.
This device is a dongle made by Philips to enhance their TVs with wireless capabilities,
but works flawlessly on any upstream kernel, provided that the ath9k_htc module is attached to it.
It's correctly recognized by lsusb as "0471:209e Philips (or NXP) PTA01 Wireless Adapter" and the
patch has been tested on real hardware.
Signed-off-by: Leon Nardella <leon.nardella@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 763cbac076 upstream.
Roger says, Ubiquiti produce 2 versions of their WiFiStation USB adapter. One
has an internal antenna, the other has an external antenna and
name suffix EXT. They have separate USB ids and in distribution
openSUSE 12.2 (kernel 3.4.6), file /usr/share/usb.ids shows:
0cf3 Atheros Communications, Inc.
...
b002 Ubiquiti WiFiStation 802.11n [Atheros AR9271]
b003 Ubiquiti WiFiStationEXT 802.11n [Atheros AR9271]
Add b002 Ubiquiti WiFiStation in the PID/VID list.
Reported-by: Roger Price <ath9k@rogerprice.org>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6830733d53 upstream.
The driver uses a relatively large data structure on the stack, which
showed up on my radar as we get a warning with the "latent entropy"
GCC plugin:
drivers/media/usb/pvrusb2/pvrusb2-eeprom.c:153:1: error: the frame size of 1376 bytes is larger than 1152 bytes [-Werror=frame-larger-than=]
The warning is usually hidden as we raise the warning limit to 2048
when the plugin is enabled, but I'd like to lower that again in the
future, and making this function smaller helps to do that without
build regressions.
Further analysis shows that putting an 'i2c_client' structure on
the stack is not really supported, as the embedded 'struct device'
is not initialized here, and we are only saved by the fact that
the function that is called here does not use the pointer at all.
Fixes: d855497edb ("V4L/DVB (4228a): pvrusb2 to kernel 2.6.18")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba3021b2c7 upstream.
snd_timer_user_tselect() reallocates the queue buffer dynamically, but
it forgot to reset its indices. Since the read may happen
concurrently with ioctl and snd_timer_user_tselect() allocates the
buffer via kmalloc(), this may lead to the leak of uninitialized
kernel-space data, as spotted via KMSAN:
BUG: KMSAN: use of unitialized memory in snd_timer_user_read+0x6c4/0xa10
CPU: 0 PID: 1037 Comm: probe Not tainted 4.11.0-rc5+ #2739
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x143/0x1b0 lib/dump_stack.c:52
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
kmsan_check_memory+0xc2/0x140 mm/kmsan/kmsan.c:1086
copy_to_user ./arch/x86/include/asm/uaccess.h:725
snd_timer_user_read+0x6c4/0xa10 sound/core/timer.c:2004
do_loop_readv_writev fs/read_write.c:716
__do_readv_writev+0x94c/0x1380 fs/read_write.c:864
do_readv_writev fs/read_write.c:894
vfs_readv fs/read_write.c:908
do_readv+0x52a/0x5d0 fs/read_write.c:934
SYSC_readv+0xb6/0xd0 fs/read_write.c:1021
SyS_readv+0x87/0xb0 fs/read_write.c:1018
This patch adds the missing reset of queue indices. Together with the
previous fix for the ioctl/read race, we cover the whole problem.
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d11662f4f7 upstream.
The read from ALSA timer device, the function snd_timer_user_tread(),
may access to an uninitialized struct snd_timer_user fields when the
read is concurrently performed while the ioctl like
snd_timer_user_tselect() is invoked. We have already fixed the races
among ioctls via a mutex, but we seem to have forgotten the race
between read vs ioctl.
This patch simply applies (more exactly extends the already applied
range of) tu->ioctl_lock in snd_timer_user_tread() for closing the
race window.
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98da7d0885 upstream.
When limiting the argv/envp strings during exec to 1/4 of the stack limit,
the storage of the pointers to the strings was not included. This means
that an exec with huge numbers of tiny strings could eat 1/4 of the stack
limit in strings and then additional space would be later used by the
pointers to the strings.
For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
single-byte strings would consume less than 2MB of stack, the max (8MB /
4) amount allowed, but the pointers to the strings would consume the
remaining additional stack space (1677721 * 4 == 6710884).
The result (1677721 + 6710884 == 8388605) would exhaust stack space
entirely. Controlling this stack exhaustion could result in
pathological behavior in setuid binaries (CVE-2017-1000365).
[akpm@linux-foundation.org: additional commenting from Kees]
Fixes: b6a2fea393 ("mm: variable length argument support")
Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Qualys Security Advisory <qsa@qualys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: use ACCESS_ONCE() instead of READ_ONCE()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e21f4af17 upstream.
The lp_setup() code doesn't apply any bounds checking when passing
"lp=none", and only in this case, resulting in an overflow of the
parport_nr[] array. All versions in Git history are affected.
Reported-By: Roee Hay <roee.hay@hcl.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 089bc0143f upstream.
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other backends do.
Build on the fact that all response structure flavors are actually
identical (the old code did make this assumption too).
This is XSA-216.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4866aa812 upstream.
Under CONFIG_STRICT_DEVMEM, reading System RAM through /dev/mem is
disallowed. However, on x86, the first 1MB was always allowed for BIOS
and similar things, regardless of it actually being System RAM. It was
possible for heap to end up getting allocated in low 1MB RAM, and then
read by things like x86info or dd, which would trip hardened usercopy:
usercopy: kernel memory exposure attempt detected from ffff880000090000 (dma-kmalloc-256) (4096 bytes)
This changes the x86 exception for the low 1MB by reading back zeros for
System RAM areas instead of blindly allowing them. More work is needed to
extend this to mmap, but currently mmap doesn't go through usercopy, so
hardened usercopy won't Oops the kernel.
Reported-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Tested-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11faa7b035 upstream.
We dereference "skb" to get "skb->len" so we should probably do that
step before freeing the skb.
Fixes: eea221ce48 ("tc35815 driver update (take 2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a9e840a208 upstream.
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b7c6d26758 upstream.
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad87170 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3018e947d7 upstream.
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
--
Dave, I didn't want to send you a new pull request for a single
commit yet again - can you apply this one patch as is?
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: Put the new code in an else-block since the
previous if-blocks may or may not return]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78f7a45dac upstream.
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
But instead it just showed an empty buffer:
># cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Fixes: 651e22f270 ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fe8c470ab8 upstream.
gcc -O2 cannot always prove that the loop in acpi_power_get_inferred_state()
is enterered at least once, so it assumes that cur_state might not get
initialized:
drivers/acpi/power.c: In function 'acpi_power_get_inferred_state':
drivers/acpi/power.c:222:9: error: 'cur_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This sets the variable to zero at the start of the loop, to ensure that
there is well-defined behavior even for an empty list. This gets rid of
the warning.
The warning first showed up when the -Os flag got removed in a bug fix
patch in linux-4.11-rc5.
I would suggest merging this addon patch on top of that bug fix to avoid
introducing a new warning in the stable kernels.
Fixes: 61b79e16c6 (ACPI: Fix incompatibility with mcount-based function graph tracing)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1644fe041 upstream.
This fixes CVE-2017-6951.
Userspace should not be able to do things with the "dead" key type as it
doesn't have some of the helper functions set upon it that the kernel
needs. Attempting to use it may cause the kernel to crash.
Fix this by changing the name of the type to ".dead" so that it's rejected
up front on userspace syscalls by key_get_type_from_user().
Though this doesn't seem to affect recent kernels, it does affect older
ones, certainly those prior to:
commit c06cfb08b8
Author: David Howells <dhowells@redhat.com>
Date: Tue Sep 16 17:36:06 2014 +0100
KEYS: Remove key_type::match in favour of overriding default by match_preparse
which went in before 3.18-rc1.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34a477e529 upstream.
On x86-32, with CONFIG_FIRMWARE and multiple CPUs, if you enable function
graph tracing and then suspend to RAM, it will triple fault and reboot when
it resumes.
The first fault happens when booting a secondary CPU:
startup_32_smp()
load_ucode_ap()
prepare_ftrace_return()
ftrace_graph_is_dead()
(accesses 'kill_ftrace_graph')
The early head_32.S code calls into load_ucode_ap(), which has an an
ftrace hook, so it calls prepare_ftrace_return(), which calls
ftrace_graph_is_dead(), which tries to access the global
'kill_ftrace_graph' variable with a virtual address, causing a fault
because the CPU is still in real mode.
The fix is to add a check in prepare_ftrace_return() to make sure it's
running in protected mode before continuing. The check makes sure the
stack pointer is a virtual kernel address. It's a bit of a hack, but
it's not very intrusive and it works well enough.
For reference, here are a few other (more difficult) ways this could
have potentially been fixed:
- Move startup_32_smp()'s call to load_ucode_ap() down to *after* paging
is enabled. (No idea what that would break.)
- Track down load_ucode_ap()'s entire callee tree and mark all the
functions 'notrace'. (Probably not realistic.)
- Pause graph tracing in ftrace_suspend_notifier_call() or bringup_cpu()
or __cpu_up(), and ensure that the pause facility can be queried from
real mode.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: linux-acpi@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/5c1272269a580660703ed2eccf44308e790c7a98.1492123841.git.jpoimboe@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e7655fd4f upstream.
The snd_use_lock_sync() (thus its implementation
snd_use_lock_sync_helper()) has the 5 seconds timeout to break out of
the sync loop. It was introduced from the beginning, just to be
"safer", in terms of avoiding the stupid bugs.
However, as Ben Hutchings suggested, this timeout rather introduces a
potential leak or use-after-free that was apparently fixed by the
commit 2d7d54002e ("ALSA: seq: Fix race during FIFO resize"):
for example, snd_seq_fifo_event_in() -> snd_seq_event_dup() ->
copy_from_user() could block for a long time, and snd_use_lock_sync()
goes timeout and still leaves the cell at releasing the pool.
For fixing such a problem, we remove the break by the timeout while
still keeping the warning.
Suggested-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 162b270c66 upstream.
KGDB is a kernel debug stub and it can't be used to debug userland as it
can only safely access kernel memory.
On MIPS however KGDB has always got the register state of sleeping
processes from the userland register context at the beginning of the
kernel stack. This is meaningless for kernel threads (which never enter
userland), and for user threads it prevents the user seeing what it is
doing while in the kernel:
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) 0x0000000000000000 in ?? ()
2 Thread 1 (init) 0x000000007705c4b4 in ?? ()
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Get the register state instead from the (partial) kernel register
context stored in the task's thread_struct for resume() to restore. All
threads now correctly appear to be in context_switch():
(gdb) info threads
Id Target Id Frame
...
3 Thread 2 (kthreadd) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
2 Thread 1 (init) context_switch (rq=<optimized out>, cookie=..., next=<optimized out>, prev=0x0) at kernel/sched/core.c:2903
1 Thread -2 (shadowCPU0) 0xffffffff8012524c in arch_kgdb_breakpoint () at arch/mips/kernel/kgdb.c:201
Call clobbered registers which aren't saved and exception registers
(BadVAddr & Cause) which can't be easily determined without stack
unwinding are reported as 0. The PC is taken from the return address,
such that the state presented matches that found immediately after
returning from resume().
Fixes: 8854700115 ("[MIPS] kgdb: add arch support for the kernel's kgdb core")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/15829/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6fdc6dd902 upstream.
The vsyscall32 sysctl can racy against a concurrent fork when it switches
from disabled to enabled:
arch_setup_additional_pages()
if (vdso32_enabled)
--> No mapping
sysctl.vsysscall32()
--> vdso32_enabled = true
create_elf_tables()
ARCH_DLINFO_IA32
if (vdso32_enabled) {
--> Add VDSO entry with NULL pointer
Make ARCH_DLINFO_IA32 check whether the VDSO mapping has been set up for
the newly forked process or not.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/20170410151723.602367196@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: change the flag passed to ARCH_DLINFO_IA32()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 321a52a391 upstream.
pppol2tp_getsockopt() doesn't take into account the error code returned
by pppol2tp_tunnel_getsockopt() or pppol2tp_session_getsockopt(). If
error occurs there, pppol2tp_getsockopt() continues unconditionally and
reports erroneous values.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 364700cf8f upstream.
pppol2tp_setsockopt() unconditionally overwrites the error value
returned by pppol2tp_tunnel_setsockopt() or
pppol2tp_session_setsockopt(), thus hiding errors from userspace.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5402e97af6 upstream.
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7c856152cb upstream.
We previously made sure that the reported disk capacity was less than
0xffffffff blocks when the kernel was not compiled with large sector_t
support (CONFIG_LBDAF). However, this check assumed that the capacity
was reported in units of 512 bytes.
Add a sanity check function to ensure that we only enable disks if the
entire reported capacity can be expressed in terms of sector_t.
Reported-by: Steve Magnani <steve.magnani@digidescorp.com>
Cc: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: use integer literal instead of U32_MAX]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a00a786251 upstream.
Kefeng Wang discovered that old versions of the QEMU CD driver would
return mangled mode data causing us to walk off the end of the buffer in
an attempt to parse it. Sanity check the returned mode sense data.
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9121b15b56 upstream.
Connecting to the backend isn't working reliably in xen-fbfront: in
case XenbusStateInitWait of the backend has been missed the backend
transition to XenbusStateConnected will trigger the connected state
only without doing the actions required when the backend has
connected.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e08293a4cc upstream.
Take a reference on the sessions returned by l2tp_session_find_nth()
(and rename it l2tp_session_get_nth() to reflect this change), so that
caller is assured that the session isn't going to disappear while
processing it.
For procfs and debugfs handlers, the session is held in the .start()
callback and dropped in .show(). Given that pppol2tp_seq_session_show()
dereferences the associated PPPoL2TP socket and that
l2tp_dfs_seq_session_show() might call pppol2tp_show(), we also need to
call the session's .ref() callback to prevent the socket from going
away from under us.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Fixes: 0ad6614048 ("l2tp: Add debugfs files for dumping l2tp debug info")
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 48fe9e9488 upstream.
In the past, there was only one load-with-reservation instruction,
lwarx, and if a program attempted a lwarx on a misaligned address, it
would take an alignment interrupt and the kernel handler would emulate
it as though it was lwzx, which was not really correct, but benign since
it is loading the right amount of data, and the lwarx should be paired
with a stwcx. to the same address, which would also cause an alignment
interrupt which would result in a SIGBUS being delivered to the process.
We now have 5 different sizes of load-with-reservation instruction. Of
those, lharx and ldarx cause an immediate SIGBUS by luck since their
entries in aligninfo[] overlap instructions which were not fixed up, but
lqarx overlaps with lhz and will be emulated as such. lbarx can never
generate an alignment interrupt since it only operates on 1 byte.
To straighten this out and fix the lqarx case, this adds code to detect
the l[hwdq]arx instructions and return without fixing them up, resulting
in a SIGBUS being delivered to the process.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: open-code get_xop()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89e357d83c upstream.
A dump may come in the middle of another dump, modifying its dump
structure members. This race condition will result in NULL pointer
dereference in kernel. So add a lock to prevent that race.
Fixes: 83321d6b98 ("[AF_KEY]: Dump SA/SP entries non-atomically")
Signed-off-by: Yuejie Shi <syjcnss@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
[bwh: Backported to 3.2:
- pfkey_dump() doesn't support filters
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c99de981f upstream.
Once upon a time back in 2009, a work-around was added to support
the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login
did not propose nor respond to MaxBurstLength, FirstBurstLength,
DefaultTime2Wait and DefaultTime2Retain keys.
The work-around in iscsi_check_proposer_for_optional_reply()
allowed the missing keys to be proposed, but did not require
waiting for a response before moving to full feature phase
operation. This allowed GlobalSAN v3.3 to work out-of-the
box, and for many years we didn't run into login interopt
issues with any other initiators..
Until recently, when Martin tried a QLogic 57840S iSCSI Offload
HBA on Windows 2016 which completed login, but subsequently
failed with:
Got unknown iSCSI OpCode: 0x43
The issue was QLogic MSFT side did not propose DefaultTime2Wait +
DefaultTime2Retain, so LIO proposes them itself, and immediately
transitions to full feature phase because of the GlobalSAN hack.
However, the QLogic MSFT side still attempts to respond to
DefaultTime2Retain + DefaultTime2Wait, even though LIO has set
ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT
in last login response.
So while the QLogic MSFT side should have been proposing these
two keys to start, it was doing the correct thing per RFC-3720
attempting to respond to proposed keys before transitioning to
full feature phase.
All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541)
does correctly propose the four keys during login, making the
original work-around moot.
So in order to allow QLogic MSFT to run unmodified as-is, go
ahead and drop this long standing work-around.
Reported-by: Martin Svec <martin.svec@zoner.cz>
Cc: Martin Svec <martin.svec@zoner.cz>
Cc: Himanshu Madhani <Himanshu.Madhani@cavium.com>
Cc: Arun Easi <arun.easi@cavium.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2777e2ab5a upstream.
Callers of l2tp_nl_session_find() need to hold a reference on the
returned session since there's no guarantee that it isn't going to
disappear from under them.
Relying on the fact that no l2tp netlink message may be processed
concurrently isn't enough: sessions can be deleted by other means
(e.g. by closing the PPPOL2TP socket of a ppp pseudowire).
l2tp_nl_cmd_session_delete() is a bit special: it runs a callback
function that may require a previous call to session->ref(). In
particular, for ppp pseudowires, the callback is l2tp_session_delete(),
which then calls pppol2tp_session_close() and dereferences the PPPOL2TP
socket. The socket might already be gone at the moment
l2tp_session_delete() calls session->ref(), so we need to take a
reference during the session lookup. So we need to pass the do_ref
variable down to l2tp_session_get() and l2tp_session_get_by_ifname().
Since all callers have to be updated, l2tp_session_find_by_ifname() and
l2tp_nl_session_find() are renamed to reflect their new behaviour.
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dbdbc73b44 upstream.
l2tp_session_create() relies on its caller for checking for duplicate
sessions. This is racy since a session can be concurrently inserted
after the caller's verification.
Fix this by letting l2tp_session_create() verify sessions uniqueness
upon insertion. Callers need to be adapted to check for
l2tp_session_create()'s return code instead of calling
l2tp_session_find().
pppol2tp_connect() is a bit special because it has to work on existing
sessions (if they're not connected) or to create a new session if none
is found. When acting on a preexisting session, a reference must be
held or it could go away on us. So we have to use l2tp_session_get()
instead of l2tp_session_find() and drop the reference before exiting.
Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: add 'pos' parameter to
hlist_for_each_entry{,_rcu}() calls]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57377d6354 upstream.
Holding a reference on session is required before calling
pppol2tp_session_ioctl(). The session could get freed while processing the
ioctl otherwise. Since pppol2tp_session_ioctl() uses the session's socket,
we also need to take a reference on it in l2tp_session_get().
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 61b9a04772 upstream.
Taking a reference on sessions in l2tp_recv_common() is racy; this
has to be done by the callers.
To this end, a new function is required (l2tp_session_get()) to
atomically lookup a session and take a reference on it. Callers then
have to manually drop this reference.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop changes to l2tp_ip6.c
- Add 'pos' parameter to hlist_for_each_entry{,_rcu}() calls
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6040bc610 upstream.
The reference manual for the i.MX28 recommends to calculate the divisor
as
divisor = (UARTCLK * 32) / baud rate, rounded to the nearest integer
, so let's do this. For a typical setup of UARTCLK = 24 MHz and baud
rate = 115200 this changes the divisor from 6666 to 6667 and so the
actual baud rate improves from 115211.521 Bd (error ≅ 0.01 %) to
115194.240 Bd (error ≅ 0.005 %).
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit df57cf6a87 upstream.
Currently mxs-auart doesn't care correctly about the baud rate divisor.
According to reference manual the baud rate divisor must be between
0x000000EC and 0x003FFFC0. So calculate the possible baud rate range
and use it for uart_get_baud_rate().
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 923713b357 upstream.
SDIO cards may need clock to send the card interrupt to the host.
On a cherrytrail tablet with a RTL8723BS wifi chip, without this patch
pinging the tablet results in:
PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=78.6 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1760 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=753 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=3.88 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=795 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1841 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=810 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1860 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=812 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=48.6 ms
Where as with this patch I get:
PING 192.168.1.14 (192.168.1.14) 56(84) bytes of data.
64 bytes from 192.168.1.14: icmp_seq=1 ttl=64 time=3.96 ms
64 bytes from 192.168.1.14: icmp_seq=2 ttl=64 time=1.97 ms
64 bytes from 192.168.1.14: icmp_seq=3 ttl=64 time=17.2 ms
64 bytes from 192.168.1.14: icmp_seq=4 ttl=64 time=2.46 ms
64 bytes from 192.168.1.14: icmp_seq=5 ttl=64 time=2.83 ms
64 bytes from 192.168.1.14: icmp_seq=6 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=7 ttl=64 time=2.10 ms
64 bytes from 192.168.1.14: icmp_seq=8 ttl=64 time=1.40 ms
64 bytes from 192.168.1.14: icmp_seq=9 ttl=64 time=2.04 ms
64 bytes from 192.168.1.14: icmp_seq=10 ttl=64 time=1.40 ms
Cc: Dong Aisheng <b29396@freescale.com>
Cc: Ian W MORRISON <ianwmorrison@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 53e16798b0 upstream.
The mesa winsys sometimes uses unimplemented parameter requests to
check for features. Remove the error message to avoid bloating the
kernel log.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 63774069d9 upstream.
In vmw_get_cap_3d_ioctl(), a user can supply 0 for a size that is
used in vzalloc(). This eventually calls dump_stack() (in warn_alloc()),
which can leak useful addresses to dmesg.
Add check to avoid a size of 0.
Signed-off-by: Murray McAllister <murray.mcallister@insomniasec.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f7652afa8e upstream.
A malicious caller could otherwise hand over handles to other objects
causing all sorts of interesting problems.
Testing done: Ran a Fedora 25 desktop using both Xorg and
gnome-shell/Wayland.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9cd9a21ce0 upstream.
In commit 6afaf8a484 ("UBI: flush wl before clearing update marker") I
managed to trigger and fix a similar bug. Now here is another version of
which I assumed it wouldn't matter back then but it turns out UBI has a
check for it and will error out like this:
|ubi0 warning: validate_vid_hdr: inconsistent used_ebs
|ubi0 error: validate_vid_hdr: inconsistent VID header at PEB 592
All you need to trigger this is? "ubiupdatevol /dev/ubi0_0 file" + a
powercut in the middle of the operation.
ubi_start_update() sets the update-marker and puts all EBs on the erase
list. After that userland can proceed to write new data while the old EB
aren't erased completely. A powercut at this point is usually not that
much of a tragedy. UBI won't give read access to the static volume
because it has the update marker. It will most likely set the corrupted
flag because it misses some EBs.
So we are all good. Unless the size of the image that has been written
differs from the old image in the magnitude of at least one EB. In that
case UBI will find two different values for `used_ebs' and refuse to
attach the image with the error message mentioned above.
So in order not to get in the situation, the patch will ensure that we
wait until everything is removed before it tries to write any data.
The alternative would be to detect such a case and remove all EBs at the
attached time after we processed the volume-table and see the
update-marker set. The patch looks bigger and I doubt it is worth it
since usually the write() will wait from time to time for a new EB since
usually there not that many spare EB that can be used.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2: ubi_wl_flush() only takes 1 parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d3519b9d96 upstream.
xhci needs to take care of four scenarios when asked to cancel a URB.
1 URB is not queued or already given back.
usb_hcd_check_unlink_urb() will return an error, we pass the error on
2 We fail to find xhci internal structures from urb private data such as
virtual device and endpoint ring.
Give back URB immediately, can't do anything about internal structures.
3 URB private data has valid pointers to xhci internal data, but host is
not responding.
give back URB immedately and remove the URB from the endpoint lists.
4 Everyting is working
add URB to cancel list, queue a command to stop the endpoint, after
which the URB can be turned to no-op or skipped, removed from lists,
and given back.
We failed to give back the urb in case 2 where the correct device and
endpoint pointers could not be retrieved from URB private data.
This caused a hang on Dell Inspiron 5558/0VNM2T at resume from suspend
as urb was never returned.
[ 245.270505] INFO: task rtsx_usb_ms_1:254 blocked for more than 120 seconds.
[ 245.272244] Tainted: G W 4.11.0-rc3-ARCH #2
[ 245.273983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.275737] rtsx_usb_ms_1 D 0 254 2 0x00000000
[ 245.277524] Call Trace:
[ 245.279278] __schedule+0x2d3/0x8a0
[ 245.281077] schedule+0x3d/0x90
[ 245.281961] usb_kill_urb.part.3+0x6c/0xa0 [usbcore]
[ 245.282861] ? wake_atomic_t_function+0x60/0x60
[ 245.283760] usb_kill_urb+0x21/0x30 [usbcore]
[ 245.284649] usb_start_wait_urb+0xe5/0x170 [usbcore]
[ 245.285541] ? try_to_del_timer_sync+0x53/0x80
[ 245.286434] usb_bulk_msg+0xbd/0x160 [usbcore]
[ 245.287326] rtsx_usb_send_cmd+0x63/0x90 [rtsx_usb]
Reported-by: diego.viola@gmail.com
Tested-by: diego.viola@gmail.com
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Move debug logging along with endpoint lookup
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 61b79e16c6 upstream.
Paul Menzel reported a warning:
WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
Bad frame pointer: expected f6919d98, received f6919db0
from func acpi_pm_device_sleep_wake return to c43b6f9d
The warning means that function graph tracing is broken for the
acpi_pm_device_sleep_wake() function. That's because the ACPI Makefile
unconditionally sets the '-Os' gcc flag to optimize for size. That's an
issue because mcount-based function graph tracing is incompatible with
'-Os' on x86, thanks to the following gcc bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
I have another patch pending which will ensure that mcount-based
function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
x86.
But this patch is needed in addition to that one because the ACPI
Makefile overrides that config option for no apparent reason. It has
had this flag since the beginning of git history, and there's no related
comment, so I don't know why it's there. As far as I can tell, there's
no reason for it to be there. The appropriate behavior is for it to
honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
kernel.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7d64f82cce upstream.
When removing a GHES device notified by SCI, list_del_rcu() is used,
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes), otherwise concurrent RCU readers may still hold this list
entry after it has been freed.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 81e88fdc43 (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0bb2d50df upstream.
The latest gcc-7.0.1 snapshot reports a new warning:
virtio/virtio_balloon.c: In function 'update_balloon_stats':
virtio/virtio_balloon.c:258:26: error: 'events[2]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:260:26: error: 'events[3]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:261:56: error: 'events[18]' is used uninitialized in this function [-Werror=uninitialized]
virtio/virtio_balloon.c:262:56: error: 'events[17]' is used uninitialized in this function [-Werror=uninitialized]
This seems absolutely right, so we should add an extra check to
prevent copying uninitialized stack data into the statistics.
>From all I can tell, this has been broken since the statistics code
was originally added in 2.6.34.
Fixes: 9564e138b1 ("virtio: Add memory statistics reporting to the balloon driver (V4)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc8653228c upstream.
When init_vqs runs, virtio_balloon.stats is either uninitialized or
contains stale values. The host updates its state with garbage data
because it has no way of knowing that this is just a marker buffer
used for signaling.
This patch updates the stats before pushing the initial buffer.
Alternative fixes:
* Push an empty buffer in init_vqs. Not easily done with the current
virtio implementation and violates the spec "Driver MUST supply the
same subset of statistics in all buffers submitted to the statsq".
* Push a buffer with invalid tags in init_vqs. Violates the same
spec clause, plus "invalid tag" is not really defined.
Note: the spec says:
When using the legacy interface, the device SHOULD ignore all values in
the first buffer in the statsq supplied by the driver after device
initialization. Note: Historically, drivers supplied an uninitialized
buffer in the first buffer.
Unfortunately QEMU does not seem to implement the recommendation
even for the legacy interface.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 75c689dca9 upstream.
In the commit 93557f53e1 ("netfilter: nf_conntrack: nf_conntrack snmp
helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the
snmp_helper is never registered. But it still tries to unregister the
snmp_helper, it could cause the panic.
Now remove the useless snmp_helper and the unregister call in the
error handler.
Fixes: 93557f53e1 ("netfilter: nf_conntrack: nf_conntrack snmp helper")
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f6aafac184 upstream.
aarch64-linux-gcc-7 complains about code it doesn't fully understand:
drivers/infiniband/hw/qib/qib_iba7322.c: In function 'qib_7322_txchk_change':
include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The code is right, and despite trying hard, I could not come up with a version
that I liked better than just adding a fake initialization here to shut up the
warning.
Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49d52e8108 upstream.
If the PHY is halted on stop, then do not set the state to PHY_UP. This
ensures the phy will be restarted later in phy_start when the machine is
started again.
Fixes: 00db8189d9 ("This patch adds a PHY Abstraction Layer to the Linux Kernel, enabling ethernet drivers to remain as ignorant as is reasonable of the connected PHY's design and operation details.")
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Xander Huff <xander.huff@ni.com>
Acked-by: Kyle Roeschley <kyle.roeschley@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d7d54002e upstream.
When a new event is queued while processing to resize the FIFO in
snd_seq_fifo_clear(), it may lead to a use-after-free, as the old pool
that is being queued gets removed. For avoiding this race, we need to
close the pool to be deleted and sync its usage before actually
deleting it.
The issue was spotted by syzkaller.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a2125d0244 upstream.
The latest gcc-7 snapshot adds a warning to point out that when
atk_read_value_old or atk_read_value_new fails, we copy
uninitialized data into sensor->cached_value:
drivers/hwmon/asus_atk0110.c: In function 'atk_input_show':
drivers/hwmon/asus_atk0110.c:651:26: error: 'value' may be used uninitialized in this function [-Werror=maybe-uninitialized]
Adding an error check avoids this. All versions of the driver
are affected.
Fixes: 2c03d07ad5 ("hwmon: Add Asus ATK0110 support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e2ebfb2142 upstream.
Disabling interrupts for even a millisecond can cause problems for some
devices. That can happen when sdhci changes clock frequency because it
waits for the clock to become stable under a spin lock.
The spin lock is not necessary here. Anything that is racing with changes
to the I/O state is already broken. The mmc core already provides
synchronization via "claiming" the host.
Although the spin lock probably should be removed from the code paths that
lead to this point, such a patch would touch too much code to be suitable
for stable trees. Consequently, for this patch, just drop the spin lock
while waiting.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d82c0d12c9 upstream.
Reorder the operations in decompress_kernel() to ensure initrd is moved
to a safe location before the bss section is zeroed.
During decompression bss can overlap with the initrd and this can
corrupt the initrd contents depending on the size of the compressed
kernel (which affects where the initrd is placed by the bootloader) and
the size of the bss section of the decompressor.
Also use the correct initrd size when checking for overlaps with
parmblock.
Fixes: 06c0dd72ae ([S390] fix boot failures with compressed kernels)
Reviewed-by: Joy Latten <joy.latten@canonical.com>
Reviewed-by: Vineetha HariPai <vineetha.hari.pai@canonical.com>
Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aea92fb2e0 upstream.
skb_cow(skb, sizeof(ip header)) is not very helpful in this context.
First we need to use pskb_may_pull() to make sure the ip header
is in skb linear part, then use skb_try_make_writable() to
address clones issues.
Fixes: 4c30719f4f ("[PKT_SCHED] dsmark: handle cloned and non-linear skb's")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3697649ff2 upstream.
When we're dealing with clones and the area is not writeable, try
harder and get a copy via pskb_expand_head(). Replace also other
occurences in tc actions with the new skb_try_make_writable().
Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: drop changes to bpf; only tc actions need fixing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7df9c24625 upstream.
Dmitry has reported that a BUG_ON() condition in unix_notinflight()
may be triggered by a simple code that forwards unix socket in an
SCM_RIGHTS message.
That is caused by incorrect unix socket GC implementation in unix_gc().
The GC first collects list of candidates, then (a) decrements their
"children's" inflight counter, (b) checks which inflight counters are
now 0, and then (c) increments all inflight counters back.
(a) and (c) are done by calling scan_children() with inc_inflight or
dec_inflight as the second argument.
Commit 6209344f5a ("net: unix: fix inflight counting bug in garbage
collector") changed scan_children() such that it no longer considers
sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
of code that that unsets this flag _before_ invoking
scan_children(, dec_iflight, ). This may lead to incorrect inflight
counters for some sockets.
This change fixes this bug by changing order of operations:
UNIX_GC_CANDIDATE is now unset only after all inflight counters are
restored to the original state.
kernel BUG at net/unix/garbage.c:149!
RIP: 0010:[<ffffffff8717ebf4>] [<ffffffff8717ebf4>]
unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
Call Trace:
[<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
[<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
[<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
[<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
[<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
[<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705
[<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
[<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836
[<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570
[<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017
[<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208
[<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244
[<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
[< inline >] exit_task_work include/linux/task_work.h:21
[<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828
[<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931
[<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307
[<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
[<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
arch/x86/entry/common.c:259
[<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Link: https://lkml.org/lkml/2017/3/6/252
Signed-off-by: Andrey Ulanov <andreyu@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c520ff3d03 upstream.
When snd_seq_pool_done() is called, it marks the closing flag to
refuse the further cell insertions. But snd_seq_pool_done() itself
doesn't clear the cells but just waits until all cells are cleared by
the caller side. That is, it's racy, and this leads to the endless
stall as syzkaller spotted.
This patch addresses the racy by splitting the setup of pool->closing
flag out of snd_seq_pool_done(), and calling it properly before
snd_seq_pool_done().
BugLink: http://lkml.kernel.org/r/CACT4Y+aqqy8bZA1fFieifNxR2fAfFQQABcBHj801+u5ePV0URw@mail.gmail.com
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9702c67c60 upstream.
The total ata xfer length may not be calculated properly, in that we do
not use the proper method to get an sg element dma length.
According to the code comment, sg_dma_len() should be used after
dma_map_sg() is called.
This issue was found by turning on the SMMUv3 in front of the hisi_sas
controller in hip07. Multiple sg elements were being combined into a
single element, but the original first element length was being use as
the total xfer length.
Fixes: ff2aeb1eb6 ("libata: convert to chained sg")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f363a06642 upstream.
In the commit [15c75b09f8: ALSA: ctxfi: Fallback DMA mask to 32bit],
I forgot to put "!" at dam_set_mask() call check in cthw20k1.c (while
cthw20k2.c is OK). This patch fixes that obvious bug.
(As a side note: although the original commit was completely wrong,
it's still working for most of machines, as it sets to 32bit DMA mask
in the end. So the bug severity is low.)
Fixes: 15c75b09f8 ("ALSA: ctxfi: Fallback DMA mask to 32bit")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7ede72a6d upstream.
The current symbols__fixup_end() heuristic for the last entry in the rb
tree is suboptimal as it leads to not being able to recognize the symbol
in the call graph in a couple of corner cases, for example:
i) If the symbol has a start address (f.e. exposed via kallsyms)
that is at a page boundary, then the roundup(curr->start, 4096)
for the last entry will result in curr->start == curr->end with
a symbol length of zero.
ii) If the symbol has a start address that is shortly before a page
boundary, then also here, curr->end - curr->start will just be
very few bytes, where it's unrealistic that we could perform a
match against.
Instead, change the heuristic to roundup(curr->start, 4096) + 4096, so
that we can catch such corner cases and have a better chance to find
that specific symbol. It's still just best effort as the real end of the
symbol is unknown to us (and could even be at a larger offset than the
current range), but better than the current situation.
Alexei reported that he recently run into case i) with a JITed eBPF
program (these are all page aligned) as the last symbol which wasn't
properly shown in the call graph (while other eBPF program symbols in
the rb tree were displayed correctly). Since this is a generic issue,
lets try to improve the heuristic a bit.
Reported-and-Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes: 2e538c4a18 ("perf tools: Improve kernel/modules symbol lookup")
Link: http://lkml.kernel.org/r/bb5c80d27743be6f12afc68405f1956a330e1bc9.1489614365.git.daniel@iogearbox.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cdd7928df0 upstream.
The gadget code exports the bitfield for serial status changes
over the wire in its internal endianness. The fix is to convert
to little endian before sending it over the wire.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Tested-by: 家瑋 <momo1208@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 687e0687f7 upstream.
USBTMC devices are required to have a bulk-in and a bulk-out endpoint,
but the driver failed to verify this, something which could lead to the
endpoint addresses being taken from uninitialised memory.
Make sure to zero all private data as part of allocation, and add the
missing endpoint sanity check.
Note that this also addresses a more recently introduced issue, where
the interrupt-in-presence flag would also be uninitialised whenever the
optional interrupt-in endpoint is not present. This in turn could lead
to an interrupt urb being allocated, initialised and submitted based on
uninitialised values.
Fixes: dbf3e7f654 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.")
Fixes: 5b775f672c ("USB: add USB test and measurement class driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ee39733fb upstream.
Anycast routes have the RTF_ANYCAST flag set, but when dumping routes
for userspace the route type is not set to RTN_ANYCAST. Make it so.
Fixes: 58c4fb86ea ("[IPV6]: Flag RTF_ANYCAST for anycast routes")
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb1b494663 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba340d7b83 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: bba5394ad3 ("Input: add support for Hanwang tablets")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5cc4a1a9f5 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: aca951a22a ("[PATCH] input-driver-yealink-P1K-usb-phone")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ac2ee9ba95 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: c04148f915 ("Input: add driver for USB VoIP phones with CM109...")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 59cf8bed44 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory that lie beyond the end of the endpoint
array should a malicious device lack the expected endpoints.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 181302dc72 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: 53f3a9e26e ("mmc: USB SD Host Controller (USHC) driver")
Cc: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e5f32f7a4 upstream.
If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
the pending sample window time on exit, setting the next update not
one window into the future, but two.
This situation on exiting NO_HZ is described by:
this_rq->calc_load_update < jiffies < calc_load_update
In this scenario, what we should be doing is:
this_rq->calc_load_update = calc_load_update [ next window ]
But what we actually do is:
this_rq->calc_load_update = calc_load_update + LOAD_FREQ [ next+1 window ]
This has the effect of delaying load average updates for potentially
up to ~9seconds.
This can result in huge spikes in the load average values due to
per-cpu uninterruptible task counts being out of sync when accumulated
across all CPUs.
It's safe to update the per-cpu active count if we wake between sample
windows because any load that we left in 'calc_load_idle' will have
been zero'd when the idle load was folded in calc_global_load().
This issue is easy to reproduce before,
commit 9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")
just by forking short-lived process pipelines built from ps(1) and
grep(1) in a loop. I'm unable to reproduce the spikes after that
commit, but the bug still seems to be present from code review.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: commit 5167e8d ("sched/nohz: Rewrite and fix load-avg computation -- again")
Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e9f44eaae upstream.
Add Quectel UC15, UC20, EC21, and EC25. The EC20 is handled by
qcserial due to a USB VID/PID conflict with an existing Acer
device.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3243367b20 upstream.
Some USB 2.0 devices erroneously report millisecond values in
bInterval. The generic config code manages to catch most of them,
but in some cases it's not completely enough.
The case at stake here is a USB 2.0 braille device, which wants to
announce 10ms and thus sets bInterval to 10, but with the USB 2.0
computation that yields to 64ms. It happens that one can type fast
enough to reach this interval and get the device buffers overflown,
leading to problematic latencies. The generic config code does not
catch this case because the 64ms is considered a sane enough value.
This change thus adds a USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL quirk
to mark devices which actually report milliseconds in bInterval,
and marks Vario Ultra devices as needing it.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ce362711d upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Note that the dereference happens in the cmd and wait_init_done
callbacks which are called during probe.
Fixes: 1ba47da527 ("uwb: add the i1480 DFU driver")
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit daf229b159 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Note that the dereference happens in the start callback which is called
during probe.
Fixes: de520b8bd5 ("uwb: add HWA radio controller driver")
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 03ace948a4 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
This specifically fixes the NULL-pointer dereference when probing HWA HC
devices.
Fixes: df3654236e ("wusb: add the Wire Adapter (WA) core")
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: David Vrabel <david.vrabel@csr.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f259ca3eed upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
Note that the endpoint access that causes the NULL-deref is currently
only used for debugging purposes during probe so the oops only happens
when dynamic debugging is enabled. This means the driver could be
rewritten to continue to accept device with only two endpoints, should
such devices exist.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0addd3fa6 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e526fdff7 upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer or accessing memory beyond the endpoint array should a
malicious device lack the expected endpoints.
The endpoints are specifically dereferenced in the i2400m_bootrom_init
path during probe (e.g. in i2400mu_tx_bulk_out).
Fixes: f398e4240f ("i2400m/USB: probe/disconnect, dev init/shutdown
and reset backends")
Cc: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 68c32f9c2a upstream.
Make sure to check the number of endpoints to avoid dereferencing a
NULL-pointer should a malicious device lack endpoints.
Fixes: cf7776dc05 ("[PATCH] isdn4linux: Siemens Gigaset drivers -
direct USB connection")
Cc: Hansjoerg Lipp <hjlipp@web.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4b3b45edba upstream.
commit c146066ab8 ("ipv4: Don't use ufo handling on later transformed
packets") and commit f89c56ce71 ("ipv6: Don't use ufo handling on
later transformed packets") added a check that 'rt->dst.header_len' isn't
zero in order to skip UFO, but it doesn't include IPcomp in transport mode
where it equals zero.
Packets, after payload compression, may not require further fragmentation,
and if original length exceeds MTU, later compressed packets will be
transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
* IPv4 case, offset is wrong + unnecessary fragmentation
udp_ipsec.sh -p comp -m transport -s 2000 &
tcpdump -ni ltp_ns_veth2
...
IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
proto Compressed IP (108), length 49)
10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
* IPv6 case, sending small fragments
udp_ipsec.sh -6 -p comp -m transport -s 2000 &
tcpdump -ni ltp_ns_veth2
...
IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
Fixes: c146066ab8 ("ipv4: Don't use ufo handling on later transformed packets")
Fixes: f89c56ce71 ("ipv6: Don't use ufo handling on later transformed packets")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit de46e56653 upstream.
Make sure to verify that we have the required interrupt-out endpoint for
IOWarrior56 devices to avoid dereferencing a NULL-pointer in write
should a malicious device lack such an endpoint.
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 67b0503db9 upstream.
The buffer allocation for the firmware data was changed in
commit 43fab9793c ("[media] dvb-usb: don't use stack for firmware load")
but the same applies for the reset value.
Fixes: 43fab9793c ("[media] dvb-usb: don't use stack for firmware load")
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c76d7cd52 upstream.
Add missing sanity check to the bulk-in completion handler to avoid an
integer underflow that could be triggered by a malicious device.
This avoids leaking up to 56 bytes from after the URB transfer buffer to
user space.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0b1d250afb upstream.
Fix a NULL-pointer dereference in the interrupt callback should a
malicious device send data containing a bad port number by adding the
missing sanity check.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- Use &urb->dev->dev instead of local dev variable
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7369090a9f upstream.
Some gadget drivers are bad, bad boys. We notice
that ADB was passing bad Burst Size which caused top
bits of param0 to be overwritten which confused DWC3
when running this command.
In order to avoid future issues, we're going to make
sure values passed by macros are always safe for the
controller. Note that ADB still needs a fix to *not*
pass bad values.
Reported-by: Mohamed Abbas <mohamed.abbas@intel.com>
Sugested-by: Adam Andruszak <adam.andruszak@intel.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bf7165cfa2 upstream.
There are several trace include files that define TRACE_INCLUDE_FILE.
Include several of them in the same .c file (as I currently have in
some code I am working on), and the compile will blow up with a
"warning: "TRACE_INCLUDE_FILE" redefined #define TRACE_INCLUDE_FILE syscalls"
Every other include file in include/trace/events/ avoids that issue
by having a #undef TRACE_INCLUDE_FILE before the #define; syscalls.h
should have one, too.
Link: http://lkml.kernel.org/r/20160928225554.13bd7ac6@annuminas.surriel.com
Fixes: b8007ef742 ("tracing: Separate raw syscall from syscall tracer")
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c74fd80f2f upstream.
Revert the main part of commit:
af42b8d12f ("xen: fix MSI setup and teardown for PV on HVM guests")
That commit introduced reading the pci device's msi message data to see
if a pirq was previously configured for the device's msi/msix, and re-use
that pirq. At the time, that was the correct behavior. However, a
later change to Qemu caused it to call into the Xen hypervisor to unmap
all pirqs for a pci device, when the pci device disables its MSI/MSIX
vectors; specifically the Qemu commit:
c976437c7dba9c7444fb41df45468968aaa326ad
("qemu-xen: free all the pirqs for msi/msix when driver unload")
Once Qemu added this pirq unmapping, it was no longer correct for the
kernel to re-use the pirq number cached in the pci device msi message
data. All Qemu releases since 2.1.0 contain the patch that unmaps the
pirqs when the pci device disables its MSI/MSIX vectors.
This bug is causing failures to initialize multiple NVMe controllers
under Xen, because the NVMe driver sets up a single MSIX vector for
each controller (concurrently), and then after using that to talk to
the controller for some configuration data, it disables the single MSIX
vector and re-configures all the MSIX vectors it needs. So the MSIX
setup code tries to re-use the cached pirq from the first vector
for each controller, but the hypervisor has already given away that
pirq to another controller, and its initialization fails.
This is discussed in more detail at:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html
Fixes: af42b8d12f ("xen: fix MSI setup and teardown for PV on HVM guests")
Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 83cd904d27 upstream.
Commit 6bd4837de9 ("mm: simplify find_vma_prev()") broke memory
management on PA-RISC.
After application of the patch, programs that allocate big arrays on the
stack crash with segfault, for example, this will crash if compiled
without optimization:
int main()
{
char array[200000];
array[199999] = 0;
return 0;
}
The reason is that PA-RISC has up-growing stack and the stack is usually
the last memory area. In the above example, a page fault happens above
the stack.
Previously, if we passed too high address to find_vma_prev, it returned
NULL and stored the last VMA in *pprev. After "simplify find_vma_prev"
change, it stores NULL in *pprev. Consequently, the stack area is not
found and it is not expanded, as it used to be before the change.
This patch restores the old behavior and makes it return the last VMA in
*pprev if the requested address is higher than address of any other VMA.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f2f97656a upstream.
This fixes CVE-2017-7482.
When a kerberos 5 ticket is being decoded so that it can be loaded into an
rxrpc-type key, there are several places in which the length of a
variable-length field is checked to make sure that it's not going to
overrun the available data - but the data is padded to the nearest
four-byte boundary and the code doesn't check for this extra. This could
lead to the size-remaining variable wrapping and the data pointer going
over the end of the buffer.
Fix this by making the various variable-length data checks use the padded
length.
Reported-by: 石磊 <shilei-c@360.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.c.dionne@auristor.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd726c90b6 upstream.
Fix expand_upwards() on architectures with an upward-growing stack (parisc,
metag and partly IA-64) to allow the stack to reliably grow exactly up to
the address space limit given by TASK_SIZE.
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1be7107fbe upstream.
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Hugh Dickins: Backported to 3.2]
[bwh: Fix more instances of vma->vm_start in sparc64 impl. of
arch_get_unmapped_area_topdown() and generic impl. of
hugetlb_get_unmapped_area()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0988496433 upstream.
The stack vma is designed to grow automatically (marked with VM_GROWSUP
or VM_GROWSDOWN depending on architecture) when an access is made beyond
the existing boundary. However, particularly if you have not limited
your stack at all ("ulimit -s unlimited"), this can cause the stack to
grow even if the access was really just one past *another* segment.
And that's wrong, especially since we first grow the segment, but then
immediately later enforce the stack guard page on the last page of the
segment. So _despite_ first growing the stack segment as a result of
the access, the kernel will then make the access cause a SIGSEGV anyway!
So do the same logic as the guard page check does, and consider an
access to within one page of the next segment to be a bad access, rather
than growing the stack to abut the next segment.
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 232cd35d08 upstream.
Andrey Konovalov and idaifish@gmail.com reported crashes caused by
one skb shared_info being overwritten from __ip6_append_data()
Andrey program lead to following state :
copy -4200 datalen 2000 fraglen 2040
maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
fraggap, 0); is overwriting skb->head and skb_shared_info
Since we apparently detect this rare condition too late, move the
code earlier to even avoid allocating skb and risking crashes.
Once again, many thanks to Andrey and syzkaller team.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: <idaifish@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7dd7eb9513 upstream.
Do not use unsigned variables to see if it returns a negative
error or not.
Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30572418b4 upstream.
This driver needlessly took another reference to the tty on open, a
reference which was then never released on close. This lead to not just
a leak of the tty, but also a driver reference leak that prevented the
driver from being unloaded after a port had once been opened.
Fixes: 4a90f09b20 ("tty: usb-serial krefs")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- The 'serial' variable is still needed for other initialisation
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 654b404f2a upstream.
Add missing sanity check to the bulk-in completion handler to avoid an
integer underflow that can be triggered by a malicious device.
This avoids leaking 128 kB of memory content from after the URB transfer
buffer to user space.
Fixes: 8c209e6782 ("USB: make actual_length in struct urb field u32")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 657831ffc3 upstream.
syzkaller found a way to trigger double frees from ip_mc_drop_socket()
It turns out that leave a copy of parent mc_list at accept() time,
which is very bad.
Very similar to commit 8b485ce698 ("tcp: do not inherit
fastopen_req from parent")
Initial report from Pray3r, completed by Andrey one.
Thanks a lot to them !
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pray3r <pray3r.z@gmail.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13bf9fbff0 upstream.
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db44bac41b upstream.
Use a couple shortcuts that will simplify a following bugfix.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: in nfs3svc_decode_writeargs(), dlen doesn't include
tail]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e6838a29ec upstream.
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef0579b64e upstream.
The ahash API modifies the request's callback function in order
to clean up after itself in some corner cases (unaligned final
and missing finup).
When the request is complete ahash will restore the original
callback and everything is fine. However, when the request gets
an EBUSY on a full queue, an EINPROGRESS callback is made while
the request is still ongoing.
In this case the ahash API will incorrectly call its own callback.
This patch fixes the problem by creating a temporary request
object on the stack which is used to relay EINPROGRESS back to
the original completion function.
This patch also adds code to preserve the original flags value.
Fixes: ab6bf4e5e5 ("crypto: hash - Fix the pointer voodoo in...")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d4a7a0fbe9 upstream.
The ahash_def_finup() can make use of the request save/restore functions,
thus make it so. This simplifies the code a little and unifies the code
paths.
Note that the same remark about free()ing the req->priv applies here, the
req->priv can only be free()'d after the original request was restored.
Finally, squash a bug in the invocation of completion in the ASYNC path.
In both ahash_def_finup_done{1,2}, the function areq->base.complete(X, err);
was called with X=areq->base.data . This is incorrect , as X=&areq->base
is the correct value. By analysis of the data structures, we see the areq is
of type 'struct ahash_request' , areq->base is of type 'struct crypto_async_request'
and areq->base.completion is of type crypto_completion_t, which is defined in
include/linux/crypto.h as:
typedef void (*crypto_completion_t)(struct crypto_async_request *req, int err);
This is one lead that the X should be &areq->base . Next up, we can inspect
other code which calls the completion callback to give us kind-of statistical
idea of how this callback is used. We can try:
$ git grep base\.complete\( drivers/crypto/
Finally, by inspecting ahash_request_set_callback() implementation defined
in include/crypto/hash.h , we observe that the .data entry of 'struct
crypto_async_request' is intended for arbitrary data, not for completion
argument.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ab6bf4e5e5 upstream.
Add documentation for the pointer voodoo that is happening in crypto/ahash.c
in ahash_op_unaligned(). This code is quite confusing, so add a beefy chunk
of documentation.
Moreover, make sure the mangled request is completely restored after finishing
this unaligned operation. This means restoring all of .result, .base.data
and .base.complete .
Also, remove the crypto_completion_t complete = ... line present in the
ahash_op_unaligned_done() function. This type actually declares a function
pointer, which is very confusing.
Finally, yet very important nonetheless, make sure the req->priv is free()'d
only after the original request is restored in ahash_op_unaligned_done().
The req->priv data must not be free()'d before that in ahash_op_unaligned_finish(),
since we would be accessing previously free()'d data in ahash_op_unaligned_done()
and cause corruption.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d9a394b97 upstream.
When finishing the ahash request, the ahash_op_unaligned_done() will
call complete() on the request. Yet, this will not call the correct
complete callback. The correct complete callback was previously stored
in the requests' private data, as seen in ahash_op_unaligned(). This
patch restores the correct complete callback and .data field of the
request before calling complete() on it.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cf01fb9985 upstream.
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9f838d104 upstream.
This fixes CVE-2017-7472.
Running the following program as an unprivileged user exhausts kernel
memory by leaking thread keyrings:
#include <keyutils.h>
int main()
{
for (;;)
keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
}
Fix it by only creating a new thread keyring if there wasn't one before.
To make things more consistent, make install_thread_keyring_to_cred()
and install_process_keyring_to_cred() both return 0 if the corresponding
keyring is already present.
Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bcc5364bdc upstream.
When calculating po->tp_hdrlen + po->tp_reserve the result can overflow.
Fix by checking that tp_reserve <= INT_MAX on assign.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f8d28e4d6 upstream.
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow.
Add a check that tp_block_size * tp_block_nr <= UINT_MAX.
Since frames_per_block <= tp_block_size, the expression would
never overflow.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2b6867c2ce upstream.
Subtracting tp_sizeof_priv from tp_block_size and casting to int
to check whether one is less then the other doesn't always work
(both of them are unsigned ints).
Compare them as is instead.
Also cast tp_sizeof_priv to u64 before using BLK_PLUS_PRIV, as
it can overflow inside BLK_PLUS_PRIV otherwise.
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dc808110bb upstream.
af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.
This is not generally the case, even if most existing tools do it right.
This patch clamps too long frames as API permits, and issue a one time
error on syslog.
[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7e11f9956 upstream.
In vmw_surface_define_ioctl(), the 'num_sizes' is the sum of the
'req->mip_levels' array. This array can be assigned any value from
the user space. As both the 'num_sizes' and the array is uint32_t,
it is easy to make 'num_sizes' overflow. The later 'mip_levels' is
used as the loop count. This can lead an oob write. Add the check of
'req->mip_levels' to avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36274ab8c5 upstream.
Before memory allocations vmw_surface_define_ioctl() checks the
upper-bounds of a user-supplied size, but does not check if the
supplied size is 0.
Add check to avoid NULL pointer dereferences.
Signed-off-by: Murray McAllister <murray.mcallister@insomniasec.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f843ee6dd0 upstream.
Kees Cook has pointed out that xfrm_replay_state_esn_len() is subject to
wrapping issues. To ensure we are correctly ensuring that the two ESN
structures are the same size compare both the overall size as reported
by xfrm_replay_state_esn_len() and the internal length are the same.
CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 677e806da4 upstream.
When a new xfrm state is created during an XFRM_MSG_NEWSA call we
validate the user supplied replay_esn to ensure that the size is valid
and to ensure that the replay_window size is within the allocated
buffer. However later it is possible to update this replay_esn via a
XFRM_MSG_NEWAE call. There we again validate the size of the supplied
buffer matches the existing state and if so inject the contents. We do
not at this point check that the replay_window is within the allocated
memory. This leads to out-of-bounds reads and writes triggered by
netlink packets. This leads to memory corruption and the potential for
priviledge escalation.
We already attempt to validate the incoming replay information in
xfrm_new_ae() via xfrm_replay_verify_len(). This confirms that the user
is not trying to change the size of the replay state buffer which
includes the replay_esn. It however does not check the replay_window
remains within that buffer. Add validation of the contained
replay_window.
CVE-2017-7184
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee8f844e3c upstream.
This fixes CVE-2016-9604.
Keyrings whose name begin with a '.' are special internal keyrings and so
userspace isn't allowed to create keyrings by this name to prevent
shadowing. However, the patch that added the guard didn't fix
KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings,
it can also subscribe to them as a session keyring if they grant SEARCH
permission to the user.
This, for example, allows a root process to set .builtin_trusted_keys as
its session keyring, at which point it has full access because now the
possessor permissions are added. This permits root to add extra public
keys, thereby bypassing module verification.
This also affects kexec and IMA.
This can be tested by (as root):
keyctl session .builtin_trusted_keys
keyctl add user a a @s
keyctl list @s
which on my test box gives me:
2 keys in keyring:
180010936: ---lswrv 0 0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
801382539: --alswrv 0 0 user: a
Fix this by rejecting names beginning with a '.' in the keyctl.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
cc: linux-ima-devel@lists.sourceforge.net
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 54e2c2c1a9 upstream.
Reinstate the generation of EPERM for a key type name beginning with a '.' in
a userspace call. Types whose name begins with a '.' are internal only.
The test was removed by:
commit a4e3b8d79a
Author: Mimi Zohar <zohar@linux.vnet.ibm.com>
Date: Thu May 22 14:02:23 2014 -0400
Subject: KEYS: special dot prefixed keyring name bug fix
I think we want to keep the restriction on type name so that userspace can't
add keys of a special internal type.
Note that removal of the test causes several of the tests in the keyutils
testsuite to fail.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4e3b8d79a upstream.
Dot prefixed keyring names are supposed to be reserved for the
kernel, but add_key() calls key_get_type_from_user(), which
incorrectly verifies the 'type' field, not the 'description' field.
This patch verifies the 'description' field isn't dot prefixed,
when creating a new keyring, and removes the dot prefix test in
key_get_type_from_user().
Changelog v6:
- whitespace and other cleanup
Changelog v5:
- Only prevent userspace from creating a dot prefixed keyring, not
regular keys - Dmitry
Reported-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b7321e81fc upstream.
Make sure to check for the required interrupt-in endpoint to avoid
dereferencing a NULL-pointer should a malicious device lack such an
endpoint.
Note that a fairly recent change purported to fix this issue, but added
an insufficient test on the number of endpoints only, a test which can
now be removed.
Fixes: 4ec0ef3a82 ("USB: iowarrior: fix oops with malicious USB descriptors")
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3ac9f7376 upstream.
The sequencer FIFO management has a bug that may lead to a corruption
(shortage) of the cell linked list. When a sequencer client faces an
error at the event delivery, it tries to put back the dequeued cell.
When the first queue was put back, this forgot the tail pointer
tracking, and the link will be screwed up.
Although there is no memory corruption, the sequencer client may stall
forever at exit while flushing the pending FIFO cells in
snd_seq_pool_done(), as spotted by syzkaller.
This patch addresses the missing tail pointer tracking at
snd_seq_fifo_cell_putback(). Also the patch makes sure to clear the
cell->enxt pointer at snd_seq_fifo_event_in() for avoiding a similar
mess-up of the FIFO linked list.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15c75b09f8 upstream.
Currently ctxfi driver tries to set only the 64bit DMA mask on 64bit
architectures, and bails out if it fails. This causes a problem on
some platforms since the 64bit DMA isn't always guaranteed. We should
fall back to the default 32bit DMA when 64bit DMA fails.
Fixes: 6d74b86d3c ("ALSA: ctxfi - Allow 64bit DMA")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2:
- Old code was using PCI DMA mask functions
- Deleted error message was different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71321eb3f2 upstream.
When a user sets a too small ticks with a fine-grained timer like
hrtimer, the kernel tries to fire up the timer irq too frequently.
This may lead to the condensed locks, eventually the kernel spinlock
lockup with warnings.
For avoiding such a situation, we define a lower limit of the
resolution, namely 1ms. When the user passes a too small tick value
that results in less than that, the kernel returns -EINVAL now.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit edb9d1bff4 upstream.
When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.
Following is example with GACT action:
% sudo modprobe act_gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 1
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....
After the fix:
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module Size Used by
act_gact 16384 3
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
% sudo rmmod act_gact
% lsmod
Module Size Used by
%
Fixes: f97017cdef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed92d8c137 upstream.
We're not taking into account that the space needed for the (variable
length) attr bitmap, with the result that we'd sometimes get a spurious
ERANGE when the ACL data got close to the end of a page.
Just add in an extra page to make sure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 21f498c2f7 upstream.
Ensure that the user supplied buffer size doesn't cause us to overflow
the 'pages' array.
Also fix up some confusion between the use of PAGE_SIZE and
PAGE_CACHE_SIZE when calculating buffer sizes. We're not using
the page cache for anything here.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c421530bf8 upstream.
The driver currently checks the SELF_TEST_FAILED first and then
KERNEL_PANIC next. Under error conditions(boot code failure) both
SELF_TEST_FAILED and KERNEL_PANIC can be set at the same time.
The driver has the capability to reset the controller on an KERNEL_PANIC,
but not on SELF_TEST_FAILED.
Fixed by first checking KERNEL_PANIC and then the others.
Fixes: e8b12f0fb8 ([SCSI] aacraid: Add new code for PMC-Sierra's SRC base controller family)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1bff5abca6 upstream.
aac_fib_map_free frees misaligned fib dma memory, additionally it does not
free up the whole memory.
Fixed by changing the code to free up the correct and full memory
allocation.
Fixes: e8b12f0fb8 ([SCSI] aacraid: Add new code for PMC-Sierra's SRC based controller family)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: David Carroll <David.Carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: s/max_cmd_size/max_fib_size/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ec7cb62d18 upstream.
DCCP doesn't purge timewait sockets on network namespace shutdown.
So, after net namespace destroyed we could still have an active timer
which will trigger use after free in tw_timer_handler():
BUG: KASAN: use-after-free in tw_timer_handler+0x4a/0xa0 at addr ffff88010e0d1e10
Read of size 8 by task swapper/1/0
Call Trace:
__asan_load8+0x54/0x90
tw_timer_handler+0x4a/0xa0
call_timer_fn+0x127/0x480
expire_timers+0x1db/0x2e0
run_timer_softirq+0x12f/0x2a0
__do_softirq+0x105/0x5b4
irq_exit+0xdd/0xf0
smp_apic_timer_interrupt+0x57/0x70
apic_timer_interrupt+0x90/0xa0
Object at ffff88010e0d1bc0, in cache net_namespace size: 6848
Allocated:
save_stack_trace+0x1b/0x20
kasan_kmalloc+0xee/0x180
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x134/0x310
copy_net_ns+0x8d/0x280
create_new_namespaces+0x23f/0x340
unshare_nsproxy_namespaces+0x75/0xf0
SyS_unshare+0x299/0x4f0
entry_SYSCALL_64_fastpath+0x18/0xad
Freed:
save_stack_trace+0x1b/0x20
kasan_slab_free+0xae/0x180
kmem_cache_free+0xb4/0x350
net_drop_ns+0x3f/0x50
cleanup_net+0x3df/0x450
process_one_work+0x419/0xbb0
worker_thread+0x92/0x850
kthread+0x192/0x1e0
ret_from_fork+0x2e/0x40
Add .exit_batch hook to dccp_v4_ops()/dccp_v6_ops() which will purge
timewait sockets on net namespace destruction and prevent above issue.
Fixes: f2bf415cfe ("mib: add net to NET_ADD_STATS_BH")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: pass twdr parameter to inet_twsk_purge()
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e38bea99a upstream.
fuse_file_put() was missing the "force" flag for the RELEASE request when
sending synchronously (fuseblk).
If this flag is not set, then a sync request may be interrupted before it
is dequeued by the userspace filesystem. In this case the OPEN won't be
balanced with a RELEASE.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 5a18ec176c ("fuse: fix hang of single threaded fuseblk filesystem")
[bwh: Backported to 3.2:
- "force" flag is a bitfield
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 783112f740 upstream.
Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set to set various file attributes including the
file size and the uid/gid.
The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates don't update random other attributes, and many other file
systems handle the case but do not update the other attributes in the
same transaction. NFSD on the other hand passes the attributes it gets
on the wire more or less directly through to the VFS, leading to updates
the file systems don't expect. XFS at least has an assert on the
allowed attributes, which caught an unusual NFS client setting the size
and group at the same time.
To handle this issue properly this splits the notify_change call in
nfsd_setattr into two separate ones.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2:
- notify_change() doesn't take a struct inode ** parameter
- Move call to nfsd_break_lease() up along with fh_lock()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0c63124a6 upstream.
This fixes a failure in xfstests generic/313 because nfs doesn't update
mtime on a truncate. The protocol requires this to be done implicity
for a size changing setattr.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b617649468 upstream.
One of the last remaining failures in kernelci.org is for a gcc bug:
drivers/net/ethernet/qlogic/qlge/qlge_main.c:4819:1: error: insn does not satisfy its constraints:
drivers/net/ethernet/qlogic/qlge/qlge_main.c:4819:1: internal compiler error: in extract_constrain_insn, at recog.c:2190
This is apparently broken in gcc-6 but fixed in gcc-7, and I cannot
reproduce the problem here. However, it is clear that ip27_defconfig
does not actually need this driver as the platform has only PCI-X but
not PCIe, and the qlge adapter in turn is PCIe-only.
The driver was originally enabled in 2010 along with lots of other
drivers.
Fixes: 59d302b342 ("MIPS: IP27: Make defconfig useful again.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15197/
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 884b426917 upstream.
If copy_from_user is called with a large buffer (>= 128 bytes) and the
userspace buffer refers partially to unreadable memory, then it is
possible for Octeon's copy_from_user to report the wrong number of bytes
have been copied. In the case where the buffer size is an exact multiple
of 128 and the fault occurs in the last 64 bytes, copy_from_user will
report that all the bytes were copied successfully but leave some
garbage in the destination buffer.
The bug is in the main __copy_user_common loop in octeon-memcpy.S where
in the middle of the loop, src and dst are incremented by 128 bytes. The
l_exc_copy fault handler is used after this but that assumes that
"src < THREAD_BUADDR($28)". This is not the case if src has already been
incremented.
Fix by adding an extra fault handler which rewinds the src and dst
pointers 128 bytes before falling though to l_exc_copy.
Thanks to the pwritev test from the strace test suite for originally
highlighting this bug!
Fixes: 5b3b16880f ("MIPS: Add Cavium OCTEON processor support ...")
Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Acked-by: David Daney <david.daney@cavium.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14978/
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66fd848cad upstream.
For certain arguments such as saddr = 0xc0a8fd60, daddr = 0xc0a8fda1,
len = 80, proto = 17, sum = 0x7eae049d there will be a carry when
folding the intermediate 64 bit checksum to 32 bit but the code doesn't
add the carry back to the one's complement sum, thus an incorrect result
will be generated.
Reported-by: Mark Zhang <bomb.zhang@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c21a493a2b upstream.
Currently xmon data-breakpoint feature is broken.
Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 251af29c32 upstream.
It is not sufficient to just check that the lock pids match when
granting a callback, we also need to ensure that we're granting
the callback on the right file.
Reported-by: Pankaj Singh <psingh.ait@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2: open-code file_inode()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9356863c94 upstream.
Commit: cbd1998377 ("md: Fix unfortunate interaction with evms")
change mddev_put() so that it would not destroy an md device while
->ctime was non-zero.
Unfortunately, we didn't make sure to clear ->ctime when unloading
the module, so it is possible for an md device to remain after
module unload. An attempt to open such a device will trigger
an invalid memory reference in:
get_gendisk -> kobj_lookup -> exact_lock -> get_disk
when tring to access disk->fops, which was in the module that has
been removed.
So ensure we clear ->ctime in md_exit(), and explain how that is useful,
as it isn't immediately obvious when looking at the code.
Fixes: cbd1998377 ("md: Fix unfortunate interaction with evms")
Tested-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 03a9e24ef2 upstream.
Recently I receive a bug report that on Linux v3.0 based kerenl, hot add
disk to a md linear device causes kernel crash at linear_congested(). From
the crash image analysis, I find in linear_congested(), mddev->raid_disks
contains value N, but conf->disks[] only has N-1 pointers available. Then
a NULL pointer deference crashes the kernel.
There is a race between linear_add() and linear_congested(), RCU stuffs
used in these two functions cannot avoid the race. Since Linuv v4.0
RCU code is replaced by introducing mddev_suspend(). After checking the
upstream code, it seems linear_congested() is not called in
generic_make_request() code patch, so mddev_suspend() cannot provent it
from being called. The possible race still exists.
Here I explain how the race still exists in current code. For a machine
has many CPUs, on one CPU, linear_add() is called to add a hard disk to a
md linear device; at the same time on other CPU, linear_congested() is
called to detect whether this md linear device is congested before issuing
an I/O request onto it.
Now I use a possible code execution time sequence to demo how the possible
race happens,
seq linear_add() linear_congested()
0 conf=mddev->private
1 oldconf=mddev->private
2 mddev->raid_disks++
3 for (i=0; i<mddev->raid_disks;i++)
4 bdev_get_queue(conf->disks[i].rdev->bdev)
5 mddev->private=newconf
In linear_add() mddev->raid_disks is increased in time seq 2, and on
another CPU in linear_congested() the for-loop iterates conf->disks[i] by
the increased mddev->raid_disks in time seq 3,4. But conf with one more
element (which is a pointer to struct dev_info type) to conf->disks[] is
not updated yet, accessing its structure member in time seq 4 will cause a
NULL pointer deference fault.
To fix this race, there are 2 parts of modification in the patch,
1) Add 'int raid_disks' in struct linear_conf, as a copy of
mddev->raid_disks. It is initialized in linear_conf(), always being
consistent with pointers number of 'struct dev_info disks[]'. When
iterating conf->disks[] in linear_congested(), use conf->raid_disks to
replace mddev->raid_disks in the for-loop, then NULL pointer deference
will not happen again.
2) RCU stuffs are back again, and use kfree_rcu() in linear_add() to
free oldconf memory. Because oldconf may be referenced as mddev->private
in linear_congested(), kfree_rcu() makes sure that its memory will not
be released until no one uses it any more.
Also some code comments are added in this patch, to make this modification
to be easier understandable.
This patch can be applied for kernels since v4.0 after commit:
3be260cc18 ("md/linear: remove rcu protections in favour of
suspend/resume"). But this bug is reported on Linux v3.0 based kernel, for
people who maintain kernels before Linux v4.0, they need to do some back
back port to this patch.
Changelog:
- V3: add 'int raid_disks' in struct linear_conf, and use kfree_rcu() to
replace rcu_call() in linear_add().
- v2: add RCU stuffs by suggestion from Shaohua and Neil.
- v1: initial effort.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
[bwh: Backported to 3.2: no need to restore RCU protections]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a53210f56d upstream.
Fixes: a45c6cb816 ("[ARM] 5369/1: omap mmc: Add new omap
hsmmc controller for 2430 and 34xx, v3")
when using really large timeout (up to 4*60*1000 ms for bkops)
there is a possibility of data overflow using
unsigned int so use 64 bit unsigned long long.
Signed-off-by: Ravikumar Kattekola <rk@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[bwh: Backported to 3.2:
- Drop change in omap_hsmmc_prepare_data()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57cb17e764 upstream.
This function has two callers and neither are able to handle a NULL
return. Really, -EINVAL is the correct thing return here anyway. This
fixes some static checker warnings like:
security/keys/encrypted-keys/encrypted.c:709 encrypted_key_decrypt()
error: uninitialized symbol 'master_key'.
Fixes: 7e70cb4978 ("keys: add new key-type encrypted")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5182c2cf2a upstream.
Fix another NULL-pointer dereference at open should a malicious device
lack an interrupt-in endpoint.
Note that the driver has a broken check for an interrupt-in endpoint
which means that an interrupt URB has never even been submitted.
Fixes: 3f5429746d ("USB: Moschip 7840 USB-Serial Driver")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6bb1e17a3 upstream.
FTDI devices use a receive latency timer to periodically empty the
receive buffer and report modem and line status (also when the buffer is
empty).
When a break or error condition is detected the corresponding status
flags will be set on a packet with nonzero data payload and the flags
are not updated until the break is over or further characters are
received.
In order to avoid over-reporting break and error conditions, these flags
must therefore only be processed for packets with payload.
This specifically fixes the case where after an overrun, the error
condition is continuously reported and NULL-characters inserted until
further data is received.
Reported-by: Michael Walle <michael@walle.cc>
Fixes: 72fda3ca6f ("USB: serial: ftd_sio: implement sysrq handling on
break")
Fixes: 166ceb6907 ("USB: ftdi_sio: clean up line-status handling")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c9c858e2f upstream.
The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta
Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports,
respectively. The first 4 ports are implemented by an OX16PCI954 chip,
and the second 4 ports are implemented by an OX16C954 chip on a local
bus, bridged by the second PCI function of the OX16PCI954. The ports
are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a
non-standard oscillator frequency of 20 MHz (base_baud = 1250000).
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97abd7d4b5 upstream.
If the journal is aborted, the needs_recovery feature flag should not
be removed. Otherwise, it's the journal might not get replayed and
this could lead to more data getting lost.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e112666b49 upstream.
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified. And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1e4a382fdc upstream.
For devices with multiple input queues, tiqdio_call_inq_handlers()
iterates over all input queues and clears the device's DSCI
during each iteration. If the DSCI is re-armed during one
of the later iterations, we therefore do not scan the previous
queues again.
The re-arming also raises a new adapter interrupt. But its
handler does not trigger a rescan for the device, as the DSCI
has already been erroneously cleared.
This can result in queue stalls on devices with multiple
input queues.
Fix it by clearing the DSCI just once, prior to scanning the queues.
As the code is moved in front of the loop, we also need to access
the DSCI directly (ie irq->dsci) instead of going via each queue's
parent pointer to the same irq. This is not a functional change,
and a follow-up patch will clean up the other users.
In practice, this bug only affects CQ-enabled HiperSockets devices,
ie. devices with sysfs-attribute "hsuid" set. Setting a hsuid is
needed for AF_IUCV socket applications that use HiperSockets
communication.
Fixes: 104ea556ee ("qdio: support asynchronous delivery of storage blocks")
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c528fcb116 upstream.
Make sure to check for short transfers before parsing the receive buffer
to avoid acting on stale data.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Keep the check for !tty in the data case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b0aed2b16 upstream.
Make sure the received data has the required headers before parsing it.
Also drop the redundant urb-status check, which has already been handled
by the caller.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f6821462f upstream.
A recent change claimed to fix an off-by-one error in the OOB-port
completion handler, but instead introduced such an error. This could
specifically led to modem-status changes going unnoticed, effectively
breaking TIOCMGET.
Note that the offending commit fixes a loop-condition underflow and is
marked for stable, but should not be backported without this fix.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2d38088921 ("USB: serial: digi_acceleport: fix OOB data sanity check")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d38088921 upstream.
Make sure to check for short transfers to avoid underflow in a loop
condition when parsing the receive buffer.
Also fix an off-by-one error in the incomplete sanity check which could
lead to invalid data being parsed.
Fixes: 8c209e6782 ("USB: make actual_length in struct urb field u32")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3bb3cddd1 upstream.
Fix dm1105 build error when CONFIG_I2C_ALGOBIT=m and
CONFIG_DVB_DM1105=y.
drivers/built-in.o: In function `dm1105_probe':
dm1105.c:(.text+0x2836e7): undefined reference to `i2c_bit_add_bus'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b136499e9 upstream.
ext4_journalled_write_end() did not propely handle all the cases when
generic_perform_write() did not copy all the data into the target page
and could mark buffers with uninitialized contents as uptodate and dirty
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b90197b655 upstream.
If there is a error while copying data from userspace into the page
cache during a write(2) system call, in data=journal mode, in
ext4_journalled_write_end() were using page_zero_new_buffers() from
fs/buffer.c. Unfortunately, this sets the buffer dirty flag, which is
no good if journalling is enabled. This is a long-standing bug that
goes back for years and years in ext3, but a combination of (a)
data=journal not being very common, (b) in many case it only results
in a warning message. and (c) only very rarely causes the kernel hang,
means that we only really noticed this as a problem when commit
998ef75ddb caused this failure to happen frequently enough to cause
generic/208 to fail when run in data=journal mode.
The fix is to have our own version of this function that doesn't call
mark_dirty_buffer(), since we will end up calling
ext4_handle_dirty_metadata() on the buffer head(s) in questions very
shortly afterwards in ext4_journalled_write_end().
Thanks to Dave Hansen and Linus Torvalds for helping to identify the
root cause of the problem.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd648b8a8f upstream.
If filesystem groups are artifically small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size to not confuse
allocation code.
Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a882f5de40 upstream.
The vfct table can contain multiple vbios images if the
platform contains multiple GPUs. Noticed by netkas on
phoronix forums. This patch fixes those platforms.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 239ac65fa5 upstream.
The current caching state may not be tt_cached, even though the
placement contains TTM_PL_FLAG_CACHED, because placement can contain
multiple caching flags. Trying to swap out such a BO would trip up the
BUG_ON(ttm->caching_state != tt_cached);
in ttm_tt_swapout.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christian König <christian.koenig@amd.com>.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c6dce26266 upstream.
Since commit 557aaa7ffa ("ft232: support the ASYNC_LOW_LATENCY
flag") the FTDI driver has been using a receive latency-timer value of
1 ms instead of the device default of 16 ms.
The latency timer is used to periodically empty a non-full receive
buffer, but a status header is always sent when the timer expires
including when the buffer is empty. This means that a two-byte bulk
message is received every millisecond also for an otherwise idle port as
long as it is open.
Let's restore the pre-2009 behaviour which reduces the rate of the
status messages to 1/16th (e.g. interrupt frequency drops from 1 kHz to
62.5 Hz) by not setting ASYNC_LOW_LATENCY by default.
Anyone willing to pay the price for the minimum-latency behaviour should
set the flag explicitly instead using the TIOCSSERIAL ioctl or a tool
such as setserial (e.g. setserial /dev/ttyUSB0 low_latency).
Note that since commit 0cbd81a9f6 ("USB: ftdi_sio: remove
tty->low_latency") the ASYNC_LOW_LATENCY flag has no other effects but
to set a minimal latency timer.
Reported-by: Antoine Aubert <a.aubert@overkiz.com>
Fixes: 557aaa7ffa ("ft232: support the ASYNC_LOW_LATENCY flag")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 33e4c1a998 upstream.
As IN request has to be allocated in set_alt() and released in
disable() we cannot use mutex to protect it as we cannot sleep
in those funcitons. Let's replace this mutex with a spinlock.
Tested-by: David Lechner <david@lechnology.com>
Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ffb80fc672 upstream.
At least macOS seems to be sending
ClearFeature(ENDPOINT_HALT) to endpoints which
aren't Halted. This makes DWC3's CLEARSTALL command
time out which causes several issues for the driver.
Instead, let's just return 0 and bail out early.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6183468a23 upstream.
Similar to commit fcd2042e8d ("mwifiex: printk() overflow with 32-byte
SSIDs"), we failed to account for the existence of 32-char SSIDs in our
debugfs code. Unlike in that case though, we zeroed out the containing
struct first, and I'm pretty sure we're guaranteed to have some padding
after the 'ssid.ssid' and 'ssid.ssid_len' fields (the struct is 33 bytes
long).
So, this is the difference between:
# cat /sys/kernel/debug/mwifiex/mlan0/info
...
essid="0123456789abcdef0123456789abcdef "
...
and the correct output:
# cat /sys/kernel/debug/mwifiex/mlan0/info
...
essid="0123456789abcdef0123456789abcdef"
...
Fixes: 5e6e3a92b9 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]g
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e01700602 upstream.
gcc-7 detects that wlanhdr_to_ethhdr() in two drivers calls memcpy() with
a destination argument that an earlier function call may have set to NULL:
staging/rtl8188eu/core/rtw_recv.c: In function 'wlanhdr_to_ethhdr':
staging/rtl8188eu/core/rtw_recv.c:1318:2: warning: argument 1 null where non-null expected [-Wnonnull]
staging/rtl8712/rtl871x_recv.c: In function 'r8712_wlanhdr_to_ethhdr':
staging/rtl8712/rtl871x_recv.c:649:2: warning: argument 1 null where non-null expected [-Wnonnull]
I'm fixing this by adding a NULL pointer check and returning failure
from the function, which is hopefully already handled properly.
This seems to date back to when the drivers were originally added,
so backporting the fix to stable seems appropriate. There are other
related realtek drivers in the kernel, but none of them contain a
function with a similar name or produce this warning.
Fixes: 1cc18a22b9 ("staging: r8188eu: Add files for new driver - part 5")
Fixes: 2865d42c78 ("staging: r8712u: Add the new driver to the mainline kernel")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: drop changes to r8188eu]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 39712e8bfa upstream.
Make sure to detect and return an error on zero-length control-message
transfers when reading from the device.
This addresses a potential failure to detect an empty transmit buffer
during close.
Also remove a redundant check for short transfer when sending a command.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1eac5c244f upstream.
Make sure to detect short control-message transfers rather than continue
with zero-initialised data when retrieving modem status and during
device initialisation.
Fixes: 52af954599 ("USB: add USB serial ssu100 driver")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36356a669e upstream.
Make sure to detect short control-message transfers so that errors are
logged when reading the modem status at open.
Note that while this also avoids initialising the modem status using
uninitialised heap data, these bits could not leak to user space as they
are currently not used.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c0e25d883 upstream.
Make sure to detect short control-message transfers and log an error
when reading incomplete manufacturer and boot descriptors.
Note that the default all-zero descriptors will now be used after a
short transfer is detected instead of partially initialised ones.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4457d9798 upstream.
Use a dedicated buffer for the DMA transfer and make sure to detect
short transfers to avoid parsing a corrupt descriptor.
Fixes: 6e8cf7751f ("USB: add EPIC support to the io_edgeport driver")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3e574ad85 upstream.
Make sure to detect short responses when reading the latency timer to
avoid using stale buffer data.
Note that no heap data would currently leak through sysfs as
ASYNC_LOW_LATENCY is set by default.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 427c3a95e3 upstream.
Make sure to detect short responses when fetching the modem status in
order to avoid parsing uninitialised buffer data and having bits of it
leak to user space.
Note that we still allow for short 1-byte responses.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b631433b17 upstream.
Fix open error handling which failed to detect errors when reading the
MSR and LSR registers, something which could lead to the shadow
registers being initialised from errnos.
Note that calling the generic close implementation is sufficient in the
error paths as the interrupt urb has not yet been submitted and the
register updates have not been made.
Fixes: f4c1e8d597 ("USB: ark3116: Make existing functions 16450-aware
and add close and release functions.")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9fef37d7cf upstream.
The current implementation failed to detect short transfers, something
which could lead to bits of the uninitialised heap transfer buffer
leaking to user space.
Fixes: 149fc791a4 ("USB: ark3116: Setup some basic infrastructure for
new ark3116 driver.")
Fixes: f4c1e8d597 ("USB: ark3116: Make existing functions 16450-aware
and add close and release functions.")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0467a967f upstream.
The modem-status register was read as part of device configuration at
port_probe and then again at open (and reset-resume). During open (and
reset-resume) the MSR was read before submitting the interrupt URB,
something which could lead to an MSR-change going unnoticed when it
races with open (reset-resume).
Fix this by dropping the redundant reconfiguration of the port at every
open, and only read the MSR after the interrupt URB has been submitted.
Fixes: 664d5df92e ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Keep the 'serial' variable in ch341_open()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27d41d29c7 upstream.
Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.
Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80b5b35aba upstream.
When changing the connection mode, the ipoib_set_mode function
did not check if the previous connection mode equals to the
new one. This commit adds the required check and return 0 if the new
mode equals to the previous one.
Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2:
- Adjust filename
- Unlock RTNL lock before returning]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55efcfcd77 upstream.
The RDMA core uses ib_pack() to convert from unpacked CPU structs
to on-the-wire bitpacked structs.
This process requires that 1 bit fields are declared as u8 in the
unpacked struct, otherwise the packing process does not read the
value properly and the packed result is wired to 0. Several
places wrongly used int.
Crucially this means the kernel has never, set reversible
correctly in the path record request. It has always asked for
irreversible paths even if the ULP requests otherwise.
When the kernel is used with a SM that supports this feature, it
completely breaks communication management if reversible paths are
not properly requested.
The only reason this ever worked is because opensm ignores the
reversible bit.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit abe81f3b8e upstream.
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/tty/serial/msm_serial.ko | grep alias
$
After this patch:
$ modinfo drivers/tty/serial/msm_serial.ko | grep alias
alias: of:N*T*Cqcom,msm-uartdmC*
alias: of:N*T*Cqcom,msm-uartdm
alias: of:N*T*Cqcom,msm-uartC*
alias: of:N*T*Cqcom,msm-uart
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c705a6b3aa upstream.
We accidentally return success when adm8211_alloc_rings() fails but we
should preserve the error code.
Fixes: cc0b88cf5e ("[PATCH] Add adm8211 802.11b wireless driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The "dead" key type has no match operation, and a search for keys of
this type can cause a null dereference in keyring_search_aux().
keyring_search() has a check for this, but request_keyring_and_link()
does not. Move the check into keyring_search_aux(), covering both of
them.
This was fixed upstream by commit c06cfb08b8 ("KEYS: Remove
key_type::match in favour of overriding default by match_preparse"),
part of a series of large changes that are not suitable for
backporting.
CVE-2017-2647 / CVE-2017-6951
Reported-by: Igor Redko <redkoi@virtuozzo.com>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2017-2647
Reported-by: idl3r <idler1984@gmail.com>
References: https://www.spinics.net/lists/keyrings/msg01845.html
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: David Howells <dhowells@redhat.com>
This is a stable follow up fix for an incorrect backport. The issue is
not present in the upstream kernel.
Miroslav has noticed the following splat when testing my 3.2 forward
port of 8310d48b12 ("mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for
thp") to 3.12:
BUG: Bad page state in process a.out pfn:26400
page:ffffea000085e000 count:0 mapcount:1 mapping: (null) index:0x7f049d600
page flags: 0x1fffff80108018(uptodate|dirty|head|swapbacked)
page dumped because: nonzero mapcount
[iii]
CPU: 2 PID: 5926 Comm: a.out Tainted: G E 3.12.61-0-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffffffff81515830 ffffea000085e000 ffffffff81800ad7
ffffffff815118a5 ffffea000085e000 0000000000000000 000fffff80000000
ffffffff81140f18 fff000007c000000 ffffea000085e000 0000000000000009
Call Trace:
[<ffffffff8100475d>] dump_trace+0x7d/0x2d0
[<ffffffff81004a44>] show_stack_log_lvl+0x94/0x170
[<ffffffff81005ce1>] show_stack+0x21/0x50
[<ffffffff81515830>] dump_stack+0x5d/0x78
[<ffffffff815118a5>] bad_page.part.67+0xe8/0x102
[<ffffffff81140f18>] free_pages_prepare+0x198/0x1b0
[<ffffffff81141275>] __free_pages_ok+0x15/0xd0
[<ffffffff8116444c>] __access_remote_vm+0x7c/0x1e0
[<ffffffff81205afb>] mem_rw.isra.13+0x14b/0x1a0
[<ffffffff811a3b18>] vfs_write+0xb8/0x1e0
[<ffffffff811a469b>] SyS_pwrite64+0x6b/0xa0
[<ffffffff81523b49>] system_call_fastpath+0x16/0x1b
[<00007f049da18573>] 0x7f049da18572
The problem is that the original 3.2 backport didn't return NULL page on
the FOLL_COW page and so the page got reused.
Reported-and-tested-by: Miroslav Beneš <mbenes@suse.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Upstream commit 34b2cef20f
("ipv4: keep skb->dst around in presence of IP options") incorrectly
root caused commit d826eb14ec ("ipv4: PKTINFO doesnt need dst
reference") as bug origin.
This patch should fix the issue for 3.2.xx stable kernels, since IPv4
options seem to get more traction these days, after years of oblivion ;)
Fixes: f84af32cbc ("net: ip_queue_rcv_skb() helper"))
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Anarcheuz Fritz <anarcheuz@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 82f2341c94 upstream.
Currently N_HDLC line discipline uses a self-made singly linked list for
data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
an error.
The commit be10eb7589
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After tx error concurrent flush_tx_queue() and n_hdlc_send_frames() can put
one data buffer to tx_free_buf_list twice. That causes double free in
n_hdlc_release().
Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
in case of tx error put current data buffer after the head of tx_buf_list.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9b736d88a upstream.
The class of 4 n_hdls buf locks is the same because a single function
n_hdlc_buf_list_init is used to init all the locks. But since
flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
emits a warning:
=============================================
[ INFO: possible recursive locking detected ]
4.3.0-25.g91e30a7-default #1 Not tainted
---------------------------------------------
a.out/1248 is trying to acquire lock:
(&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
but task is already holding lock:
(&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&list->spinlock)->rlock);
lock(&(&list->spinlock)->rlock);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by a.out/1248:
#0: (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
#1: (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
...
Call Trace:
...
[<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
[<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
[<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
[<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
...
Fix it by initializing the spin_locks separately. This removes also
reduntand memset of a freshly kzallocated space.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dfcb9f4f99 upstream.
commit 2dcab59848 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
attempted to avoid a BUG_ON call when the association being used for a
sendmsg() is blocked waiting for more sndbuf and another thread did a
peeloff operation on such asoc, moving it to another socket.
As Ben Hutchings noticed, then in such case it would return without
locking back the socket and would cause two unlocks in a row.
Further analysis also revealed that it could allow a double free if the
application managed to peeloff the asoc that is created during the
sendmsg call, because then sctp_sendmsg() would try to free the asoc
that was created only for that call.
This patch takes another approach. It will deny the peeloff operation
if there is a thread sleeping on the asoc, so this situation doesn't
exist anymore. This avoids the issues described above and also honors
the syscalls that are already being handled (it can be multiple sendmsg
calls).
Joint work with Xin Long.
Fixes: 2dcab59848 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2dcab59848 upstream.
Alexander Popov reported that an application may trigger a BUG_ON in
sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is
waiting on it to queue more data and meanwhile another thread peels off
the association being used by the first thread.
This patch replaces the BUG_ON call with a proper error handling. It
will return -EPIPE to the original sendmsg call, similarly to what would
have been done if the association wasn't found in the first place.
Acked-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95e91b831f upstream.
The issue is described here, with a nice testcase:
https://bugzilla.kernel.org/show_bug.cgi?id=192931
The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0. For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address. So by the time we do security_mmap_addr(0) things
get funky for shmat().
The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page. There are two possible fixes
to this. The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(... |MAP_FIXED). While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags. This makes the
behavior of shmat() identical to the mmap() case. The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.
Passes shm related ltp tests.
Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: use SHMLBA constant instead of shmlba parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c8bb163ae upstream.
In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.
Fixes: 24803f38a5 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e1d ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1666d49e1d upstream.
This is an IPv6 version of commit 24803f38a5 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.
Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().
Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Timer code moved around in ipv6_mc_down() is different
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 24803f38a5 upstream.
In commit 24cf3af3fe ("igmp: call ip_mc_clear_src..."), we forgot to remove
igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src().
This make us clear all IGMPv3 source filter info after NETDEV_DOWN.
Move igmpv3_clear_delrec() to ip_mc_destroy_dev() and then no need
ip_mc_clear_src() in ip_mc_destroy_dev().
On the other hand, we should restore back instead of free all source filter
info in igmpv3_del_delrec(). Or we will not able to restore IGMPv3 source
filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE.
Fixes: 24cf3af3fe ("igmp: call ip_mc_clear_src() only when ...")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Use IGMP_Unsolicited_Report_Count instead of sysctl_igmp_qrv
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 837585a537 ]
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: BAckported to 3.2:
- Use ACCESS_ONCE() instead of READ_ONCE()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e1edab87fa ]
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Use ACCESS_ONCE() instead of READ_ONCE()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2eb783c43e upstream.
We set the flag TUN_PKT_STRIP if the user buffer provided is too
small to contain the entire packet plus meta-data. However, this
has been broken ever since we added GSO meta-data. VLAN acceleration
also has the same problem.
This patch fixes this by taking both into account when setting the
TUN_PKT_STRIP flag.
The fact that this has been broken for six years without anyone
realising means that nobody actually uses this flag.
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- No VLAN acceleration support
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 06425c308b ]
syszkaller fuzzer was able to trigger a divide by zero, when
TCP window scaling is not enabled.
SO_RCVBUF can be used not only to increase sk_rcvbuf, also
to decrease it below current receive buffers utilization.
If mss is negative or 0, just return a zero TCP window.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 63117f09c7 ]
Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.
Fixes: fbfa743a9d ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fbfa743a9d ]
This function suffers from multiple issues.
First one is that pskb_may_pull() may reallocate skb->head,
so the 'raw' pointer needs either to be reloaded or not used at all.
Second issue is that NEXTHDR_DEST handling does not validate
that the options are present in skb->data, so we might read
garbage or access non existent memory.
With help from Willem de Bruijn.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f1712c7371 ]
Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()
The main problem seems that the sockets themselves are not RCU
protected.
If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.
Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.
[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0
Call Trace:
<IRQ>
[<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
[<ffffffff81d55771>] sk_filter+0x41/0x210
[<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
[<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
[<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
[<ffffffff81f07af9>] can_receive+0xd9/0x120
[<ffffffff81f07beb>] can_rcv+0xab/0x100
[<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
[<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
[<ffffffff81d37f67>] process_backlog+0x127/0x280
[<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
[<ffffffff810c88d4>] __do_softirq+0x184/0x440
[<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
<EOI>
[<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
[<ffffffff810c8bed>] do_softirq+0x1d/0x20
[<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
[<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
[<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
[<ffffffff810e3baf>] process_one_work+0x24f/0x670
[<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
[<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
[<ffffffff810ebafc>] kthread+0x12c/0x150
[<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70
Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e623a9e9de ]
Commit 34b88a68f2 ("net: Fix use after free in the recvmmsg exit path"),
changed the exit path of recvmmsg to always return the datagrams
variable and modified the error paths to set the variable to the error
code returned by recvmsg if necessary.
However in the case sock_error returned an error, the error code was
then ignored, and recvmmsg returned 0.
Change the error path of recvmmsg to correctly return the error code
of sock_error.
The bug was triggered by using recvmmsg on a CAN interface which was
not up. Linux 4.6 and later return 0 in this case while earlier
releases returned -ENETDOWN.
Fixes: 34b88a68f2 ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Maxime Jayat <maxime.jayat@mobile-devices.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 03e4deff49 ]
Just like commit 4acd4945cd ("ipv6: addrconf: Avoid calling
netdevice notifiers with RCU read-side lock"), it is unnecessary
to make addrconf_disable_change() use RCU iteration over the
netdev list, since it already holds the RTNL lock, or we may meet
Illegal context switch in RCU read-side critical section.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7ababb7826 ]
5.2. Action on Reception of a Query
When a system receives a Query, it does not respond immediately.
Instead, it delays its response by a random amount of time, bounded
by the Max Resp Time value derived from the Max Resp Code in the
received Query message. A system may receive a variety of Queries on
different interfaces and of different kinds (e.g., General Queries,
Group-Specific Queries, and Group-and-Source-Specific Queries), each
of which may require its own delayed response.
Before scheduling a response to a Query, the system must first
consider previously scheduled pending responses and in many cases
schedule a combined response. Therefore, the system must be able to
maintain the following state:
o A timer per interface for scheduling responses to General Queries.
o A per-group and interface timer for scheduling responses to Group-
Specific and Group-and-Source-Specific Queries.
o A per-group and interface list of sources to be reported in the
response to a Group-and-Source-Specific Query.
When a new Query with the Router-Alert option arrives on an
interface, provided the system has state to report, a delay for a
response is randomly selected in the range (0, [Max Resp Time]) where
Max Resp Time is derived from Max Resp Code in the received Query
message. The following rules are then used to determine if a Report
needs to be scheduled and the type of Report to schedule. The rules
are considered in order and only the first matching rule is applied.
1. If there is a pending response to a previous General Query
scheduled sooner than the selected delay, no additional response
needs to be scheduled.
2. If the received Query is a General Query, the interface timer is
used to schedule a response to the General Query after the
selected delay. Any previously pending response to a General
Query is canceled.
--8<--
Currently the timer is rearmed with new random expiration time for
every incoming query regardless of possibly already pending report.
Which is not aligned with the above RFE.
It also might happen that higher rate of incoming queries can
postpone the report after the expiration time of the first query
causing group membership loss.
Now the per interface general query timer is rearmed only
when there is no pending report already scheduled on that interface or
the newly selected expiration time is before the already pending
scheduled report.
Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3b48ab2248 ]
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.
This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 4200462d88 ]
Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a50af86dd4 ]
Hyper-V (and Azure) support using NVGRE which requires some extra space
for encapsulation headers. Because of this the largest allowed TSO
packet is reduced.
For older releases, hard code a fixed reduced value. For next release,
there is a better solution which uses result of host offload
negotiation.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 648f0c28df ]
pskb_may_pull() can reallocate skb->head, we need to reload dh pointer
in dccp_invalid_packet() or risk use after free.
Bug found by Andrey Konovalov using syzkaller.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 95c2027bfe ]
Add a validation function to make sure offset is valid:
1. Not below skb head (could happen when offset is negative).
2. Validate both 'offset' and 'at'.
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 06ba3b2133 ]
The sky2 frequently crashes during machine shutdown with:
sky2_get_stats+0x60/0x3d8 [sky2]
dev_get_stats+0x68/0xd8
rtnl_fill_stats+0x54/0x140
rtnl_fill_ifinfo+0x46c/0xc68
rtmsg_ifinfo_build_skb+0x7c/0xf0
rtmsg_ifinfo.part.22+0x3c/0x70
rtmsg_ifinfo+0x50/0x5c
netdev_state_change+0x4c/0x58
linkwatch_do_dev+0x50/0x88
__linkwatch_run_queue+0x104/0x1a4
linkwatch_event+0x30/0x3c
process_one_work+0x140/0x3e0
worker_thread+0x60/0x44c
kthread+0xdc/0xf0
ret_from_fork+0x10/0x50
This is caused by the sky2 being called after it has been shutdown.
A previous thread about this can be found here:
https://lkml.org/lkml/2016/4/12/410
An alternative fix is to assure that IFF_UP gets cleared by
calling dev_close() during shutdown. This is similar to what the
bnx2/tg3/xgene and maybe others are doing to assure that the driver
isn't being called following _shutdown().
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b5c2d49544 ]
If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.
The issue is apprently there since at leat Linux-2.6.12-rc2.
Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3023898b7d ]
Do not send the next message in sendmmsg for partial sendmsg
invocations.
sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.
For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.
Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.
Fixes: 228e548e60 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: we don't have the iov_iter API, so make
___sys_sendmsg() calculate and write back the remaining length]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7233bc84a3 ]
sctp_wait_for_connect() currently already holds the asoc to keep it
alive during the sleep, in case another thread release it. But Andrey
Konovalov and Dmitry Vyukov reported an use-after-free in such
situation.
Problem is that __sctp_connect() doesn't get a ref on the asoc and will
do a read on the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it and the _put on sctp_wait_for_connect
will actually release it, causing the use-after-free.
Fix is, instead of doing the read after waiting for the connect, do it
before so, and avoid this issue as the socket is still locked by then.
There should be no issue on returning the asoc id in case of failure as
the application shouldn't trust on that number in such situations
anyway.
This issue doesn't exist in sctp_sendmsg() path.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1aa9d1a0e7 ]
dccp_v6_err() does not use pskb_may_pull() and might access garbage.
We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: use offsetof() + sizeof() instead of
offsetofend()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6706a97fec ]
dccp_v4_err() does not use pskb_may_pull() and might access garbage.
We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.
This patch might allow to process more ICMP messages, as some routers
are still limiting the size of reflected bytes to 28 (RFC 792), instead
of extended lengths (RFC 1812 4.3.2.3)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: use offsetof() + sizeof() instead of
offsetofend()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 4f2e4ad56a ]
Sending zero checksum is ok for TCP, but not for UDP.
UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.
Simply replace such checksum by 0xffff, regardless of transport.
This error was caught on SIT tunnels, but seems generic.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e551c32d57 ]
At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.
Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a4b8e71b05 ]
Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
if (len < sizeof(int))
return -EINVAL;
On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.
Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.
Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.
If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
[<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
[<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
[<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
[<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
[<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
[<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
[<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
[<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2fe664f1fc ]
With TCP MTU probing enabled and offload TX checksumming disabled,
tcp_mtu_probe() calculated the wrong checksum when a fragment being copied
into the probe's SKB had an odd length. This was caused by the direct use
of skb_copy_and_csum_bits() to calculate the checksum, as it pads the
fragment being copied, if needed. When this fragment was not the last, a
subsequent call used the previous checksum without considering this
padding.
The effect was a stale connection in one way, as even retransmissions
wouldn't solve the problem, because the checksum was never recalculated for
the full SKB length.
Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 20c64d5cd5 ]
A malicious TCP receiver, sending SACK, can force the sender to split
skbs in write queue and increase its memory usage.
Then, when socket is closed and its write queue purged, we might
overflow sk_forward_alloc (It becomes negative)
sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from TCP perspective (tcp_memory_allocated is not changed)
Then warnings trigger from inet_sock_destruct() and
sk_stream_kill_queues() seeing a not zero sk_forward_alloc
All TCP stack can be stuck because TCP is under memory pressure.
A simple fix is to preemptively reclaim from sk_mem_uncharge().
This makes sure a socket wont have more than 2 MB forward allocated,
after burst and idle period.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ffb4d6c850 ]
If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.
The flow then stalls and resets after timeouts.
Tested:
sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a24e04e4b upstream.
sk_mem_reclaim_partial() goal is to ensure each socket has
one SK_MEM_QUANTUM forward allocation. This is needed both for
performance and better handling of memory pressure situations in
follow up patches.
SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes
as some arches have 64KB pages.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Keep using atomic_long_sub() directly, not sk_memory_allocated_sub()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 70a0dec451 ]
This fixes wrong-interface signaling on 32-bit platforms for entries
created when jiffies > 2^31 + MFC_ASSERT_THRESH.
Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 31ca0458a6 ]
get_bridge_ifindices() is used from the old "deviceless" bridge ioctl
calls which aren't called with rtnl held. The comment above says that it is
called with rtnl but that is not really the case.
Here's a sample output from a test ASSERT_RTNL() which I put in
get_bridge_ifindices and executed "brctl show":
[ 957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30)
[ 957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G W O
4.6.0-rc4+ #157
[ 957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[ 957.423009] 0000000000000000 ffff880058adfdf0 ffffffff8138dec5
0000000000000400
[ 957.423009] ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32
0000000000000001
[ 957.423009] 00007ffec1a444b0 0000000000000400 ffff880053c19130
0000000000008940
[ 957.423009] Call Trace:
[ 957.423009] [<ffffffff8138dec5>] dump_stack+0x85/0xc0
[ 957.423009] [<ffffffffa05ead32>]
br_ioctl_deviceless_stub+0x212/0x2e0 [bridge]
[ 957.423009] [<ffffffff81515beb>] sock_ioctl+0x22b/0x290
[ 957.423009] [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700
[ 957.423009] [<ffffffff8126c159>] SyS_ioctl+0x79/0x90
[ 957.423009] [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1
Since it only reads bridge ifindices, we can use rcu to safely walk the net
device list. Also remove the wrong rtnl comment above.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit bdf17661f6 ]
Similarly, we need to update backlog too when we update qlen.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: open-code qdisc_qstats_backlog_{inc,dec}()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 431e3a8e36 ]
We saw qlen!=0 but backlog==0 on our production machine:
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
backlog 0b 72p requeues 0
The problem is we only count qlen for HTB qdisc but not backlog.
We need to update backlog too when we update qlen, so that we
can at least know the average packet length.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Open-code qdisc_qstats_backlog_{inc,dec}()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d6d5e999e5 ]
For local routes that require a particular output interface we do not want
to cache the result. Caching the result causes incorrect behaviour when
there are multiple source addresses on the interface. The end result
being that if the intended recipient is waiting on that interface for the
packet he won't receive it because it will be delivered on the loopback
interface and the IP_PKTINFO ipi_ifindex will be set to the loopback
interface as well.
This can be tested by running a program such as "dhcp_release" which
attempts to inject a packet on a particular interface so that it is
received by another program on the same board. The receiving process
should see an IP_PKTINFO ipi_ifndex value of the source interface
(e.g., eth1) instead of the loopback interface (e.g., lo). The packet
will still appear on the loopback interface in tcpdump but the important
aspect is that the CMSG info is correct.
Sample dhcp_release command line:
dhcp_release eth1 192.168.204.222 02:11:33:22:44:66
Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed off-by: Chris Friesen <chris.friesen@windriver.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a36a0d4008 ]
In particular, make sure we check for decnet private presence
for loopback devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2aa2f9e21e upstream.
On 64 bit, size may very well be huge even if bit 31 happens to be 0.
Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
3 GiB one. So cap at INT_MAX as was probably the intention all along.
This is also the made-up value passed by sprintf and vsprintf.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
commit 4c03b862b1 upstream.
A nested lock depth was added to the hasbin_delete() code but it
doesn't actually work some well and results in tons of lockdep splats.
Fix the code instead to properly drop the lock around the operation
and just keep peeking the head of the hashbin queue.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 137d01df51 upstream.
What happens is that a write to /dev/sg is given a request with non-zero
->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
to an array full of empty iovecs.
Having write permission to /dev/sg shouldn't be equivalent to the
ability to trigger BUG_ON() while holding spinlocks...
Found by Dmitry Vyukov and syzkaller.
[ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
underlying issue. - Linus ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: we're not using iov_iter, but can check the
byte length after truncation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6664498280 upstream.
If a socket has FANOUT sockopt set, a new proto_hook is registered
as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
registered as part of fanout_add is not removed. Call fanout_release, on a
NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
fanout_list.
This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a81e6a171 upstream.
Flags (PIPE_BUF_FLAG_PACKET, PIPE_BUF_FLAG_GIFT) could remain on the
unused part of the pipe ring buffer. Previously splice_to_pipe() left
the flags value alone, which could result in incorrect behavior.
Uninitialized flags appears to have been there from the introduction of
the splice syscall.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd22455364 upstream.
xilinx_emaclite looks at the received data to try to determine the
Ethernet packet length but does not properly clamp it if
proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
overflow and a panic via skb_panic() as the length exceeds the allocated
skb size.
Fix those cases.
Also add an additional unconditional check with WARN_ON() at the end.
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: bb81b2ddfa ("net: add Xilinx emac lite device driver")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f9c85ee671 upstream.
Reported as a Kaffeine bug:
https://bugs.kde.org/show_bug.cgi?id=375811
The USB control messages require DMA to work. We cannot pass
a stack-allocated buffer, as it is not warranted that the
stack would be into a DMA enabled area.
On Kernel 4.9, the default is to not accept DMA on stack anymore
on x86 architecture. On other architectures, this has been a
requirement since Kernel 2.2. So, after this patch, this driver
should likely work fine on all archs.
Tested with USB ID 2040:5510: Hauppauge Windham
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2:
- s/sms_msg_hdr/SmsMsgHdr_ST/
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d199fab63c upstream.
Multiple threads can call fanout_add() at the same time.
We need to grab fanout_mutex earlier to avoid races that could
lead to one thread freeing po->rollover that was set by another thread.
Do the same in fanout_release(), for peace of mind, and to help us
finding lockdep issues earlier.
Fixes: dc99f60069 ("packet: Add fanout support.")
Fixes: 0648ab70af ("packet: rollover prepare: per-socket state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- No rollover queue stats
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 25f71d1c3e upstream.
The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.
The user mode helper is triggered by device init calls and the executable
might use the futex syscall.
futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.
If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.
Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
[ tglx: Rewrote changelog ]
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b74d439e1 upstream.
It seems nobody used LLC since linux-3.12.
Fortunately fuzzers like syzkaller still know how to run this code,
otherwise it would be no fun.
Setting skb->sk without skb->destructor leads to all kinds of
bugs, we now prefer to be very strict about it.
Ideally here we would use skb_set_owner() but this helper does not exist yet,
only CAN seems to have a private helper for that.
Fixes: 376c7311bd ("net: add a temporary sanity check in skb_orphan()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Extracted from commit 62bccb8cdb ("net-timestamp: Make the clone operation
stand-alone from phy timestamping").
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72fb96e7bd upstream.
udp_ioctl(), as its name suggests, is used by UDP protocols,
but is also used by L2TP :(
L2TP should use its own handler, because it really does not
look the same.
SIOCINQ for instance should not assume UDP checksum or headers.
Thanks to Andrey and syzkaller team for providing the report
and a nice reproducer.
While crashes only happen on recent kernels (after commit
7c13f97ffd ("udp: do fwd memory scheduling on dequeue")), this
probably needs to be backported to older kernels.
Fixes: 7c13f97ffd ("udp: do fwd memory scheduling on dequeue")
Fixes: 8558467201 ("udp: Fix udp_poll() and ioctl()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop the IPv6 change
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7447095485 upstream.
rx_refill_timer should be deleted as soon as we disconnect from the
backend since otherwise it is possible for the timer to go off before
we get to xennet_destroy_queues(). If this happens we may dereference
queue->rx.sring which is set to NULL in xennet_disconnect_backend().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: there's only one RX queue here, and del_timer_sync()
was called from xennet_remove() but that's also too late]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2dfa6688aa upstream.
Dan Carpenter kindly reported:
<quote>
The patch d27a7cb919: "zfcp: trace on request for open and close of
WKA port" from Aug 10, 2016, leads to the following static checker
warning:
drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port()
warn: 'req' was already freed.
drivers/s390/scsi/zfcp_fsf.c
1609 zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
1610 retval = zfcp_fsf_req_send(req);
1611 if (retval)
1612 zfcp_fsf_req_free(req);
^^^
Freed.
1613 out:
1614 spin_unlock_irq(&qdio->req_q_lock);
1615 if (req && !IS_ERR(req))
1616 zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
^^^^^^^^^^^
Use after free.
1617 return retval;
1618 }
Same thing for zfcp_fsf_close_wka_port() as well.
</quote>
Rather than relying on req being NULL (or ERR_PTR) for all cases where
we don't want to trace or should not trace,
simply check retval which is unconditionally initialized with -EIO != 0
and it can only become 0 on successful retval = zfcp_fsf_req_send(req).
With that we can also remove the then again unnecessary unconditional
initialization of req which was introduced with that earlier commit.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: d27a7cb919 ("zfcp: trace on request for open and close of WKA port")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2a36224918 upstream.
Commit 4c63c2454e incorrectly assumed that returning -ENOIOCTLCMD would
cause the native ioctl to be called. The ->compat_ioctl callback is
expected to handle all ioctls, not just compat variants. As a result,
when using 32-bit userspace on 64-bit kernels, everything except those
three ioctls would return -ENOTTY.
Fixes: 4c63c2454e ("btrfs: bugfix: handle FS_IOC32_{GETFLAGS,SETFLAGS,GETVERSION} in btrfs_ioctl")
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4842e98f26 upstream.
When a sequencer queue is created in snd_seq_queue_alloc(),it adds the
new queue element to the public list before referencing it. Thus the
queue might be deleted before the call of snd_seq_queue_use(), and it
results in the use-after-free error, as spotted by syzkaller.
The fix is to reference the queue object at the right time.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d6a0e9de0 upstream.
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 7926aff5c5 upstream.
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 37a7ea4a9b upstream.
snd_seq_pool_done() syncs with closing of all opened threads, but it
aborts the wait loop with a timeout, and proceeds to the release
resource even if not all threads have been closed. The timeout was 5
seconds, and if you run a crazy stuff, it can exceed easily, and may
result in the access of the invalid memory address -- this is what
syzkaller detected in a bug report.
As a fix, let the code graduate from naiveness, simply remove the loop
timeout.
BugLink: http://lkml.kernel.org/r/CACT4Y+YdhDV2H5LLzDTJDVF-qiYHUHhtRaW4rbb4gUhTCQB81w@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: deleted log statement is slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da7061c82e upstream.
The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead
it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The
return value in mesh_add_vendor_ies must therefore be checked against
ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with
WLAN_EID_VENDOR_SPECIFIC will be rejected.
Fixes: 082ebb0c25 ("mac80211: fix mesh beacon format")
Signed-off-by: Thorsten Horstmann <thorsten@defutech.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[sven@narfation.org: Add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d07830db1b upstream.
Seems that ATEN serial-to-usb devices using pl2303 exist with
different device ids. This patch adds a missing device ID so it
is recognised by the driver.
Signed-off-by: Marcel J.E. Mol <marcel@mesa.nl>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 228dbbfb5d upstream.
Ensure that if userspace supplies insufficient data to
PTRACE_SETREGSET to fill all the registers, the thread's old
registers are preserved.
Fixes: 5be6f62b00 ("ARM: 6883/1: ptrace: Migrate to regsets framework")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a06393ed03 upstream.
When removing a bcm tx operation either a hrtimer or a tasklet might run.
As the hrtimer triggers its associated tasklet and vice versa we need to
take care to mutually terminate both handlers.
Reported-by: Michael Josenhans <michael.josenhans@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Michael Josenhans <michael.josenhans@web.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ad5d52d42 upstream.
In swab.h the "#if BITS_PER_LONG > 32" breaks compiling userspace programs if
BITS_PER_LONG is #defined by userspace with the sizeof() compiler builtin.
Solve this problem by using __BITS_PER_LONG instead. Since we now
#include asm/bitsperlong.h avoid further potential userspace pollution
by moving the #define of SHIFT_PER_LONG to bitops.h which is not
exported to userspace.
This patch unbreaks compiling qemu on hppa/parisc.
Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff9f8a7cf9 upstream.
We perform the conversion between kernel jiffies and ms only when
exporting kernel value to user space.
We need to do the opposite operation when value is written by user.
Only matters when HZ != 1000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d9b2997e4a upstream.
Add a quirk for WORLDE easykey.25 MIDI keyboard (idVendor=0218,
idProduct=0401). The device reports that it has config string
descriptor at index 3, but when the system selects the configuration
and tries to get the description, it returns a -EPROTO error,
the communication restarts and this keeps repeating over and over again.
Not requesting the string descriptor makes the device work correctly.
Relevant info from Wireshark:
[...]
CONFIGURATION DESCRIPTOR
bLength: 9
bDescriptorType: 0x02 (CONFIGURATION)
wTotalLength: 101
bNumInterfaces: 2
bConfigurationValue: 1
iConfiguration: 3
Configuration bmAttributes: 0xc0 SELF-POWERED NO REMOTE-WAKEUP
1... .... = Must be 1: Must be 1 for USB 1.1 and higher
.1.. .... = Self-Powered: This device is SELF-POWERED
..0. .... = Remote Wakeup: This device does NOT support remote wakeup
bMaxPower: 50 (100mA)
[...]
45 0.369104 host 2.38.0 USB 64 GET DESCRIPTOR Request STRING
[...]
URB setup
bmRequestType: 0x80
1... .... = Direction: Device-to-host
.00. .... = Type: Standard (0x00)
...0 0000 = Recipient: Device (0x00)
bRequest: GET DESCRIPTOR (6)
Descriptor Index: 0x03
bDescriptorType: 0x03
Language Id: English (United States) (0x0409)
wLength: 255
46 0.369255 2.38.0 host USB 64 GET DESCRIPTOR Response STRING[Malformed Packet]
[...]
Frame 46: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface 0
USB URB
[Source: 2.38.0]
[Destination: host]
URB id: 0xffff88021f62d480
URB type: URB_COMPLETE ('C')
URB transfer type: URB_CONTROL (0x02)
Endpoint: 0x80, Direction: IN
Device: 38
URB bus id: 2
Device setup request: not relevant ('-')
Data: present (0)
URB sec: 1484896277
URB usec: 455031
URB status: Protocol error (-EPROTO) (-71)
URB length [bytes]: 0
Data length [bytes]: 0
[Request in: 45]
[Time from request: 0.000151000 seconds]
Unused Setup Header
Interval: 0
Start frame: 0
Copy of Transfer Flags: 0x00000200
Number of ISO descriptors: 0
[Malformed Packet: USB]
[Expert Info (Error/Malformed): Malformed Packet (Exception occurred)]
[Malformed Packet (Exception occurred)]
[Severity level: Error]
[Group: Malformed]
Signed-off-by: Lukáš Lalinský <lukas@oxygene.sk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8310d48b12 upstream.
In commit 19be0eaffa ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead. Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.
However, a similar check in huge_memory.c was forgotten. As a result,
remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets
FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
true.
While in this state the process is stil SIGKILLable, but little else
works (e.g. no ptrace attach, no other signals). This is easily
reproduced with the following code (assuming thp are set to always):
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#define TEST_SIZE 5 * 1024 * 1024
int main(void) {
int status;
pid_t child;
int fd = open("/proc/self/mem", O_RDWR);
void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
assert(addr != MAP_FAILED);
pid_t parent_pid = getpid();
if ((child = fork()) == 0) {
void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
assert(addr2 != MAP_FAILED);
memset(addr2, 'a', TEST_SIZE);
pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
return 0;
}
assert(child == waitpid(child, &status, 0));
assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
return 0;
}
Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit. The same pattern exists
in follow_devmap_pmd. However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Drop change to follow_devmap_pmd()
- pmd_dirty() is not available; check the page flags as in
can_follow_write_pte()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6040764ad upstream.
Make sure CRYPTO_ALG_DEAD bit is cleared before proceeding with
the algorithm registration. This fixes qat-dh registration when
driver is restarted
Signed-off-by: Salvatore Benedetto <salvatore.benedetto@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a00b6c243 upstream.
The commit 1c6c69525b ("genirq: Reject bogus threaded irq requests")
starts refusing misconfigured interrupt handlers. This makes
intel_mid_powerbtn not working anymore.
Add a mandatory flag to a threaded IRQ request in the driver.
Fixes: 1c6c69525b ("genirq: Reject bogus threaded irq requests")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 178f358208 upstream.
IBM bit 31 (for the rest of us - bit 0) is a reserved field in the
instruction definition of mtspr and mfspr. Hardware is encouraged to
(and does) ignore it.
As a result, if userspace executes an mtspr DSCR with the reserved bit
set, we get a DSCR facility unavailable exception. The kernel fails to
match against the expected value/mask, and we silently return to
userspace to try and re-execute the same mtspr DSCR instruction. We
loop forever until the process is killed.
We should do something here, and it seems mirroring what hardware does
is the better option vs killing the process. While here, relax the
matching of mfspr PVR too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: drop changes to PPC_INST_M{F,T}SPR_DSCR_USER_MASK]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 99dfe80a2a upstream.
Ensure that if userspace supplies insufficient data to PTRACE_SETREGSET
to fill all the registers, the thread's old registers are preserved.
Fixes: c6e6771b87 ("powerpc: Introduce VSX thread_struct and CONFIG_VSX")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2:
- fpscr and fpr are direct members of struct thread_struct
- Use memcpy() for fpscr, like the reverse copy below, to avoid type error
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d61b7f972d upstream.
A user noticed that write performance was horrible over loopback and we
traced it to an inversion of when we need to set MSG_MORE. It should be
set when we have more bvec's to send, not when we are on the last bvec.
This patch made the test go from 20 iops to 78k iops.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: 429a787be6 ("nbd: fix use-after-free of rq/bio in the xmit path")
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 429a787be6 upstream.
For writes, we can get a completion in while we're still iterating
the request and bio chain. If that happens, we're reading freed
memory and we can crash.
Break out after the last segment and avoid having the iterator
read freed memory.
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2:
- bio_for_each_segment() uses iterator of type int
- Open-code bio_iter_last()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6df8c9d80a upstream.
sparse says:
fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer
The op value is __le32, so we need to convert it before comparing it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[bwh: Backported to 3.2: only filelock and directory replies are handled]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit befa60113c upstream.
In order to make the driver work with the common clock framework, this
patch converts the clk_enable()/clk_disable() to
clk_prepare_enable()/clk_disable_unprepare().
Also add error checking for clk_prepare_enable().
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1cb51a15b5 upstream.
When replaying the journal it can happen that a journal entry points to
a garbage collected node.
This is the case when a power-cut occurred between a garbage collect run
and a commit. In such a case nodes have to be read using the failable
read functions to detect whether the found node matches what we expect.
One corner case was forgotten, when the journal contains an entry to
remove an inode all xattrs have to be removed too. UBIFS models xattr
like directory entries, so the TNC code iterates over
all xattrs of the inode and removes them too. This code re-uses the
functions for walking directories and calls ubifs_tnc_next_ent().
ubifs_tnc_next_ent() expects to be used only after the journal and
aborts when a node does not match the expected result. This behavior can
render an UBIFS volume unmountable after a power-cut when xattrs are
used.
Fix this issue by using failable read functions in ubifs_tnc_next_ent()
too when replaying the journal.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Reported-by: Rock Lee <rockdotlee@gmail.com>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 291c566a28 upstream.
In function mlx4_cq_completion() and mlx4_cq_event(), the
radix_tree_lookup requires a rcu_read_lock.
This is mandatory: if another core frees the CQ, it could
run the radix_tree_node_rcu_free() call_rcu() callback while
its being used by the radix tree lookup function.
Additionally, in function mlx4_cq_event(), since we are adding
the rcu lock around the radix-tree lookup, we no longer need to take
the spinlock. Also, the synchronize_irq() call for the async event
eliminates the need for incrementing the cq reference count in
mlx4_cq_event().
Other changes:
1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
we no longer take this spinlock in the interrupt context.
The spinlock here, therefore, simply protects against different
threads simultaneously invoking mlx4_cq_free() for different cq's.
2. In function mlx4_cq_free(), we move the radix tree delete to before
the synchronize_irq() calls. This guarantees that we will not
access this cq during any subsequent interrupts, and therefore can
safely free the CQ after the synchronize_irq calls. The rcu_read_lock
in the interrupt handlers only needs to protect against corrupting the
radix tree; the interrupt handlers may access the cq outside the
rcu_read_lock due to the synchronize_irq calls which protect against
premature freeing of the cq.
3. In function mlx4_cq_event(), we change the mlx_warn message to mlx4_dbg.
4. We leave the cq reference count mechanism in place, because it is
still needed for the cq completion tasklet mechanism.
Fixes: 6d90aa5cf1 ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78794d1890 upstream.
Context expiry times are in units of seconds since boot, not unix time.
The use of get_seconds() here therefore sets the expiry time decades in
the future. This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests. We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.
Fixes: c5b29f885a "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30f939feae upstream.
i2c_smbus_xfer() does not always fill an entire block, allowing
kernel stack memory disclosure through the temp variable. Clear
it before it's read to.
Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01167c7b9c upstream.
According to the code the intention is to append 8 SCK cycles
instead of 4 at end of a MMC_STOP_TRANSMISSION command. But this
will never happened because it's an AC command not an ADTC command.
So fix this by moving the statement into the right function.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: e4243f13d1 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6169d0409 upstream.
If a URB is killed while the host is removed we can end up in a situation
where the hub thread takes the roothub device lock, and waits for
the URB to be given back by xhci-hcd, blocking the host remove code.
xhci-hcd tries to stop the endpoint and give back the urb, but can't
as the host is removed from PCI bus at the same time, preventing the normal
way of giving back urb.
Instead we need to rely on the stop command timeout function to give back
the urb. This xhci_stop_endpoint_command_watchdog() timeout function
used a XHCI_STATE_DYING flag to indicate if the timeout function is already
running, but later this flag has been taking into use in other places to
mark that xhci is dying.
Remove checks for XHCI_STATE_DYING in xhci_urb_dequeue. We are still
checking that reading from pci state does not return 0xffffffff or that
host is not halted before trying to stop the endpoint.
This whole area of stopping endpoints, giving back URBs, and the wathdog
timeout need rework, this fix focuses on solving a specific deadlock
issue that we can then send to stable before any major rework.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: the checks look slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cfd5fd5a9 upstream.
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.
Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d5a9c72d0 upstream.
A short control transfer would currently fail to be detected, something
which could lead to stale buffer data being used as valid input.
Check for short transfers, and make sure to log any transfer errors.
Note that this also avoids leaking heap data to user space (TIOCMGET)
and the remote device (break control).
Fixes: 6ce7610478 ("USB: Driver for CH341 USB-serial adaptor")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c8a6a09c1c upstream.
In ca91cx42_slave_get function, the value pointed by vme_base pointer is
set through:
*vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);
So it must be dereferenced to be used in calculation of pci_base:
*pci_base = (dma_addr_t)*vme_base + pci_offset;
This bug was caught thanks to the following gcc warning:
drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
*pci_base = (dma_addr_t)vme_base + pci_offset;
Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Acked-By: Martyn Welch <martyn@welchs.me.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 802c03881f upstream.
The sysrq input handler should be attached to the input device which has
a left alt key.
On 32-bit kernels, some input devices which has a left alt key cannot
attach sysrq handler. Because the keybit bitmap in struct input_device_id
for sysrq is not correctly initialized. KEY_LEFTALT is 56 which is
greater than BITS_PER_LONG on 32-bit kernels.
I found this problem when using a matrix keypad device which defines
a KEY_LEFTALT (56) but doesn't have a KEY_O (24 == 56%32).
Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7ee2c089e upstream.
The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.
The crash backtrace is below:
dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
ocfs2: Beginning quota recovery on device (253,18) for slot 2
ocfs2: Finishing quota recovery on device (253,18) for slot 2
(truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
(truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
------------[ cut here ]------------
kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
invalid opcode: 0000 [#1] SMP
Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: No, Unsupported modules are loaded
CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282
RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
Call Trace:
ocfs2_setattr+0x698/0xa90 [ocfs2]
notify_change+0x1ae/0x380
do_truncate+0x5e/0x90
do_sys_ftruncate.constprop.11+0x108/0x160
entry_SYSCALL_64_fastpath+0x12/0x6d
Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
not equal to the disk i_size. We mistakenly trust the LVB because the
underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
DLM_SBF_VALNOTVALID properly for us. But, why?
The current code tries to downconvert lock without DLM_LKF_VALBLK flag
to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
if the lock resource type needs LVB. This is not the right way for
fsdlm.
The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If
DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
failure happens.
The following diagram briefly illustrates how this crash happens:
RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;
The 1st round:
Node1 Node2
RSB1: PR
RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
ocfs2_dlm_lock(no DLM_LKF_VALBLK)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
dlm_lock(no DLM_LKF_VALBLK)
convert_lock(overwrite lkb->lkb_exflags
with no DLM_LKF_VALBLK)
RSB1: NULL RSB1: EX
reset Node2
dlm_recover_rsbs()
recover_lvb()
/* The LVB is not trustable if the node with EX fails and
* no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
*/
if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
return; * to invalid the LVB here.
*/
The 2nd round:
Node 1 Node2
RSB1(become master from recovery)
ocfs2_setattr()
ocfs2_inode_lock(NULL->EX)
/* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
ocfs2_truncate_file()
mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */
The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag
for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
is uesed.
Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57ea52a865 upstream.
The GRO fast path caches the frag0 address. This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.
This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.
This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.
Fixes: 78a478d0ef ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1272ce87fa upstream.
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0. However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.
This patch adds the check by capping frag0_len with the skb tailroom.
Fixes: cb18978cbf ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ddc37832a1 upstream.
On APQ8060, the kernel crashes in arch_hw_breakpoint_init, taking an
undefined instruction trap within write_wb_reg. This is because Scorpion
CPUs erroneously appear to set DBGPRSR.SPD when WFI is issued, even if
the core is not powered down. When DBGPRSR.SPD is set, breakpoint and
watchpoint registers are treated as undefined.
It's possible to trigger similar crashes later on from userspace, by
requesting the kernel to install a breakpoint or watchpoint, as we can
go idle at any point between the reset of the debug registers and their
later use. This has always been the case.
Given that this has always been broken, no-one has complained until now,
and there is no clear workaround, disable hardware breakpoints and
watchpoints on Scorpion to avoid these issues.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2:
- Open-code read_cpuid_part()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55fa15b598 upstream.
Revert to using direct register writes to set the divisor and
line-control registers.
A recent change switched to using the init vendor command to update
these registers, something which also enabled support for CH341A
devices. It turns out that simply setting bit 7 in the divisor register
is sufficient to support CH341A and specifically prevent data from being
buffered until a full endpoint-size packet (32 bytes) has been received.
Using the init command also had the side-effect of temporarily
deasserting the DTR/RTS signals on every termios change (including
initialisation on open) something which for example could cause problems
in setups where DTR is used to trigger a reset.
Fixes: 4e46c410e0 ("USB: serial: ch341: reinitialize chip on
reconfiguration")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ce5e292828 upstream.
Fix reset-resume handling which failed to resubmit the read and
interrupt URBs, thereby leaving a port that was open before suspend in a
broken state until closed and reopened.
Fixes: 1ded7ea47b ("USB: ch341 serial: fix port number changed after
resume")
Fixes: 2bfd1c96a9 ("USB: serial: ch341: remove reset_resume callback")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- Move initialisation of 'serial' up to make this work
- Delete the call to usb_serial_resume() that was still present and
would be redundant with usb_serial_generic_resume()
- Open-code tty_port_initialized()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2950b7854 upstream.
Make sure to stop the interrupt URB before returning on errors during
open.
Fixes: 664d5df92e ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 030ee7ae52 upstream.
The modem-control signals are managed by the tty-layer during open and
should not be asserted prematurely when set_termios is called from
driver open.
Also make sure that the signals are asserted only when changing speed
from B0.
Fixes: 664d5df92e ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a20047f36e upstream.
The private baud_rate variable is used to configure the port at open and
reset-resume and must never be set to (and left at) zero or reset-resume
and all further open attempts will fail.
Fixes: aa91def41a ("USB: ch341: set tty baud speed according to tty
struct")
Fixes: 664d5df92e ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e2da44691 upstream.
DTR and RTS will be asserted by the tty-layer when the port is opened
and deasserted on close (if HUPCL is set). Make sure the initial state
is not-asserted before the port is first opened as well.
Fixes: 664d5df92e ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e46c410e0 upstream.
Changing the LCR register after initialization does not seem to be reliable
on all chips (particularly not on CH341A). Restructure initialization and
configuration to always reinit the chip on configuration changes instead and
pass the LCR register value directly to the initialization command.
(Note that baud rates above 500kbaud are incorrect, but they're incorrect in
the same way both before and after this patch at least on the CH340G. Fixing
this isn't a priority as higher baud rates don't seem that reliable anyway.)
Cleaned-up version of a patch by Grigori Goronzy
Signed-off-by: Aidan Thornton <makosoft@gmail.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: tty_struct::termios is a pointer, not a struct]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6fde8d29b0 upstream.
No functional changes, this just gives names to some registers and USB
requests based on Grigori Goronzy's work and WinChipTech's Linux driver
(which reassuringly agree), then uses them in place of magic numbers.
This also renames the misnamed BREAK2 register (actually UART config)
Signed-off-by: Aidan Thornton <makosoft@gmail.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aa91def41a upstream.
The ch341_set_baudrate() function initialize the device baud speed
according to the value on priv->baud_rate. By default the ch341_open() set
it to a hardcoded value (DEFAULT_BAUD_RATE 9600). Unfortunately, the
tty_struct is not initialized with the same default value. (usually 56700)
This means that the tty_struct and the device baud rate generator are not
synchronized after opening the port.
Fixup is done by calling ch341_set_termios() if tty exist.
Remove unnecessary variable priv->baud_rate setup as it's already done by
ch341_port_probe().
Remove unnecessary call to ch341_set_{handshake,baudrate}() in
ch341_open() as there already called in ch341_configure() and
ch341_set_termios()
Signed-off-by: Nicolas PLANEL <nicolas.planel@enovance.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 394a10331a upstream.
Remove redundant call to ch341_close from error path when submission of
the interrupt urb fails in open.
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06946a6654 upstream.
All error messages from stack in open are being forwarded except for
one call to usb_submit_urb. Change this for consistency.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 064c3db9c5 upstream.
Here, If devm_ioremap will fail. It will return NULL.
Then hpriv->base = NULL - 0x20000; Kernel can run into
a NULL-pointer dereference. This error check will avoid
NULL pointer dereference.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0a8fd13462 upstream.
When checking a new device's descriptors, the USB core does not check
for duplicate endpoint addresses. This can cause a problem when the
sysfs files for those endpoints are created; trying to create multiple
files with the same name will provoke a WARNING:
WARNING: CPU: 2 PID: 865 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x8a/0xa0
sysfs: cannot create duplicate filename
'/devices/platform/dummy_hcd.0/usb2/2-1/2-1:64.0/ep_05'
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 865 Comm: kworker/2:1 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
ffff88006bee64c8 ffffffff81f96b8a ffffffff00000001 1ffff1000d7dcc2c
ffffed000d7dcc24 0000000000000001 0000000041b58ab3 ffffffff8598b510
ffffffff81f968f8 ffffffff850fee20 ffffffff85cff020 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff8168c88e>] panic+0x1cb/0x3a9 kernel/panic.c:179
[<ffffffff812b80b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
[<ffffffff812b8195>] warn_slowpath_fmt+0xc5/0x110 kernel/panic.c:565
[<ffffffff819e70ca>] sysfs_warn_dup+0x8a/0xa0 fs/sysfs/dir.c:30
[<ffffffff819e7308>] sysfs_create_dir_ns+0x178/0x1d0 fs/sysfs/dir.c:59
[< inline >] create_dir lib/kobject.c:71
[<ffffffff81fa1b07>] kobject_add_internal+0x227/0xa60 lib/kobject.c:229
[< inline >] kobject_add_varg lib/kobject.c:366
[<ffffffff81fa2479>] kobject_add+0x139/0x220 lib/kobject.c:411
[<ffffffff82737a63>] device_add+0x353/0x1660 drivers/base/core.c:1088
[<ffffffff82738d8d>] device_register+0x1d/0x20 drivers/base/core.c:1206
[<ffffffff82cb77d3>] usb_create_ep_devs+0x163/0x260 drivers/usb/core/endpoint.c:195
[<ffffffff82c9f27b>] create_intf_ep_devs+0x13b/0x200 drivers/usb/core/message.c:1030
[<ffffffff82ca39d3>] usb_set_configuration+0x1083/0x18d0 drivers/usb/core/message.c:1937
[<ffffffff82cc9e2e>] generic_probe+0x6e/0xe0 drivers/usb/core/generic.c:172
[<ffffffff82caa7fa>] usb_probe_device+0xaa/0xe0 drivers/usb/core/driver.c:263
This patch prevents the problem by checking for duplicate endpoint
addresses during enumeration and skipping any duplicates.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c300fe282 upstream.
When unloading omap2430, we can get the following splat:
WARNING: CPU: 1 PID: 295 at kernel/irq/manage.c:1478 __free_irq+0xa8/0x2c8
Trying to free already-free IRQ 4
...
[<c01a8b78>] (free_irq) from [<bf0aea84>]
(musbhs_dma_controller_destroy+0x28/0xb0 [musb_hdrc])
[<bf0aea84>] (musbhs_dma_controller_destroy [musb_hdrc]) from
[<bf09f88c>] (musb_remove+0xf0/0x12c [musb_hdrc])
[<bf09f88c>] (musb_remove [musb_hdrc]) from [<c056a384>]
(platform_drv_remove+0x24/0x3c)
...
This is because the irq number in use is 260 nowadays, and the dma
controller is using u8 instead of int.
Fixes: 6995eb68aa ("USB: musb: enable low level DMA operation for Blackfin")
Signed-off-by: Tony Lindgren <tony@atomide.com>
[b-liu@ti.com: added Fixes tag]
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 432abf68a7 upstream.
The generic command buffer entry is 128 bits (16 bytes), so the offset
of tail and head pointer should be 16 bytes aligned and increased with
0x10 per command.
When cmd buf is full, head = (tail + 0x10) % CMD_BUFFER_SIZE.
So when left space of cmd buf should be able to store only two
command, we should be issued one COMPLETE_WAIT additionally to wait
all older commands completed. Then the left space should be increased
after IOMMU fetching from cmd buf.
So left check value should be left <= 0x20 (two commands).
Signed-off-by: Huang Rui <ray.huang@amd.com>
Fixes: ac0ea6e92b ('x86/amd-iommu: Improve handling of full command buffer')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cc09092482 upstream.
Fix NULL-pointer dereference in open() should the device lack the
expected endpoints:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at spcp8x5_open+0x30/0xd0 [spcp8x5]
Fixes: 619a6f1d14 ("USB: add usb-serial spcp8x5 driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76ab439ed1 upstream.
Fix NULL-pointer dereference in open() should a type-0 or type-1 device
lack the expected endpoints:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at pl2303_open+0x38/0xec [pl2303]
Note that a missing interrupt-in endpoint would have caused open() to
fail.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5afeef2366 upstream.
Fix NULL-pointer dereference in open() should the device lack the
expected endpoints:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at oti6858_open+0x30/0x1d0 [oti6858]
Note that a missing interrupt-in endpoint would have caused open() to
fail.
Fixes: 49cdee0ed0 ("USB: oti6858 usb-serial driver (in Nokia CA-42
cable)")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a5bc01949e upstream.
Fix NULL-pointer dereferences at open() and disconnect() should the
device lack the expected bulk-out endpoints:
Unable to handle kernel NULL pointer dereference at virtual address 000000b4
...
[c0170ff0>] (__lock_acquire) from [<c0172f00>] (lock_acquire+0x108/0x264)
[<c0172f00>] (lock_acquire) from [<c06a5090>] (_raw_spin_lock_irqsave+0x58/0x6c)
[<c06a5090>] (_raw_spin_lock_irqsave) from [<c0470684>] (tty_port_tty_set+0x28/0xa4)
[<c0470684>] (tty_port_tty_set) from [<bf08d384>] (omninet_open+0x30/0x40 [omninet])
[<bf08d384>] (omninet_open [omninet]) from [<bf07c118>] (serial_port_activate+0x68/0x98 [usbserial])
Unable to handle kernel NULL pointer dereference at virtual address 00000234
...
[<bf01f418>] (omninet_disconnect [omninet]) from [<bf0016c0>] (usb_serial_disconnect+0xe4/0x100 [usbserial])
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 472d7e55d5 upstream.
The interrupt URB is killed at final port close since commit
0de9a7024e ("USB: overhaul of mos7840 driver").
Fixes: 0de9a7024e ("USB: overhaul of mos7840 driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c75633ef7 upstream.
Fix NULL-pointer dereference in open() should the device lack the
expected endpoints:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at mos7840_open+0x88/0x8dc [mos7840]
Note that we continue to treat the interrupt-in endpoint as optional for
now.
Fixes: 3f5429746d ("USB: Moschip 7840 USB-Serial Driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fde1faf872 upstream.
A static usb-serial-driver structure that is used to initialise the
interrupt URB was modified during probe depending on the currently
probed device type, something which could break a parallel probe of a
device of a different type.
Fix this up by overriding the default completion callback for MCS7715
devices in attach() instead. We may want to use two usb-serial driver
instances for the two types later.
Fixes: fb088e335d ("USB: serial: add support for serial port on the
moschip 7715")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 75dd211e77 upstream.
Do not submit the interrupt URB until after the parport has been
successfully registered to avoid another use-after-free in the
completion handler when accessing the freed parport private data in case
of a racing completion.
Fixes: b69578df7e ("USB: usbserial: mos7720: add support for parallel
port on moschip 7715")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 91a1ff4d53 upstream.
The interrupt URB was submitted on probe but never stopped on probe
errors. This can lead to use-after-free issues in the completion
handler when accessing the freed usb-serial struct:
Unable to handle kernel paging request at virtual address 6b6b6be7
...
[<bf052e70>] (mos7715_interrupt_callback [mos7720]) from [<c052a894>] (__usb_hcd_giveback_urb+0x80/0x140)
[<c052a894>] (__usb_hcd_giveback_urb) from [<c052a9a4>] (usb_hcd_giveback_urb+0x50/0x138)
[<c052a9a4>] (usb_hcd_giveback_urb) from [<c0550684>] (musb_giveback+0xc8/0x1cc)
Fixes: b69578df7e ("USB: usbserial: mos7720: add support for parallel
port on moschip 7715")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b05aebc25f upstream.
Fix NULL-pointer dereference at port open if a device lacks the expected
bulk in and out endpoints.
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
[<bf071c20>] (mos7720_open [mos7720]) from [<bf0490e0>] (serial_port_activate+0x68/0x98 [usbserial])
[<bf0490e0>] (serial_port_activate [usbserial]) from [<c0470ca4>] (tty_port_open+0x9c/0xe8)
[<c0470ca4>] (tty_port_open) from [<bf049d98>] (serial_open+0x48/0x6c [usbserial])
[<bf049d98>] (serial_open [usbserial]) from [<c0469178>] (tty_open+0xcc/0x5cc)
Fixes: 0f64478cbc ("USB: add USB serial mos7720 driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 21ce578402 upstream.
Fix NULL-pointer dereference in write() should the device lack the
expected interrupt-out endpoint:
Unable to handle kernel NULL pointer dereference at virtual address 00000054
...
PC is at kobil_write+0x144/0x2a0 [kobil_sct]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d9b0f859b upstream.
Check for the expected endpoints in attach() and fail loudly if not
present.
Note that failing to do this appears to be benign since da280e3488
("USB: keyspan_pda: clean up write-urb busy handling") which prevents a
NULL-pointer dereference in write() by never marking a non-existent
write-urb as free.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90507d54f7 upstream.
Fix NULL-pointer dereference at open should the device lack a bulk-in or
bulk-out endpoint:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at iuu_open+0x78/0x59c [iuu_phoenix]
Fixes: 07c3b1a100 ("USB: remove broken usb-serial num_endpoints
check")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4f9785cc99 upstream.
In case a device is left in "boot-mode" we must not register any port
devices in order to avoid a NULL-pointer dereference on open due to
missing endpoints. This could be used by a malicious device to trigger
an OOPS:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
[<bf0caa84>] (edge_open [io_ti]) from [<bf0b0118>] (serial_port_activate+0x68/0x98 [usbserial])
[<bf0b0118>] (serial_port_activate [usbserial]) from [<c0470ca4>] (tty_port_open+0x9c/0xe8)
[<c0470ca4>] (tty_port_open) from [<bf0b0da0>] (serial_open+0x48/0x6c [usbserial])
[<bf0b0da0>] (serial_open [usbserial]) from [<c0469178>] (tty_open+0xcc/0x5cc)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- No heartbeat_work to initialise earlier
- No separate port_probe and port_remove operations, so add check for null
port pointers in edge_release()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a323fefc6f upstream.
Fix NULL-pointer dereference when clearing halt at open should a
malicious device lack the expected endpoints when in download mode.
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
[<bf011ed8>] (edge_open [io_ti]) from [<bf000118>] (serial_port_activate+0x68/0x98 [usbserial])
[<bf000118>] (serial_port_activate [usbserial]) from [<c0470ca4>] (tty_port_open+0x9c/0xe8)
[<c0470ca4>] (tty_port_open) from [<bf000da0>] (serial_open+0x48/0x6c [usbserial])
[<bf000da0>] (serial_open [usbserial]) from [<c0469178>] (tty_open+0xcc/0x5cc)
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0dd408425e upstream.
Fix NULL-pointer dereference when initialising URBs at open should a
non-EPIC device lack a bulk-in or interrupt-in endpoint.
Unable to handle kernel NULL pointer dereference at virtual address 00000028
...
PC is at edge_open+0x24c/0x3e8 [io_edgeport]
Note that the EPIC-device probe path has the required sanity checks so
this makes those checks partially redundant.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c4ac4496e8 upstream.
Make sure to free the URB transfer buffer in case submission fails (e.g.
due to a disconnect).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3dca01114d upstream.
Fix NULL-pointer dereference when clearing halt at open should the device
lack a bulk-out endpoint.
Unable to handle kernel NULL pointer dereference at virtual address 00000030
...
PC is at cyberjack_open+0x40/0x9c [cyberjack]
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: add this check to the existing
usb_serial_driver::attach implementation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee8665e28e upstream.
the tt_info provided by a HS hub might be in use to by a child device
Make sure we free the devices in the correct order.
This is needed in special cases such as when xhci controller is
reset when resuming from hibernate, and all virt_devices are freed.
Also free the virt_devices starting from max slot_id as children
more commonly have higher slot_id than parent.
Reported-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c069b057d upstream.
Andrey Konovalov's fuzz testing of gadgetfs showed that we should
improve the driver's checks for valid configuration descriptors passed
in by the user. In particular, the driver needs to verify that the
wTotalLength value in the descriptor is not too short (smaller
than USB_DT_CONFIG_SIZE). And the check for whether wTotalLength is
too large has to be changed, because the driver assumes there is
always enough room remaining in the buffer to hold a device descriptor
(at least USB_DT_DEVICE_SIZE bytes).
This patch adds the additional check and fixes the existing check. It
may do a little more than strictly necessary, but one extra check
won't hurt.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit add333a81a upstream.
Andrey Konovalov reports that fuzz testing with syzkaller causes a
KASAN use-after-free bug report in gadgetfs:
BUG: KASAN: use-after-free in gadgetfs_setup+0x208a/0x20e0 at addr ffff88003dfe5bf2
Read of size 2 by task syz-executor0/22994
CPU: 3 PID: 22994 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #16
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88006df06a18 ffffffff81f96aba ffffffffe0528500 1ffff1000dbe0cd6
ffffed000dbe0cce ffff88006df068f0 0000000041b58ab3 ffffffff8598b4c8
ffffffff81f96828 1ffff1000dbe0ccd ffff88006df06708 ffff88006df06748
Call Trace:
<IRQ> [ 201.343209] [< inline >] __dump_stack lib/dump_stack.c:15
<IRQ> [ 201.343209] [<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
[< inline >] print_address_description mm/kasan/report.c:197
[<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
[< inline >] kasan_report mm/kasan/report.c:306
[<ffffffff817e562a>] __asan_report_load_n_noabort+0x3a/0x40 mm/kasan/report.c:337
[< inline >] config_buf drivers/usb/gadget/legacy/inode.c:1298
[<ffffffff8322c8fa>] gadgetfs_setup+0x208a/0x20e0 drivers/usb/gadget/legacy/inode.c:1368
[<ffffffff830fdcd0>] dummy_timer+0x11f0/0x36d0 drivers/usb/gadget/udc/dummy_hcd.c:1858
[<ffffffff814807c1>] call_timer_fn+0x241/0x800 kernel/time/timer.c:1308
[< inline >] expire_timers kernel/time/timer.c:1348
[<ffffffff81482de6>] __run_timers+0xa06/0xec0 kernel/time/timer.c:1641
[<ffffffff814832c1>] run_timer_softirq+0x21/0x80 kernel/time/timer.c:1654
[<ffffffff84f4af8b>] __do_softirq+0x2fb/0xb63 kernel/softirq.c:284
The cause of the bug is subtle. The dev_config() routine gets called
twice by the fuzzer. The first time, the user data contains both a
full-speed configuration descriptor and a high-speed config
descriptor, causing dev->hs_config to be set. But it also contains an
invalid device descriptor, so the buffer containing the descriptors is
deallocated and dev_config() returns an error.
The second time dev_config() is called, the user data contains only a
full-speed config descriptor. But dev->hs_config still has the stale
pointer remaining from the first call, causing the routine to think
that there is a valid high-speed config. Later on, when the driver
dereferences the stale pointer to copy that descriptor, we get a
use-after-free access.
The fix is simple: Clear dev->hs_config if the passed-in data does not
contain a high-speed config descriptor.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit faab50984f upstream.
Andrey Konovalov reports that fuzz testing with syzkaller causes a
KASAN warning in gadgetfs:
BUG: KASAN: slab-out-of-bounds in dev_config+0x86f/0x1190 at addr ffff88003c47e160
Write of size 65537 by task syz-executor0/6356
CPU: 3 PID: 6356 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88003c107ad8 ffffffff81f96aba ffffffff3dc11ef0 1ffff10007820eee
ffffed0007820ee6 ffff88003dc11f00 0000000041b58ab3 ffffffff8598b4c8
ffffffff81f96828 ffffffff813fb4a0 ffff88003b6eadc0 ffff88003c107738
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96aba>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff817e4dec>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:159
[< inline >] print_address_description mm/kasan/report.c:197
[<ffffffff817e5080>] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:286
[<ffffffff817e5705>] kasan_report+0x35/0x40 mm/kasan/report.c:306
[< inline >] check_memory_region_inline mm/kasan/kasan.c:308
[<ffffffff817e3fb9>] check_memory_region+0x139/0x190 mm/kasan/kasan.c:315
[<ffffffff817e4044>] kasan_check_write+0x14/0x20 mm/kasan/kasan.c:326
[< inline >] copy_from_user arch/x86/include/asm/uaccess.h:689
[< inline >] ep0_write drivers/usb/gadget/legacy/inode.c:1135
[<ffffffff83228caf>] dev_config+0x86f/0x1190 drivers/usb/gadget/legacy/inode.c:1759
[<ffffffff817fdd55>] __vfs_write+0x5d5/0x760 fs/read_write.c:510
[<ffffffff817ff650>] vfs_write+0x170/0x4e0 fs/read_write.c:560
[< inline >] SYSC_write fs/read_write.c:607
[<ffffffff81803a5b>] SyS_write+0xfb/0x230 fs/read_write.c:599
[<ffffffff84f47ec1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Indeed, there is a comment saying that the value of len is restricted
to a 16-bit integer, but the code doesn't actually do this.
This patch fixes the warning. It replaces the comment with a
computation that forces the amount of data copied from the user in
ep0_write() to be no larger than the wLength size for the control
transfer, which is a 16-bit quantity.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2 adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0994b0a257 upstream.
Andrey Konovalov reported that we were not properly checking the upper
limit before of a device configuration size before calling
memdup_user(), which could cause some problems.
So set the upper limit to PAGE_SIZE * 4, which should be good enough for
all devices.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bcdbeb8447 upstream.
The stop_activity() routine in dummy-hcd is supposed to unlink all
active requests for every endpoint, among other things. But it
doesn't handle ep0. As a result, fuzz testing can generate a WARNING
like the following:
WARNING: CPU: 0 PID: 4410 at drivers/usb/gadget/udc/dummy_hcd.c:672 dummy_free_request+0x153/0x170
Modules linked in:
CPU: 0 PID: 4410 Comm: syz-executor Not tainted 4.9.0-rc7+ #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88006a64ed10 ffffffff81f96b8a ffffffff41b58ab3 1ffff1000d4c9d35
ffffed000d4c9d2d ffff880065f8ac00 0000000041b58ab3 ffffffff8598b510
ffffffff81f968f8 0000000041b58ab3 ffffffff859410e0 ffffffff813f0590
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81f96b8a>] dump_stack+0x292/0x398 lib/dump_stack.c:51
[<ffffffff812b808f>] __warn+0x19f/0x1e0 kernel/panic.c:550
[<ffffffff812b831c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[<ffffffff830fcb13>] dummy_free_request+0x153/0x170 drivers/usb/gadget/udc/dummy_hcd.c:672
[<ffffffff830ed1b0>] usb_ep_free_request+0xc0/0x420 drivers/usb/gadget/udc/core.c:195
[<ffffffff83225031>] gadgetfs_unbind+0x131/0x190 drivers/usb/gadget/legacy/inode.c:1612
[<ffffffff830ebd8f>] usb_gadget_remove_driver+0x10f/0x2b0 drivers/usb/gadget/udc/core.c:1228
[<ffffffff830ec084>] usb_gadget_unregister_driver+0x154/0x240 drivers/usb/gadget/udc/core.c:1357
This patch fixes the problem by iterating over all the endpoints in
the driver's ep array instead of iterating over the gadget's ep_list,
which explicitly leaves out ep0.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e4da3fcf7 upstream.
By convention (according to doc) if function does not provide
get_alt() callback composite framework should assume that it has only
altsetting 0 and should respond with error if host tries to set
other one.
After commit dd4dff8b03 ("USB: composite: Fix bug: should test
set_alt function pointer before use it")
we started checking set_alt() callback instead of get_alt().
This check is useless as we check if set_alt() is set inside
usb_add_function() and fail if it's NULL.
Let's fix this check and move comment about why we check the get
method instead of set a little bit closer to prevent future false
fixes.
Fixes: dd4dff8b03 ("USB: composite: Fix bug: should test set_alt function pointer before use it")
Signed-off-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1d5f8ff80 upstream.
This patch removes BUG_ON() macro from mlx4_alloc_icm_coherent()
by checking DMA address alignment in advance and performing proper
folding in case of error.
Fixes: 5b0bf5e25e ("mlx4_core: Support ICM tables in coherent memory")
Reported-by: Ozgur Karatas <okaratas@member.fsf.org>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6496bbf0ec upstream.
Single send WQE in RX buffer should be stamped with software
ownership in order to prevent the flow of QP in error in FW
once UPDATE_QP is called.
Fixes: 9f519f68cf ('mlx4_en: Not using Shared Receive Queues')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e6afb1ad88 upstream.
Commit beb0babfb7 ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:
Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.
Fixes: beb0babfb7 ("korina: disable napi on close and restart")
Reported-by: Alexandros C. Couloumbis <alex@ozo.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 628185cfdd upstream.
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.
What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.
This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.
This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.
tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.
Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
Fixes: 12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a91918cd3e upstream.
This iscsit_tpg_add_portal_group() function is only called from
lio_target_tiqn_addtpg(). Both functions free the "tpg" pointer on
error so it's a double free bug. The memory is allocated in the caller
so it should be freed in the caller and not here.
Fixes: e48354ce07 ("iscsi-target: Add iSCSI fabric support for target v4.1")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
[ bvanassche: Added "Fix" at start of patch title ]
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d2a145252c upstream.
A race between scanning and fc_remote_port_delete() may result in a
permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
and unblocked after. The reason is that blocking a device sets both the
SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED. However,
scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
device to be ignored by scsi_target_unblock() and thus never have its
QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
running but has a stopped queue.
We actually have two places where SDEV_RUNNING is set: once in
scsi_add_lun() which respects the blocked flag and once in
scsi_sysfs_add_sdev() which doesn't. Since the second set is entirely
spurious, simply remove it to fix the problem.
Reported-by: Zengxi Chen <chenzengxi@huawei.com>
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f2ce1c6af upstream.
It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early. In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN. So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.
This also opens a time window with unblocked rport (until the followup
LUN reopen recovery has finished). If a scsi_cmnd timeout occurs during
this time window fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover. This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.
Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile. Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work). For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing. We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.
For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.
Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy(). We must avoid the
following sequence:
ERP thread rport_work other context
------------------------- -------------- --------------------------------
port is unblocked, rport still blocked,
due to pending/running ERP action,
so ((port->status & ...UNBLOCK) != 0)
and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
zfcp_erp_port_reopen()
lock ERP
zfcp_erp_port_block()
port->status clear ...UNBLOCK
unlock ERP
zfcp_scsi_schedule_rport_block()
port->rport_task = RPORT_DEL
queue_work(rport_work)
zfcp_scsi_rport_work()
(port->rport_task != RPORT_ADD)
port->rport_task = RPORT_NONE
zfcp_scsi_rport_block()
if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
zfcp_scsi_rport_work()
(port->rport_task == RPORT_ADD)
port->rport_task = RPORT_NONE
zfcp_scsi_rport_register()
(port->rport == NULL)
rport = fc_remote_port_add()
port->rport = rport;
Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed. In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN. With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleard ...UNBLOCK from port->status.
Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):
[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
* clear UNBLOCK ---------------------------------------+
zfcp_scsi_schedule_rports_block() |
write_lock_irqsave(&adapter->erp_lock, flags);-------+ |
zfcp_erp_action_enqueue() | |
zfcp_erp_setup_act() | |
* set ERP_INUSE -----------------------------------|--|--+
write_unlock_irqrestore(&adapter->erp_lock, flags);--+ | |
.context-switch. | |
zfcp_erp_thread() | |
zfcp_erp_strategy() | |
write_lock_irqsave(&adapter->erp_lock, flags);------+ | |
... | | |
zfcp_erp_strategy_check_target() | | |
zfcp_erp_strategy_check_adapter() | | |
zfcp_erp_adapter_unblock() | | |
* set UNBLOCK -----------------------------------|--+ |
zfcp_erp_action_dequeue() | |
* clear ERP_INUSE ---------------------------------|-----+
... |
write_unlock_irqrestore(&adapter->erp_lock, flags);-+
Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved. Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c48 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e1 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 56d23ed7ad upstream.
Since quite a while, Linux issues enough SCSI commands per scsi_device
which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
and SAM_STAT_GOOD. This floods the HBA trace area and we cannot see
other and important HBA trace records long enough.
Therefore, do not trace HBA response errors for pure benign residual
under counts at the default trace level.
This excludes benign residual under count combined with other validity
bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL. For all those other
cases, we still do want to see both the HBA record and the corresponding
SCSI record by default.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dac37e15b7 upstream.
When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
eh_target_reset_handler(), it expects us to relent the ownership over
the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
or target - when returning with SUCCESS from the callback ('release'
them). SCSI EH can then reuse those commands.
We did not follow this rule to release commands upon SUCCESS; and if
later a reply arrived for one of those supposed to be released commands,
we would still make use of the scsi_cmnd in our ingress tasklet. This
will at least result in undefined behavior or a kernel panic because of
a wrong kernel pointer dereference.
To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
*)->data in the matching scope if a TMF was successful. This is done
under the locks (struct zfcp_adapter *)->abort_lock and (struct
zfcp_reqlist *)->lock to prevent the requests from being removed from
the request-hashtable, and the ingress tasklet from making use of the
scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().
For cases where a reply arrives during SCSI EH, but before we get a
chance to NULLify the pointer - but before we return from the callback
-, we assume that the code is protected from races via the CAS operation
in blk_complete_request() that is called in scsi_done().
The following stacktrace shows an example for a crash resulting from the
previous behavior:
Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
Oops: 0038 [#1] SMP
CPU: 2 PID: 0 Comm: swapper/2 Not tainted
task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
Krnl Code: 00000000001156a2: a7190000 lghi %r1,0
00000000001156a6: a7380015 lhi %r3,21
#00000000001156aa: e32050000008 ag %r2,0(%r5)
>00000000001156b0: 482022b0 lh %r2,688(%r2)
00000000001156b4: ae123000 sigp %r1,%r2,0(%r3)
00000000001156b8: b2220020 ipm %r2
00000000001156bc: 8820001c srl %r2,28
00000000001156c0: c02700000001 xilf %r2,1
Call Trace:
([<0000000000000000>] 0x0)
[<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
[<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
[<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
[<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
[<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
[<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
[<0000000000141fd4>] tasklet_action+0x9c/0x170
[<0000000000141550>] __do_softirq+0xe8/0x258
[<000000000010ce0a>] do_softirq+0xba/0xc0
[<000000000014187c>] irq_exit+0xc4/0xe8
[<000000000046b526>] do_IRQ+0x146/0x1d8
[<00000000005d6a3c>] io_return+0x0/0x8
[<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
([<0000000000000000>] 0x0)
[<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
[<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
[<0000000000114782>] smp_start_secondary+0xda/0xe8
[<00000000005d6efe>] restart_int_handler+0x56/0x6c
[<0000000000000000>] 0x0
Last Breaking-Event-Address:
[<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0
Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d3a2418ee3 upstream.
This patch avoids that Coverity complains about not checking the
ib_find_pkey() return value.
Fixes: commit 547af76521 ("IB/multicast: Report errors on multicast groups if P_key changes")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fe2f378dd upstream.
The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:
Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
Fixes: commit b7ab0b19a8 ("IB/mad: Verify mgmt class in received MADs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bcc7f5b4be upstream.
bdev->bd_contains is not stable before calling __blkdev_get().
When __blkdev_get() is called on a parition with ->bd_openers == 0
it sets
bdev->bd_contains = bdev;
which is not correct for a partition.
After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
and then ->bd_contains is stable.
When FMODE_EXCL is used, blkdev_get() calls
bd_start_claiming() -> bd_prepare_to_claim() -> bd_may_claim()
This call happens before __blkdev_get() is called, so ->bd_contains
is not stable. So bd_may_claim() cannot safely use ->bd_contains.
It currently tries to use it, and this can lead to a BUG_ON().
This happens when a whole device is already open with a bd_holder (in
use by dm in my particular example) and two threads race to open a
partition of that device for the first time, one opening with O_EXCL and
one without.
The thread that doesn't use O_EXCL gets through blkdev_get() to
__blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
Immediately thereafter the other thread, using FMODE_EXCL, calls
bd_start_claiming() from blkdev_get(). This should fail because the
whole device has a holder, but because bdev->bd_contains == bdev
bd_may_claim() incorrectly reports success.
This thread continues and blocks on bd_mutex.
The first thread then sets bdev->bd_contains correctly and drops the mutex.
The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
again in:
BUG_ON(!bd_may_claim(bdev, whole, holder));
The BUG_ON fires.
Fix this by removing the dependency on ->bd_contains in
bd_may_claim(). As bd_may_claim() has direct access to the whole
device, it can simply test if the target bdev is the whole device.
Fixes: 6b4517a791 ("block: implement bd_claiming and claiming block")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5716863e0f upstream.
fsnotify_unmount_inodes() plays complex tricks to pin next inode in the
sb->s_inodes list when iterating over all inodes. Furthermore the code has a
bug that if the current inode is the last on i_sb_list that does not have e.g.
I_FREEING set, then we leave next_i pointing to inode which may get removed
from the i_sb_list once we drop s_inode_list_lock thus resulting in
use-after-free issues (usually manifesting as infinite looping in
fsnotify_unmount_inodes()).
Fix the problem by keeping current inode pinned somewhat longer. Then we can
make the code much simpler and standard.
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c056fdc5b upstream.
After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the the messenger.
AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd1b ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").
The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down. Pass 0 to facilitate backporting, and kill
it in the next commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5457e03de9 upstream.
The buffer for iucv_message_receive() needs to be below 2 GB. In
__iucv_message_receive(), the buffer address is casted to an u32, which
would result in either memory corruption or an addressing exception when
using addresses >= 2 GB.
Fix this by using GFP_DMA for the buffer allocation.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 601e6e3cc5 upstream.
The original code causes a static checker warning because it has a
continue inside a do { } while (0); loop. In that context, a continue
and a break are equivalent. The intent was to go back to the start of
the loop so the continue was a bug.
I've added a retry label at the start and changed the continue to a goto
retry. Then I removed the do { } while (0) loop and pulled the code in
one indent level.
Fixes: 2791c1a439 ("SPARC/LEON: added support for selecting Timer Core and Timer within core")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79e51b5c2d upstream.
Currently it is impossible to edit the value of a config symbol with a
prompt longer than (terminal width - 2) characters. dialog_inputbox()
calculates a negative x-offset for the input window and newwin() fails
as this is invalid. It also doesn't check for this failure, so it
busy-loops calling wgetch(NULL) which immediately returns -1.
The additions in the offset calculations also don't match the intended
size of the window.
Limit the window size and calculate the offset similarly to
show_scroll_win().
Fixes: 692d97c380 ("kconfig: new configuration interface (nconfig)")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
[bwh: Backported to 3.2: replaced code used LINES and COLS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e6e1ef48f upstream.
Don't load an inode with a negative size; this causes integer overflow
problems in the VFS.
[ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]
Fixes: a48380f769 (ext4: rename i_dir_acl to i_size_high)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: use EIO instead of EFSCORRUPTED]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0cf3ef5e0 upstream.
What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied. As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e36ce99ee0 upstream.
Module test reports:
temp1_max: Suspected overflow: [160000 vs. 0]
temp1_min: Suspected overflow: [160000 vs. 0]
This is seen because the values passed when writing temperature limits
are unbound.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Fixes: 6099469805 ("hwmon: Support for Dallas Semiconductor DS620")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b09eff0c3 upstream.
This patch adds support for PIDs 0x1040, 0x1041 of Telit LE922A.
Since the interface positions are the same than the ones used
for other Telit compositions, previous defined blacklists are used.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 82ffb6fc63 upstream.
The Logitech QuickCam Communicate Deluxe/S7500 microphone fails with the
following warning.
[ 6.778995] usb 2-1.2.2.2: Warning! Unlikely big volume range (=3072),
cval->res is probably wrong.
[ 6.778996] usb 2-1.2.2.2: [5] FU [Mic Capture Volume] ch = 1, val =
4608/7680/1
Adding it to the list of devices in volume_control_quirks makes it work
properly, fixing related typo.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 777c6e0dae upstream.
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following
[ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[ 145.122628] Call Trace:
[ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.
Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.
Fixes: 47e627bc8c ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
- The lockless (__-prefixed) variants don't exist
- Keep definition of cpu_notify_nofail() conditional on CONFIG_HOTPLUG_CPU]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2a7bf53f57 upstream.
If a log tree has a layout like the following:
leaf N:
...
item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
dir log end 1275809046
leaf N + 1:
item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
dir log end 18446744073709551615
...
When we pass the value 1275809046 + 1 as the parameter start_ret to the
function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
end up with path->slots[0] having the value 239 (points to the last item
of leaf N, item 240). Because the dir log item in that position has an
offset value smaller than *start_ret (1275809046 + 1) we need to move on
to the next leaf, however the logic for that is wrong since it compares
the current slot to the number of items in the leaf, which is smaller
and therefore we don't lookup for the next leaf but instead we set the
slot to point to an item that does not exist, at slot 240, and we later
operate on that slot which has unexpected content or in the worst case
can result in an invalid memory access (accessing beyond the last page
of leaf N's extent buffer).
So fix the logic that checks when we need to lookup at the next leaf
by first incrementing the slot and only after to check if that slot
is beyond the last item of the current leaf.
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: e02119d5a7 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6dff5b6705 upstream.
GCC 5 generates different code for this bootwrapper null check that
causes the PS3 to hang very early in its bootup. This check is of
limited value, so just get rid of it.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd74da957b upstream.
The function we are wrapping is named dma_alloc_noncoherent, and
not dma_alloc_non_coherent.
Fixes: 9ac7849e35 ("devres: device resource management")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c3dd1e058 upstream.
Function klsi_105_open() calls usb_control_msg() (to "enable read") and
checks its return value. When the return value is unexpected, it only
assigns the error code to the return variable retval, but does not
terminate the exception path. This patch fixes the bug by inserting
"goto err_generic_close;" when the call to usb_control_msg() fails.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Pan Bian <bianpan2016@163.com>
[johan: rebase on prerequisite fix and amend commit message]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6774d5f532 upstream.
Kill urbs and disable read before returning from open on failure to
retrieve the line state.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: replaced code was using dbg()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f37fabb864 upstream.
In the critical sysfs entry the thermal hwmon was returning wrong
temperature to the user-space. It was reporting the temperature of the
first trip point instead of the temperature of critical trip point.
For example:
/sys/class/hwmon/hwmon0/temp1_crit:50000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
/sys/class/thermal/thermal_zone0/trip_point_0_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
/sys/class/thermal/thermal_zone0/trip_point_3_type:critical
Since commit e68b16abd9 ("thermal: add hwmon sysfs I/F") the driver
have been registering a sysfs entry if get_crit_temp() callback was
provided. However when accessed, it was calling get_trip_temp() instead
of the get_crit_temp().
Fixes: e68b16abd9 ("thermal: add hwmon sysfs I/F")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4763601a56 upstream.
The function returns -EINVAL even if it builds the stream properly.
The bogus error code sneaked in during the code refactoring, but it
wasn't noticed until now since the returned error code itself is
ignored in anyway. Kill it here, but there is no behavior change by
this patch, obviously.
Fixes: e5779998bf ('ALSA: usb-audio: refactor code')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit daaadbf074 upstream.
Commit 2cbbb579bc ("regmap: Add the LZO cache support") introduced
'blksize' in regcache_lzo_read() and regcache_lzo_write(), that is
set but not used. Compiling with W=1 gives the following warnings,
fix them.
drivers/base/regmap/regcache-lzo.c: In function ‘regcache_lzo_read’:
drivers/base/regmap/regcache-lzo.c:239:9: warning: variable ‘blksize’ set but not used [-Wunused-but-set-variable]
size_t blksize, tmp_dst_len;
^
drivers/base/regmap/regcache-lzo.c: In function ‘regcache_lzo_write’:
drivers/base/regmap/regcache-lzo.c:278:9: warning: variable ‘blksize’ set but not used [-Wunused-but-set-variable]
size_t blksize, tmp_dst_len;
^
These are harmless warnings and are only being fixed to reduce the
noise with W=1 in the kernel.
Fixes: 2cbbb579bc ("regmap: Add the LZO cache support")
Cc: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Kirtika Ruchandani <kirtika@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d3d83ee20a upstream.
A recent cleanup had the right idea to remove the initialization
of the error variable, but missed the actual benefit of that,
which is that we get warnings if there is a bug in it. Now
we get a warning about a bug that was introduced by this cleanup:
drivers/media/platform/davinci/vpfe_capture.c: In function 'vpfe_probe':
drivers/media/platform/davinci/vpfe_capture.c:1992:9: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This adds the missing initialization that the warning is about,
and another one that was preexisting and that we did not get
a warning for. That second bug has existed since the driver
was first added.
Fixes: efb74461f5 ("[media] DaVinci-VPFE-Capture: Delete an unnecessary variable initialisation in vpfe_probe()")
Fixes: 7da8a6cb3e ("V4L/DVB (12248): v4l: vpfe capture bridge driver for DM355 and DM6446")
[mchehab@s-opensource.com: fix a merge conflict]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 265e9098ba upstream.
In crypt_set_key(), if a failure occurs while replacing the old key
(e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
set. Otherwise, the crypto layer would have an invalid key that still
has DM_CRYPT_KEY_VALID flag set.
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c48ae41baf upstream.
The commit "ext4: sanity check the block and cluster size at mount
time" should prevent any problems, but in case the superblock is
modified while the file system is mounted, add an extra safety check
to make sure we won't overrun the allocated buffer.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd6bb35bf7 upstream.
Centralize the checks for inodes_per_block and be more strict to make
sure the inodes_per_block_group can't end up being zero.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5aee0f8a3f upstream.
Fix a large number of problems with how we handle mount options in the
superblock. For one, if the string in the superblock is long enough
that it is not null terminated, we could run off the end of the string
and try to interpret superblocks fields as characters. It's unlikely
this will cause a security problem, but it could result in an invalid
parse. Also, parse_options is destructive to the string, so in some
cases if there is a comma-separated string, it would be modified in
the superblock. (Fortunately it only happens on file systems with a
1k block size.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8052d7245b upstream.
When there is a CRC error in the SPROM read from the device, the code
attempts to handle a fallback SPROM. When this also fails, the driver
returns zero rather than an error code.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af15769ffa upstream.
gcc-7 notices that the condition in mvs_94xx_command_active looks
suspicious:
drivers/scsi/mvsas/mv_94xx.c: In function 'mvs_94xx_command_active':
drivers/scsi/mvsas/mv_94xx.c:671:15: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context]
This was introduced when the mv_printk() statement got added, and leads
to the condition being ignored. This is probably harmless.
Changing '&&' to '&' makes the code look reasonable, as we check the
command bit before setting and printing it.
Fixes: a4632aae8b ("[SCSI] mvsas: Add new macros and functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30a9d7afe7 upstream.
The number of 'counters' elements needed in 'struct sg' is
super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
elements in the array. This is insufficient for block sizes >= 32k. In
such cases the memcpy operation performed in ext4_mb_seq_groups_show()
would cause stack memory corruption.
Fixes: c9de560ded
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 69e43e8cc9 upstream.
'border' variable is set to a value of 2 times the block size of the
underlying filesystem. With 64k block size, the resulting value won't
fit into a 16-bit variable. Hence this commit changes the data type of
'border' to 'unsigned int'.
Fixes: c9de560ded
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 99e5cde5ea upstream.
Make sure to drop any device reference taken by vio_find_node() when
adding and removing virtual I/O slots.
Fixes: 5eeb8c63a3 ("[PATCH] PCI Hotplug: rpaphp: Move VIO registration")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 815a7141c4 upstream.
Make sure to drop any reference taken by bus_find_device() when creating
devices during init and driver registration.
Fixes: 55347cc996 ("[POWERPC] ibmebus: Add device creation and bus probing based on of_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fe0f316816 upstream.
Make sure to drop any reference taken by bus_find_device() in the sysfs
callbacks that are used to create and destroy devices based on
device-tree entries.
Fixes: 6bccf755ff ("[POWERPC] ibmebus: dynamic addition/removal of adapters, some code cleanup")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d128af1787 upstream.
The AEAD givenc descriptor relies on moving the IV through the
output FIFO and then back to the CTX2 for authentication. The
SEQ FIFO STORE could be scheduled before the data can be
read from OFIFO, especially since the SEQ FIFO LOAD needs
to wait for the SEQ FIFO LOAD SKIP to finish first. The
SKIP takes more time when the input is SG than when it's
a contiguous buffer. If the SEQ FIFO LOAD is not scheduled
before the STORE, the DECO will hang waiting for data
to be available in the OFIFO so it can be transferred to C2.
In order to overcome this, first force transfer of IV to C2
by starting the "cryptlen" transfer first and then starting to
store data from OFIFO to the output buffer.
Fixes: 1acebad3d8 ("crypto: caam - faster aead implementation")
Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com>
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccdb6be9ec upstream.
The UHCI controllers in Intel chipsets rely on a platform-specific non-PME
mechanism for wakeup signalling. They can generate wakeup signals even
though they don't support PME.
We need to let the USB core know this so that it will enable runtime
suspend for UHCI controllers.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6496ebd7ed upstream.
One some systems, the firmware does not allow certain PCI devices to be put
in deep D-states. This can cause problems for wakeup signalling, if the
device does not support PME# in the deepest allowed suspend state. For
example, Pierre reports that on his system, ACPI does not permit his xHCI
host controller to go into D3 during runtime suspend -- but D3 is the only
state in which the controller can generate PME# signals. As a result, the
controller goes into runtime suspend but never wakes up, so it doesn't work
properly. USB devices plugged into the controller are never detected.
If the device relies on PME# for wakeup signals but is not capable of
generating PME# in the target state, the PCI core should accurately report
that it cannot do wakeup from runtime suspend. This patch modifies the
pci_dev_run_wake() routine to add this check.
Reported-by: Pierre de Villemereuil <flyos@mailoo.org>
Tested-by: Pierre de Villemereuil <flyos@mailoo.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4dfce57db6 upstream.
There have been several reports over the years of NULL pointer
dereferences in xfs_trans_log_inode during xfs_fsr processes,
when the process is doing an fput and tearing down extents
on the temporary inode, something like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
PID: 29439 TASK: ffff880550584fa0 CPU: 6 COMMAND: "xfs_fsr"
[exception RIP: xfs_trans_log_inode+0x10]
#9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
#10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
#11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
#12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
#13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
#14 [ffff8800a57bbe00] evict at ffffffff811e1b67
#15 [ffff8800a57bbe28] iput at ffffffff811e23a5
#16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
#17 [ffff8800a57bbe88] dput at ffffffff811dd06c
#18 [ffff8800a57bbea8] __fput at ffffffff811c823b
#19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
#20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
#21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
#22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
As it turns out, this is because the i_itemp pointer, along
with the d_ops pointer, has been overwritten with zeros
when we tear down the extents during truncate. When the in-core
inode fork on the temporary inode used by xfs_fsr was originally
set up during the extent swap, we mistakenly looked at di_nextents
to determine whether all extents fit inline, but this misses extents
generated by speculative preallocation; we should be using if_bytes
instead.
This mistake corrupts the in-memory inode, and code in
xfs_iext_remove_inline eventually gets bad inputs, causing
it to memmove and memset incorrect ranges; this became apparent
because the two values in ifp->if_u2.if_inline_ext[1] contained
what should have been in d_ops and i_itemp; they were memmoved due
to incorrect array indexing and then the original locations
were zeroed with memset, again due to an array overrun.
Fix this by properly using i_df.if_bytes to determine the number
of extents, not di_nextents.
Thanks to dchinner for looking at this with me and spotting the
root cause.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 328cf6927b upstream.
If CONFIG_ETRAX_AXISFLASHMAP is not configured, the flash rescue image
object file is empty. With recent versions of binutils, this results
in the following build error.
cris-linux-objcopy: error:
the input file 'arch/cris/boot/rescue/rescue.o' has no sections
This is seen, for example, when trying to build cris:allnoconfig
with recently generated toolchains.
Since it does not make sense to build a flash rescue image if there is
no flash, only build it if CONFIG_ETRAX_AXISFLASHMAP is enabled.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 66ab3a74c5 ("CRIS: Merge machine dependent boot/compressed ..")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8f29bb719 upstream.
usb_endpoint_maxp() returns wMaxPacketSize in its
raw form. Without taking into consideration that it
also contains other bits reserved for isochronous
endpoints.
This patch fixes one occasion where this is a
problem by making sure that we initialize
ep->maxpacket only with lower 10 bits of the value
returned by usb_endpoint_maxp(). Note that seperate
patches will be necessary to audit all call sites of
usb_endpoint_maxp() and make sure that
usb_endpoint_maxp() only returns lower 10 bits of
wMaxPacketSize.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7ec03e60ef upstream.
Function ite_set_carrier_params() uses variable use_demodulator after
having initialized it to false in some if branches, but this variable is
never set to true otherwise.
This bug has been found using clang -Wsometimes-uninitialized warning
flag.
Fixes: 620a32bba4 ("[media] rc: New rc-based ite-cir driver for
several ITE CIRs")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d65f2fa680 upstream.
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.
This is incorrect for zero VID vlan accel packets, making the following
match fail:
tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...
Apparently 'int_vlan_tag' was implemented prior VLAN_TAG_PRESENT was
introduced in 05423b2 "vlan: allow null VLAN ID to be used"
(and at time introduced, the 'vlan_tx_tag_get' call in em_meta was not
adapted).
Fix, testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
value.
Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used")
Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: s/skb_vlan_tag/vlan_tx_tag/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b321a38d24 upstream.
The oversampling ratio is controlled using the oversampling pins,
OS [2:0] with OS2 being the MSB control bit, and OS0 the LSB control
bit.
The gpio connected to the OS2 pin is not being set correctly, only OS0
and OS1 pins are being set. Fix the typo to allow proper control of the
oversampling pins.
Signed-off-by: Eva Rachel Retuya <eraretuya@gmail.com>
Fixes: b9618c0 ("staging: IIO: ADC: New driver for AD7606/AD7606-6/AD7606-4")
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccf7abb93a upstream.
Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.
__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.
This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.
Again, this gem was found by syzkaller tool.
Fixes: 9c55e01c0c ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5edabca9d4 upstream.
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.
However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.
Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.
Similar fixes for TCP:
fb7e2399ec [TCP]: skb is unexpectedly freed.
0aea76d35c tcp: SYN packets are now
simply consumed
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 146cc8a17a upstream.
The current implementation failed to detect short transfers when
attempting to read the line state, and also, to make things worse,
logged the content of the uninitialised heap transfer buffer.
Fixes: abf492e7b3 ("USB: kl5kusb105: fix DMA buffers on stack")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef85b67385 upstream.
When L2 exits to L0 due to "exception or NMI", software exceptions
(#BP and #OF) for which L1 has requested an intercept should be
handled by L1 rather than L0. Previously, only hardware exceptions
were forwarded to L1.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3aa02cb664 upstream.
Currently kill_fasync() is called outside the stream lock in
snd_pcm_period_elapsed(). This is potentially racy, since the stream
may get released even during the irq handler is running. Although
snd_pcm_release_substream() calls snd_pcm_drop(), this doesn't
guarantee that the irq handler finishes, thus the kill_fasync() call
outside the stream spin lock may be invoked after the substream is
detached, as recently reported by KASAN.
As a quick workaround, move kill_fasync() call inside the stream
lock. The fasync is rarely used interface, so this shouldn't have a
big impact from the performance POV.
Ideally, we should implement some sync mechanism for the proper finish
of stream and irq handler. But this oneliner should suffice for most
cases, so far.
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b98b0bc8c4 upstream.
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...
Note that before commit 8298193012 ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.
This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0ac402cfc upstream.
Both damn things interpret userland pointers embedded into the payload;
worse, they are actually traversing those. Leaving aside the bad
API design, this is very much _not_ safe to call with KERNEL_DS.
Bail out early if that happens.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bf911e985d upstream.
Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.
The fix is to just move the check upwards so it's also validated for the
1st chunk.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: moved code is slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Not upstream as it is not needed there.
So a patch something like this might be a safe way to fix the
potential infoleak in older kernels.
THIS IS UNTESTED. It's a very obvious patch, though, so if it compiles
it probably works. It just initializes the output variable with 0 in
the inline asm description, instead of doing it in the exception
handler.
It will generate slightly worse code (a few unnecessary ALU
operations), but it doesn't have any interactions with the exception
handler implementation.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ac6e780070 upstream.
With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()
Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.
We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
Many thanks to syzkaller team and Marco for giving us a reproducer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4f0c40d944 upstream.
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.
A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.
Fixes: 7c657876b6 ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f4979fcea7 upstream.
Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.
Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.
Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 50220dead1 upstream.
Plugging a Logitech DJ receiver with KASAN activated raises a bunch of
out-of-bound readings.
The fields are allocated up to MAX_USAGE, meaning that potentially, we do
not have enough fields to fit the incoming values.
Add checks and silence KASAN.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 321027c1fe upstream.
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.
The problem is exactly that described in commit:
f63a8daa58 ("perf: Fix event->ctx locking")
... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.
That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.
So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).
Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa58 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- Use ACCESS_ONCE() instead of READ_ONCE()
- Test perf_event::group_flags instead of group_caps
- Add the err_locked cleanup block, which we didn't need before
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5cd3f5affa upstream.
Since commit c9a4962881 ("nfsd:
make client_lock per net") compiling nfs4state.o without
CONFIG_LOCKDEP set, triggers this GCC warning:
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]
The cause of that warning is that lockdep_assert_held() compiles
away if CONFIG_LOCKDEP is not set. Silence this warning by using
the argument to lockdep_assert_held() as a nop if CONFIG_LOCKDEP
is not set.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Link: http://lkml.kernel.org/r/1359060797.1325.33.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 724b6daa13 upstream.
In perf_event_for_each() we call a function on an event, and then
iterate over the siblings of the event.
However we don't call the function on the siblings, we call it
repeatedly on the original event - it seems "obvious" that we should
be calling it with sibling as the argument.
It looks like this broke in commit 75f937f24b ("Fix ctx->mutex
vs counter->mutex inversion").
The only effect of the bug is that the PERF_IOC_FLAG_GROUP parameter
to the ioctls doesn't work.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334109253-31329-1-git-send-email-michael@ellerman.id.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd98e9419d upstream.
Commit 79901317ce ("n_tty: Don't flush buffer when closing ldisc"),
first merged in kernel release 3.10, caused the following regression
in the Gigaset M101 driver:
Before that commit, when closing the N_TTY line discipline in
preparation to switching to N_GIGASET_M101, receive_room would be
reset to a non-zero value by the call to n_tty_flush_buffer() in
n_tty's close method. With the removal of that call, receive_room
might be left at zero, blocking data reception on the serial line.
The present patch fixes that regression by setting receive_room
to an appropriate value in the ldisc open method.
Fixes: 79901317ce ("n_tty: Don't flush buffer when closing ldisc")
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3951a3709 upstream.
In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().
Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.
This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1420!
Call Trace:
[<ffffffff81281eab>] blk_put_request+0x5b/0x80
[<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
[<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
[<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
[<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
[<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
[<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
[<ffffffff81258967>] ? file_has_perm+0x97/0xb0
[<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
[<ffffffff81602afb>] tracesys+0xdd/0xe2
RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0
The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.
Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.
KASAN was extremely helpful in finding the root cause of this bug.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2:
- sg_finish_rem_req() would not free srp->rq->cmd so don't do it here either
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ea1ec713f upstream.
DMA mapping permissions were being derived from pgprot_kernel directly
without using PAGE_KERNEL. This causes them to be marked with executable
permission, which is not what we want. Fix this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8298193012 upstream.
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to
match !!sock_flag()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 93a97c50cb upstream.
If we can't allocate the resources in gigaset_initdriver() then we
should return -ENOMEM instead of zero.
Fixes: 2869b23e4b ("[PATCH] drivers/isdn/gigaset: new M101 driver (v2)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 332b05ca7a upstream.
This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
are not prevented resulting in a warning.
Reference: https://lkml.org/lkml/2016/12/2/230
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c01638f5d9 upstream.
Basically, the pjdfstests set the ownership of a file to 06555, and then
chowns it (as root) to a new uid/gid. Prior to commit a09f99edde ("fuse:
fix killing s[ug]id in setattr"), fuse would send down a setattr with both
the uid/gid change and a new mode. Now, it just sends down the uid/gid
change.
Technically this is NOTABUG, since POSIX doesn't _require_ that we clear
these bits for a privileged process, but Linux (wisely) has done that and I
think we don't want to change that behavior here.
This is caused by the use of should_remove_suid(), which will always return
0 when the process has CAP_FSETID.
In fact we really don't need to be calling should_remove_suid() at all,
since we've already been indicated that we should remove the suid, we just
don't want to use a (very) stale mode for that.
This patch should fix the above as well as simplify the logic.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: a09f99edde ("fuse: fix killing s[ug]id in setattr")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c823abac17 upstream.
When we unload the ep93xx_eth, whether we have opened the network
interface or not, we will either hit a kernel paging request error, or a
simple NULL pointer de-reference because:
- if ep93xx_open has been called, we have created a valid DMA mapping
for ep->descs, when we call ep93xx_stop, we also call
ep93xx_free_buffers, ep->descs now has a stale value
- if ep93xx_open has not been called, we have a NULL pointer for
ep->descs, so performing any operation against that address just won't
work
Fix this by adding a NULL pointer check for ep->descs which means that
ep93xx_free_buffers() was able to successfully tear down the descriptors
and free the DMA cookie as well.
Fixes: 1d22e05df8 ("[PATCH] Cirrus Logic ep93xx ethernet driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0eab121ef8 upstream.
Prior to commit c0371da604 ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EINVAL is a clearer error and also solves the problem
prior to v3.19.
This was found using trinity with KASAN on v3.18:
BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761
CVE-2016-8399
Reported-by: Qidan He <i@flanker017.me>
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: only ICMPv4 is supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3de81b7588 upstream.
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
Fixes: b97bf3fd8f ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust filename, context
- Duplicate macro definitions in bearer.h to avoid mutual inclusion
- NETDEV_CHANGEMTU and NETDEV_CHANGEADDR cases in net notifier were combined
- Drop changes in udp_media.c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 84ac726023 upstream.
When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.
This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.
The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dbb26055de upstream.
David reported a futex/rtmutex state corruption. It's caused by the
following problem:
CPU0 CPU1 CPU2
l->owner=T1
rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T2)
boost()
unlock(l->wait_lock)
schedule()
rt_mutex_lock(l)
lock(l->wait_lock)
l->owner = T1 | HAS_WAITERS;
enqueue(T3)
boost()
unlock(l->wait_lock)
schedule()
signal(->T2) signal(->T3)
lock(l->wait_lock)
dequeue(T2)
deboost()
unlock(l->wait_lock)
lock(l->wait_lock)
dequeue(T3)
===> wait list is now empty
deboost()
unlock(l->wait_lock)
lock(l->wait_lock)
fixup_rt_mutex_waiters()
if (wait_list_empty(l)) {
owner = l->owner & ~HAS_WAITERS;
l->owner = owner
==> l->owner = T1
}
lock(l->wait_lock)
rt_mutex_unlock(l) fixup_rt_mutex_waiters()
if (wait_list_empty(l)) {
owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
===> Success (l->owner = NULL)
l->owner = owner
==> l->owner = T1
}
That means the problem is caused by fixup_rt_mutex_waiters() which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutexes rbtree.
This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation then the previous owner, which just unlocked
the rtmutex is set as the owner again when the write takes place after the
successfull cmpxchg().
The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quick, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain that this got not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.
Thanks to David for meticulously instrumenting the code and providing the
information which allowed to decode this subtle problem.
Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2117d5398c upstream.
em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.
We could have fixed that by always saving and restoring the CS, but we
take a simpler approach and just break any guest that manages to fail
as the error recovery is error-prone and modern CPUs don't need emulator
for this.
Found by syzkaller:
WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
Kernel panic - not syncing: panic_on_warn set ...
CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] panic+0x1b7/0x3a3 kernel/panic.c:179
[...] __warn+0x1c4/0x1e0 kernel/panic.c:542
[...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
[...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
[...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
[...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
[...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
[...] complete_emulated_io arch/x86/kvm/x86.c:6870
[...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
[...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
[...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: d1442d85cc ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1650b4ebc9 upstream.
Function user_notifier_unregister should be called only once for each
registered user notifier.
Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.
Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Fixes: 18863bdd60 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fcd2042e8d upstream.
SSIDs aren't guaranteed to be 0-terminated. Let's cap the max length
when we print them out.
This can be easily noticed by connecting to a network with a 32-octet
SSID:
[ 3903.502925] mwifiex_pcie 0000:01:00.0: info: trying to associate to
'0123456789abcdef0123456789abcdef <uninitialized mem>' bssid
xx:xx:xx:xx:xx:xx
Fixes: 5e6e3a92b9 ("wireless: mwifiex: initial commit for Marvell mwifiex driver")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b810a242c upstream.
The real QP is destroyed in case of the ref count reaches zero, but
for XRC target QPs this call was missed and caused to QP leaks.
Let's call to destroy for all flows.
Fixes: 0e0ec7e063 ('RDMA/core: Export ib_open_qp() to share XRC...')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ab13292d7 upstream.
The BRIM Brothers Zone DPMX is a bicycle powermeter. This ID is for the USB
serial interface in its charging dock for the control pods, via which some
settings for the pods can be modified.
Signed-off-by: Paul Jakma <paul@jakma.org>
Cc: Barry Redmond <barry@brimbrothers.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 722f191080 upstream.
Make sure to drop the reference taken by bus_find_device_by_name()
before returning from mfd_clone_cell().
Fixes: a9bbba9963 ("mfd: add platform_device sharing support for mfd")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ce9d2272b upstream.
Some code (all error handling) submits CDBs that are allocated
on the stack. This breaks with CB/CBI code that tries to create
URB directly from SCSI command buffer - which happens to be in
vmalloced memory with vmalloced kernel stacks.
Let's make copy of the command in usb_stor_CB_transport.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba13e98f2c upstream.
When receiving a nec repeat, ensure the correct scancode is repeated
rather than a random value from the stack. This removes the need for
the bogus uninitialized_var() and also fixes the warnings:
drivers/media/usb/dvb-usb/dib0700_core.c: In function ‘dib0700_rc_urb_completion’:
drivers/media/usb/dvb-usb/dib0700_core.c:679: warning: ‘protocol’ may be used uninitialized in this function
[sean addon: So after writing the patch and submitting it, I've bought the
hardware on ebay. Without this patch you get random scancodes
on nec repeats, which the patch indeed fixes.]
Signed-off-by: Sean Young <sean@mess.org>
Tested-by: Sean Young <sean@mess.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70d78fe7c8 upstream.
It could be not possible to freeze coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.
Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation. So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.
Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.
Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f567e950bf upstream.
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.
This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.
Fixes: c7ac8679be ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e5ec1759d upstream.
This patch will fix regression caused by commit 1e793f6fc0 ("scsi:
megaraid_sas: Fix data integrity failure for JBOD (passthrough)
devices").
The problem was that the MEGASAS_IS_LOGICAL macro did not have braces
and as a result the driver ended up exposing a lot of non-existing SCSI
devices (all SCSI commands to channels 1,2,3 were returned as
SUCCESS-DID_OK by driver).
[mkp: clarified patch description]
Fixes: 1e793f6fc0
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Tested-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f91346e8b5 upstream.
An interrupt may occur right after devm_request_irq() is called and
prior to the spinlock initialization, leading to a kernel oops,
as the interrupt handler uses the spinlock.
In order to prevent this problem, move the spinlock initialization
prior to requesting the interrupts.
Fixes: e4243f13d1 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9bfef729a3 upstream.
This patch adds support for the TI CC3200 LaunchPad board, which uses a
custom USB vendor ID and product ID. Channel A is used for JTAG, and
channel B is used for a UART.
Signed-off-by: Doug Brown <doug@schmorgal.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 147b36d5b7 upstream.
Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.
The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:
i2c_register_driver() i2c_del_adapter()
driver_register() ...
bus_add_driver() ...
... bus_for_each_drv(..., __process_removed_adapter)
... i2c_do_del_adapter()
... list_for_each_entry_safe(..., &driver->clients, ...)
INIT_LIST_HEAD(&driver->clients);
To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().
The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitraty I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:
% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 17 [#1] SMP ARM
CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: e5ada400 task.stack: e4936000
PC is at i2c_do_del_adapter+0x20/0xcc
LR is at __process_removed_adapter+0x14/0x1c
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 35bd004a DAC: 00000051
Process sh (pid: 533, stack limit = 0xe4936210)
Stack: (0xe4937d28 to 0xe4938000)
Backtrace:
[<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
[<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
[<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
[<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
[<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
[<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
[<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
[<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
[<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
[<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
[<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
[<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
[<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
[<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
[<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cf0ea4da4c upstream.
Like many similar devices it needs a quirk to work.
Issuing the request gets the device into an irrecoverable state.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 849eca7b9d upstream.
Like other KVM switches, the Aten DVI KVM switch needs a quirk to avoid spewing
errors:
[791759.606542] usb 1-5.4: input irq status -75 received
[791759.614537] usb 1-5.4: input irq status -75 received
[791759.622542] usb 1-5.4: input irq status -75 received
Add it.
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9300a4b7b upstream.
RFC 2734 defines the datagram_size field in fragment encapsulation
headers thus:
datagram_size: The encoded size of the entire IP datagram. The
value of datagram_size [...] SHALL be one less than the value of
Total Length in the datagram's IP header (see STD 5, RFC 791).
Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got
this field with a -/+1 offset:
ether1394_tx() /* transmit */
ether1394_encapsulate_prep()
hdr->ff.dg_size = dg_size - 1;
ether1394_data_handler() /* receive */
if (hdr->common.lf == ETH1394_HDR_LF_FF)
dg_size = hdr->ff.dg_size + 1;
else
dg_size = hdr->sf.dg_size + 1;
Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500
byte sized datagrams in fragments with datagram_size=1499 if link
fragmentation is required.
Only firewire-net sets and gets datagram_size without this offset. The
result is lacking interoperability of firewire-net with OS X, Windows
XP, and presumably Linux' eth1394. (I did not test with the latter.)
For example, FTP data transfers to a Linux firewire-net box with max_rec
smaller than the 1500 bytes MTU
- from OS X fail entirely,
- from Win XP start out with a bunch of fragmented datagrams which
time out, then continue with unfragmented datagrams because Win XP
temporarily reduces the MTU to 576 bytes.
So let's fix firewire-net's datagram_size accessors.
Note that firewire-net thereby loses interoperability with unpatched
firewire-net, but only if link fragmentation is employed. (This happens
with large broadcast datagrams, and with large datagrams on several
FireWire CardBus cards with smaller max_rec than equivalent PCI cards,
and it can be worked around by setting a small enough MTU.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6ed518328d upstream.
We have one critical section in the syscall entry path in which we switch from
the userspace stack to kernel stack. In the event of an external interrupt, the
interrupt code distinguishes between those two states by analyzing the value of
sr7. If sr7 is zero, it uses the kernel stack. Therefore it's important, that
the value of sr7 is in sync with the currently enabled stack.
This patch now disables interrupts while executing the critical section. This
prevents the interrupt handler to possibly see an inconsistent state which in
the worst case can lead to crashes.
Interestingly, in the syscall exit path interrupts were already disabled in the
critical section which switches back to the userspace stack.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23f4ffedb7 upstream.
skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.
This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: apply to ip6_tnl_xmit2()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6124b409c upstream.
This subsystem consistently fails to drop the device reference taken by
class_find_device().
Note that some of these lookup functions already take a reference to the
returned data, while others claim no reference is needed (or does not
seem need one).
Fixes: 183b9b592a ("uwb: add the UWB stack (core files)")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Drop change to uwb_rc_class_device_exists()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd9afd3cbe upstream.
According to Dave Miller "the networking stack has a
hard requirement that all SKBs which are transmitted
must have their completion signalled in a fininte
amount of time. This is because, until the SKB is
freed by the driver, it holds onto socket,
netfilter, and other subsystem resources."
In summary, this means that using TX IRQ throttling
for the networking gadgets is, at least, complex and
we should avoid it for the time being.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f89c56ce71 upstream.
Similar to commit c146066ab8 ("ipv4: Don't use ufo handling on later
transformed packets"), don't perform UFO on packets that will be IPsec
transformed. To detect it we rely on the fact that headerlen in
dst_entry is non-zero only for transformation bundles (xfrm_dst
objects).
Unwanted segmentation can be observed with a NETIF_F_UFO capable device,
such as a dummy device:
DEV=dum0 LEN=1493
ip li add $DEV type dummy
ip addr add fc00::1/64 dev $DEV nodad
ip link set $DEV up
ip xfrm policy add dir out src fc00::1 dst fc00::2 \
tmpl src fc00::1 dst fc00::2 proto esp spi 1
ip xfrm state add src fc00::1 dst fc00::2 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
tcpdump -n -nn -i $DEV -t &
socat /dev/zero,readbytes=$LEN udp6:[fc00::2]:$LEN
tcpdump output before:
IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
IP6 fc00::1 > fc00::2: frag (1448|48)
IP6 fc00::1 > fc00::2: ESP(spi=0x00000001,seq=0x2), length 88
... and after:
IP6 fc00::1 > fc00::2: frag (0|1448) ESP(spi=0x00000001,seq=0x1), length 1448
IP6 fc00::1 > fc00::2: frag (1448|80)
Fixes: e89e9cf539 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8d59de8f7b upstream.
Currently there is a race between incoming traffic and
initialization flow. HW is able to receive the packets
after INIT_PORT is done and unicast steering is configured.
Before we set priv->port_up NAPI is not scheduled and
receive queues become full. Therefore we never get
new interrupts about the completions.
This issue could happen if running heavy traffic during
bringing port up.
The resolution is to schedule NAPI once port_up is set.
If receive queues were full this will process all cqes
and release them.
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: mlx4_en_priv::rx_cq is an array of structs not pointers]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd768e1466 upstream.
vcpu->arch.wbinvd_dirty_mask may still be used after freeing it,
corrupting memory. For example, the following call trace may set a bit
in an already freed cpu mask:
kvm_arch_vcpu_load
vcpu_load
vmx_free_vcpu_nested
vmx_free_vcpu
kvm_arch_vcpu_free
Fix this by deferring freeing of wbinvd_dirty_mask.
Signed-off-by: Ido Yariv <ido@wizery.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 62e931fac4 upstream.
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.
The shortcut
if (size > atomic_read(&chunk->avail))
continue;
makes the loop skip over chunks that do not have enough bytes left to
fulfill the request. There are two situations, though, where an
allocation might still fail:
(1) The available memory is not contiguous, i.e. the request cannot
be fulfilled due to external fragmentation.
(2) A race condition. Another thread runs the same code concurrently
and is quicker to grab the available memory.
In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed. This return value is then assigned to
start_bit. The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched. Today, the code fails to reset start_bit to 0. As a
result, prefixes of subsequent chunks are ignored. Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.
Fixes: 7f184275aa ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 444f901742 upstream.
on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
Content-Length: 0
will not bet interpreted as an INVITE request. Also Request-URI must start with an alphabetic character.
Confirm with RFC 3261
Request-Line = Method SP Request-URI SP SIP-Version CRLF
Fixes: 30f33e6dee ("[NETFILTER]: nf_conntrack_sip: support method specific request/response handling")
Signed-off-by: Ulrich Weber <ulrich.weber@riverbed.com>
Acked-by: Marco Angaroni <marcoangaroni@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 42acfc6615 upstream.
In csi_J(3), the third parameter of scr_memsetw (vc_screenbuf_size) is
divided by 2 inappropriatelly. But scr_memsetw expects size, not
count, because it divides the size by 2 on its own before doing actual
memset-by-words.
So remove the bogus division.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Petr Písař <ppisar@redhat.com>
Fixes: f8df13e0a9 (tty: Clean console safely)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bf7dc8443 upstream.
The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.
Ensure that cache flushes are passed to the controller.
[mkp: applied by hand and removed unused vars]
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34eee70a7b upstream.
The ad5933_i2c_read function returns an error code to indicate
whether it could read data or not. However ad5933_work() ignores
this return code and just accesses the data unconditionally,
which gets detected by gcc as a possible bug:
drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]
This adds minimal error handling so we only evaluate the
data if it was correctly read.
Link: https://patchwork.kernel.org/patch/8110281/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 407a3aee6e upstream.
The host keeps sending heartbeat packets independent of the
guest responding to them. Even though we respond to the heartbeat messages at
interrupt level, we can have situations where there maybe multiple heartbeat
messages pending that have not been responded to. For instance this occurs when the
VM is paused and the host continues to send the heartbeat messages.
Address this issue by draining and responding to all
the heartbeat messages that maybe pending.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1e793f6fc0 upstream.
Commit 02b01e010a ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.
[mkp: clarified patch description]
Fixes: 02b01e010a
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dafa724bf5 upstream.
dm_get_target_type() was previously called so any error returned from
dm_table_add_target() must first call dm_put_target_type(). Otherwise
the DM target module's reference count will leak and the associated
kernel module will be unable to be removed.
Also, leverage the fact that r is already -EINVAL and remove an extra
newline.
Fixes: 36a0456 ("dm table: add immutable feature")
Fixes: cc6cbe1 ("dm table: add always writeable feature")
Fixes: 3791e2f ("dm table: add singleton feature")
Signed-off-by: tang.junhui <tang.junhui@zte.com.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2: adjuat context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80f23935ca upstream.
PowerPC's "cmp" instruction has four operands. Normally people write
"cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently
people forget, and write "cmp" with just three operands.
With older binutils this is silently accepted as if this was "cmpw",
while often "cmpd" is wanted. With newer binutils GAS will complain
about this for 64-bit code. For 32-bit code it still silently assumes
"cmpw" is what is meant.
In this instance the code comes directly from ISA v2.07, including the
cmp, but cmpd is correct. Backport to stable so that new toolchains can
build old kernels.
Fixes: 948cf67c47 ("powerpc: Add NAP mode support on Power7 in HV mode")
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c83ed4c9db upstream.
If UBIFS is facing an error while walking a directory, it reports this
error and ubifs_readdir() returns the error code. But the VFS readdir
logic does not make the getdents system call fail in all cases. When the
readdir cursor indicates that more entries are present, the system call
will just return and the libc wrapper will try again since it also
knows that more entries are present.
This causes the libc wrapper to busy loop for ever when a directory is
corrupted on UBIFS.
A common approach do deal with corrupted directory entries is
skipping them by setting the cursor to the next entry. On UBIFS this
approach is not possible since we cannot compute the next directory
entry cursor position without reading the current entry. So all we can
do is setting the cursor to the "no more entries" position and make
getdents exit.
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 843741c577 upstream.
When the operation fails we also have to undo the changes
we made to ->xattr_names. Otherwise listxattr() will report
wrong lengths.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da25311c7c upstream.
The Schenker XMG C504 is a rebranded Gigabyte P35 v2 laptop.
Therefore it also needs a keyboard reset to detect the Elantech touchpad.
Otherwise the touchpad appears to be dead.
With this patch the touchpad is detected:
$ dmesg | grep -E "(i8042|Elantech|elantech)"
[ 2.675399] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[ 2.680372] i8042: Attempting to reset device connected to KBD port
[ 2.789037] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 2.791586] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 2.813840] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
[ 3.811431] psmouse serio1: elantech: assuming hardware version 4 (with firmware version 0x361f0e)
[ 3.825424] psmouse serio1: elantech: Synaptics capabilities query result 0x00, 0x15, 0x0f.
[ 3.839424] psmouse serio1: elantech: Elan sample query result 03, 58, 74
[ 3.911349] input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio1/input/input6
Signed-off-by: Patrick Scheuring <patrick.scheuring.dev@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a2ed0b391d upstream.
When isofs_mount() is called to mount a device read-write, it returns
EACCES even before it checks that the device actually contains an isofs
filesystem. This may confuse mount(8) which then tries to mount all
subsequent filesystem types in read-only mode.
Fix the problem by returning EACCES only once we verify that the device
indeed contains an iso9660 filesystem.
Fixes: 17b7f7cf58
Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7cb08e894 upstream.
We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function. Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.
Fixes: aceeffbb59 ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea720935cf upstream.
In mac80211, multicast A-MSDUs are accepted in many cases that
they shouldn't be accepted in:
* drop A-MSDUs with a multicast A1 (RA), as required by the
spec in 9.11 (802.11-2012 version)
* drop A-MSDUs with a 4-addr header, since the fourth address
can't actually be useful for them; unless 4-address frame
format is actually requested, even though the fourth address
is still not useful in this case, but ignored
Accepting the first case, in particular, is very problematic
since it allows anyone else with possession of a GTK to send
unicast frames encapsulated in a multicast A-MSDU, even when
the AP has client isolation enabled.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a34439e5a upstream.
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 50d2e6dc1f upstream.
The cipher block size for GCM is 16 bytes, and thus the CTR transform
used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
the code currently reserves only 8 bytes for the IV, causing
an out-of-bounds access in the CTR transform. This patch fixes
the issue by setting the size of the IV buffer to 16 bytes.
Fixes: 84c9115230 ("[CRYPTO] gcm: Add support for async ciphers")
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb3ae6d25a upstream.
Make sure userspace filesystem is returning a well formed list of xattr
names (zero or more nonzero length, null terminated strings).
[Michael Theall: only verify in the nonzero size case]
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a09f99edde upstream.
Fuse allowed VFS to set mode in setattr in order to clear suid/sgid on
chown and truncate, and (since writeback_cache) write. The problem with
this is that it'll potentially restore a stale mode.
The poper fix would be to let the filesystems do the suid/sgid clearing on
the relevant operations. Possibly some are already doing it but there's no
way we can detect this.
So fix this by refreshing and recalculating the mode. Do this only if
ATTR_KILL_S[UG]ID is set to not destroy performance for writes. This is
still racy but the size of the window is reduced.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e2b8828ff upstream.
Without "default_permissions" the userspace filesystem's lookup operation
needs to perform the check for search permission on the directory.
If directory does not allow search for everyone (this is quite rare) then
userspace filesystem has to set entry timeout to zero to make sure
permissions are always performed.
Changing the mode bits of the directory should also invalidate the
(previously cached) dentry to make sure the next lookup will have a chance
of updating the timeout, if needed.
Reported-by: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Open-code d_is_dir()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6cd997db91 upstream.
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines that were _just_ short enough that the RA order still fit in
the line buffer, the line was instead padded with an insufficient
amount of spaces. This was caused by examining the size of the
allocated line buffer rather than the length of the string to be
displayed.
For con3270_cline_end(), we just compare against the line length. For
con3270_update_string() however that isn't available anymore, so we
check whether the Repeat to Address order is present.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git)
Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c14f2aac7a upstream.
con3270 contains an optimisation that reduces the amount of data to be
transmitted to the 3270 terminal by putting a Repeat to Address (RA)
order into the data stream. The RA order itself takes up space, so
con3270 only uses it if there's enough space left in the line
buffer. Otherwise it just pads out the line manually.
For lines too long to include the RA order, one byte was left
uninitialised. This was caused by an off-by-one bug in the loop that
pads out the line. Since the buffer is allocated from a common pool,
the single byte left uninitialised contained some previous buffer
content. Usually this was just a space or some character (which can
result in clutter but is otherwise harmless). Sometimes, however, it
was a Repeat to Address order, messing up the entire screen layout and
causing the display to send the entire buffer content on every
keystroke.
Fixes: f51320a5 ("[PATCH] s390: new 3270 driver.") (tglx/history.git)
Reported-by: Liu Jing <liujbjl@linux.vnet.ibm.com>
Tested-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Tested-by: Yang Chen <bjcyang@linux.vnet.ibm.com>
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e81d44778d upstream.
The commit 6050d47adc: "ext4: bail out from make_indexed_dir() on
first error" could end up leaking bh2 in the error path.
[ Also avoid renaming bh2 to bh, which just confuses things --tytso ]
Signed-off-by: yangsheng <yngsion@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5045ea3737 upstream.
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));
And we get:
0
-1
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad3 ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ed50abb2d upstream.
CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1.
Older versions of the specification allowed to terminate
multi-block transfers only with CMD12.
The patch fixes the following problem:
mmc0: new MMC card at address 0001
mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
...
blk_update_request: I/O error, dev mmcblk0, sector 0
Buffer I/O error on dev mmcblk0, logical block 0, async page read
mmcblk0: unable to read partition table
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f9165c981 upstream.
http://www.ti.com/lit/pdf/SWCZ010:
DCDC o/p voltage can go higher than programmed value
Impact:
VDDI, VDD2, and VIO output programmed voltage level can go higher than
expected or crash, when coming out of PFM to PWM mode or using DVFS.
Description:
When DCDC CLK SYNC bits are 11/01:
* VIO 3-MHz oscillator is the source clock of the digital core and input
clock of VDD1 and VDD2
* Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant
phase shift
* Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
* The 3 HSD PFET will be turned-on at the same time, causing the highest
possible switching noise on the application. This noise level depends
on the layout, the VBAT level, and the load current. The noise level
increases with improper layout.
When DCDC CLK SYNC bits are 00:
* VIO 3-MHz oscillator is the source clock of digital core
* VDD1 and VDD2 are running on their own 3-MHz oscillator
* Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2)
* The switching noise of the 3 SMPS will be randomly spread over time,
causing lower overall switching noise.
Workaround:
Set DCDCCTRL_REG[1:0]= 00.
Signed-off-by: Jan Remmet <j.remmet@phytec.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: use tps65910_clear_bits()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eb1a74b7be upstream.
The DragonFly quirk added in 42e3121d90 ("ALSA: usb-audio: Add a more
accurate volume quirk for AudioQuest DragonFly") applies a custom dB map
on the volume control when its range is reported as 0..50 (0 .. 0.2dB).
However, there exists at least one other variant (hw v1.0c, as opposed
to the tested v1.2) which reports a different non-sensical volume range
(0..53) and the custom map is therefore not applied for that device.
This results in all of the volume change appearing close to 100% on
mixer UIs that utilize the dB TLV information.
Add a fallback case where no dB TLV is reported at all if the control
range is not 0..50 but still 0..N where N <= 1000 (3.9 dB). Also
restrict the quirk to only apply to the volume control as there is also
a mute control which would match the check otherwise.
Fixes: 42e3121d90 ("ALSA: usb-audio: Add a more accurate volume quirk for AudioQuest DragonFly")
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Reported-by: David W <regulars@d-dub.org.uk>
Tested-by: David W <regulars@d-dub.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: keep using dev_info()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a7e1f04905 upstream.
When switching from polling-based fw commands to event-based fw
commands, there is a race condition which could cause a fw command
in another task to hang: that task will keep waiting for the polling
sempahore, but may never be able to acquire it. This is due to
mlx4_cmd_use_events, which "down"s the sempahore back to 0.
During driver initialization, this is not a problem, since no other
tasks which invoke FW commands are active.
However, there is a problem if the driver switches to polling mode
and then back to event mode during normal operation.
The "test_interrupts" feature does exactly that.
Running "ethtool -t <eth device> offline" causes the PF driver to
temporarily switch to polling mode, and then back to event mode.
(Note that for VF drivers, such switching is not performed).
Fix this by adding a read-write semaphore for protection when
switching between modes.
Fixes: 225c7b1fee ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fae9e5a7b upstream.
This patch fixes a NULL pointer dereference caused by a race codition in
the probe function of the legousbtower driver. It re-structures the
probe function to only register the interface after successfully reading
the board's firmware ID.
The probe function does not deregister the usb interface after an error
receiving the devices firmware ID. The device file registered
(/dev/usb/legousbtower%d) may be read/written globally before the probe
function returns. When tower_delete is called in the probe function
(after an r/w has been initiated), core dev structures are deleted while
the file operation functions are still running. If the 0 address is
mappable on the machine, this vulnerability can be used to create a
Local Priviege Escalation exploit via a write-what-where condition by
remapping dev->interrupt_out_buffer in tower_write. A forged USB device
and local program execution would be required for LPE. The USB device
would have to delay the control message in tower_probe and accept
the control urb in tower_open whilst guest code initiated a write to the
device file as tower_delete is called from the error in tower_probe.
This bug has existed since 2003. Patch tested by emulated device.
Reported-by: James Patrick-Evans <james@jmp-e.com>
Tested-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: James Patrick-Evans <james@jmp-e.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: keep using err()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db68577966 upstream.
The pointer callbacks of ali5451 driver may return the value at the
boundary occasionally, and it results in the kernel warning like
snd_ali5451 0000:00:06.0: BUG: , pos = 16384, buffer size = 16384, period size = 1024
It seems that folding the position offset is enough for fixing the
warning and no ill-effect has been seen by that.
Reported-by: Enrico Mioso <mrkiko.rs@gmail.com>
Tested-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11b7e154b1 upstream.
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.
Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .
Fixes: fa2b4e54d4 ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07d0e9a847 upstream.
If a VFC port gets unmapped in the VIOS, it may not respond with a CRQ
init complete following H_REG_CRQ. If this occurs, we can end up having
called scsi_block_requests and not a resulting unblock until the init
complete happens, which may never occur, and we end up hanging I/O
requests. This patch ensures the host action stay set to
IBMVFC_HOST_ACTION_TGT_DEL so we move all rports into devloss state and
unblock unless we receive an init complete.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c3db901c54 upstream.
The current code missed freeing domain id when free a domain of
struct dma_ops_domain.
Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes: ec487d1a11 ('x86, AMD IOMMU: add domain allocation and deallocation functions')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66388f2c08 upstream.
Once a chunk is enqueued successfully, sctp queues can take care of it.
Even if it is failed to transmit (like because of nomem), it should be
put into retransmit queue.
If sctp report this error to users, it confuses them, they may resend
that msg, but actually in kernel sctp stack is in charge of retransmit
it already.
Besides, this error probably is not from the failure of transmitting
current msg, but transmitting or retransmitting another msg's chunks,
as sctp_outq_flush just tries to send out all transports' chunks.
This patch is to make sctp_cmd_send_msg return avoid, and not return the
transmit err back to sctp_sendmsg
Fixes: 8b570dc9f7 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: no gfp flags parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 93e3b4e663 upstream.
Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid
of deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only i_dtime of an inode to determine
whether the inode was deleted and evicted, and this is very risky,
because i_dtime can be used for the pointer maintaining orphan inode
list, too. We need to further check whether the i_dtime is being
used for the orphan inode list even if the i_dtime is not NULL.
We found that high 16-bit fields of uid/gid of inode are unintentionally
and permanently cleared when the inode truncation is just triggered,
but not finished, and the inode metadata, whose high uid/gid bits are
cleared, is written on disk, and the sudden power-off follows that
in order.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 24b923f073 upstream.
This device uses GPIOs: 28 to switch between analog and
digital modes: on digital mode, it should be set to 1.
The code that sets it on analog mode is OK, but it misses
the logic that sets it on digital mode.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1871d718a9 upstream.
The cx231xx_set_agc_analog_digital_mux_select() callers
expect it to return 0 or an error. Returning a positive value
makes the first attempt to switch between analog/digital to fail.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dafb65fb98 upstream.
On this frontend, it takes a while to start output normal
TS data. That only happens on state S9. On S8, the TS output
is enabled, but it is not reliable enough.
However, the zigzag loop is too fast to let it sync.
As, on practical tests, the zigzag software loop doesn't
seem to be helping, but just slowing down the tuning, let's
switch to hardware algorithm, as the tuners used on such
devices are capable of work with frequency drifts without
any help from software.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0c9d349153 upstream.
Some RTL8821AE devices sold in Great Britain have the country code of
0x25 encoded in their EEPROM. This value is not tested in the routine
that establishes the regulatory info for the chip. The fix is to set
this code to have the same capabilities as the EU countries. In addition,
the channels allowed for COUNTRY_CODE_ETSI were more properly suited
for China and Israel, not the EU. This problem has also been fixed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95a8d19f28 upstream.
In case nf_conntrack_tuple_taken did not find a conflicting entry
check that all entries in this hash slot were tested and restart
in case an entry was moved to another chain.
Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: ea781f197d ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2:
- Adjust context
- Use NF_CT_STAT_INC(), not the _ATOMIC variant, since we disable BHs]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aceeffbb59 upstream.
This was lost with commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
but is necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.
For the large GPN_FT response (4 pages), save space by not dumping
any empty residual entries.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 94db3725f0 upstream.
commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
started to add FC_CT_HDR_LEN which made zfcp dump random data
out of bounds for RSPN GS responses because u.rspn.rsp
is the largest and last field in the union of struct zfcp_fc_req.
Other request/response types only happened to stay within bounds
due to the padding of the union or
due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD.
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : ...
Record id : 2
Tag : fsscth2
Request id : 0x...
Destination ID : 0x00fffffc
Payload short : 01000000 fc020000 80020000 00000000
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <===
00000000 00000000 00000000 00000000
Payload length : 32 <===
struct zfcp_fc_req {
[0] struct zfcp_fsf_ct_els ct_els;
[56] struct scatterlist sg_req;
[96] struct scatterlist sg_rsp;
union {
struct {req; rsp;} adisc; SIZE: 28+28= 56
struct {req; rsp;} gid_pn; SIZE: 24+20= 44
struct {rspsg; req;} gpn_ft; SIZE: 40*4+20=180
struct {req; rsp;} gspn; SIZE: 20+273= 293
struct {req; rsp;} rspn; SIZE: 277+16= 293
[136] } u;
}
SIZE: 432
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 771bf03537 upstream.
With commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc19681
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.
GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Fixes: 7c7dc19681 ("[SCSI] zfcp: Simplify handling of ct and els requests")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7c964ffe58 upstream.
This information was lost with
commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
but is required to debug e.g. invalid handle situations.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d27a7cb919 upstream.
Since commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
HBA records no longer contain WWPN, D_ID, or LUN
to reduce duplicate information which is already in REC records.
In contrast to "regular" target ports, we don't use recovery to open
WKA ports such as directory/nameserver, so we don't get REC records.
Therefore, introduce pseudo REC running records without any
actual recovery action but including D_ID of WKA port on open/close.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 35f040df97 upstream.
While retaining the actual filtering according to trace level,
the following commits started to write such filtered records
with a hardcoded record level of 1 instead of the actual record level:
commit 250a1352b9
("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Now we can distinguish written records again for offline level filtering.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b9 ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4eeaa4f3f1 upstream.
On a successful end of reopen port forced,
zfcp_erp_strategy_followup_success() re-uses the port erp_action
and the subsequent zfcp_erp_action_cleanup() now
sees ZFCP_ERP_SUCCEEDED with
erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT
instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
but must not perform zfcp_scsi_schedule_rport_register().
We can detect this because the fresh port reopen erp_action
is in its very first step ZFCP_ERP_STEP_UNINITIALIZED.
Otherwise this opens a time window with unblocked rport
(until the followup port reopen recovery would block it again).
If a scsi_cmnd timeout occurs during this time window
fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents
a clean and timely path failover.
This should not happen if the path issue can be recovered
on FC transport layer such as path issues involving RSCNs.
Also, unnecessary and repeated DID_IMM_RETRY for pending and
undesired new requests occur because internally zfcp still
has its zfcp_port blocked.
As follow-on errors with scsi_eh, it can cause,
in the worst case, permanently lost paths due to one of:
sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!
sd <scsidev>: Device offlined - not ready after error recovery
For fix validation and to aid future debugging with other recoveries
we now also trace (un)blocking of rports.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 5767620c38 ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED")
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e1 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70369f8e15 upstream.
In the hardware data router case, introduced with kernel 3.2
commit 86a9668a8d ("[SCSI] zfcp: support for hardware data router")
the ELS/GS request&response length needs to be initialized
as in the chained SBAL case.
Otherwise, the FCP channel rejects ELS requests with
FSF_REQUEST_SIZE_TOO_LARGE.
Such ELS requests can be issued by user space through BSG / HBA API,
or zfcp itself uses ADISC ELS for remote port link test on RSCN.
The latter can cause a short path outage due to
unnecessary remote target port recovery because the always
failing ADISC cannot detect extremely short path interruptions
beyond the local FCP channel.
Below example is decoded with zfcpdbf from s390-tools:
Timestamp : ...
Area : SAN
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_san_req+0408
Record id : 1
Tag : fssels1
Request id : 0x<reqid>
Destination ID : 0x00<target d_id>
Payload info : 52000000 00000000 <our wwpn > [ADISC]
<our wwnn > 00<s_id> 00000000
00000000 00000000 00000000 00000000
Timestamp : ...
Area : HBA
Subarea : 00
Level : 1
Exception : -
CPU id : ..
Caller : zfcp_dbf_hba_fsf_res+0740
Record id : 1
Tag : fs_ferr
Request id : 0x<reqid>
Request status : 0x00000010
FSF cmnd : 0x0000000b [FSF_QTCB_SEND_ELS]
FSF sequence no: 0x...
FSF issued : ...
FSF stat : 0x00000061 [FSF_REQUEST_SIZE_TOO_LARGE]
FSF stat qual : 00000000 00000000 00000000 00000000
Prot stat : 0x00000100
Prot stat qual : 00000000 00000000 00000000 00000000
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 86a9668a8d ("[SCSI] zfcp: support for hardware data router")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd77befa5b upstream.
For an NPIV-enabled FCP device, zfcp can erroneously show
"NPort (fabric via point-to-point)" instead of "NPIV VPORT"
for the port_type sysfs attribute of the corresponding
fc_host.
s390-tools that can be affected are dbginfo.sh and ziomon.
zfcp_fsf_exchange_config_evaluate() ignores
fsf_qtcb_bottom_config.connection_features indicating NPIV
and only sets fc_host_port_type to FC_PORTTYPE_NPORT if
fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC.
Only the independent zfcp_fsf_exchange_port_evaluate()
evaluates connection_features to overwrite fc_host_port_type
to FC_PORTTYPE_NPIV in case of NPIV.
Code was introduced with upstream kernel 2.6.30
commit 0282985da5
("[SCSI] zfcp: Report fc_host_port_type as NPIV").
This works during FCP device recovery (such as set online)
because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by
FSF_QTCB_EXCHANGE_PORT_DATA in sequence.
However, the zfcp-specific scsi host sysfs attributes
"requests", "megabytes", or "seconds_active" trigger only
zfcp_fsf_exchange_config_evaluate() resetting fc_host
port_type to FC_PORTTYPE_NPORT despite NPIV.
The zfcp-specific scsi host sysfs attribute "utilization"
triggers only zfcp_fsf_exchange_port_evaluate() correcting
the fc_host port_type again in case of NPIV.
Evaluate fsf_qtcb_bottom_config.connection_features
in zfcp_fsf_exchange_config_evaluate() where it belongs to.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 0282985da5 ("[SCSI] zfcp: Report fc_host_port_type as NPIV")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d50b3f43db upstream.
When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
(#50bc50) and neighboring pixels have slightly different values
(such as #50bc78).
The reason is that fbcon loads its 16 color palette through
efifb_setcolreg(), which in turn calculates a 32-bit value to write
into memory for each palette index.
Until now, this code could only handle 8-bit visuals and didn't mask
overlapping values when ORing them.
With this patch, fbcon displays the correct colors when a qemu VM is
booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").
Fixes: 7c83172b98 ("x86_64 EFI boot support: EFI frame buffer driver") # v2.6.24+
Signed-off-by: Max Staudt <mstaudt@suse.de>
Acked-By: Peter Jones <pjones@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0beef634b8 upstream.
Inability to locate a user mode specified transaction ID should not
lead to a kernel crash. For other than XS_TRANSACTION_START also
don't issue anything to xenbus if the specified ID doesn't match that
of any active transaction.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ed Swierk <eswierk@skyportsystems.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d5468d7afa upstream.
Commit 588afcc1c0 ("[media] usbvision fix overflow of interfaces
array")' should be reverted, because:
* "!dev->actconfig->interface[ifnum]" won't catch a case where the value
is not NULL but some garbage. This way the system may crash later with
GPF.
* "(ifnum >= USB_MAXINTERFACES)" does not cover all the error
conditions. "ifnum" should be compared to "dev->actconfig->
desc.bNumInterfaces", i.e. compared to the number of "struct
usb_interface" kzalloc()-ed, not to USB_MAXINTERFACES.
* There is a "struct usb_device" leak in this error path, as there is
usb_get_dev(), but no usb_put_dev() on this path.
* There is a bug of the same type several lines below with number of
endpoints. The code is accessing hard-coded second endpoint
("interface->endpoint[1].desc") which may not exist. It would be great
to handle this in the same patch too.
* All the concerns above are resolved by already-accepted commit fa52bd50
("[media] usbvision: fix crash on detecting device with invalid
configuration")
* Mailing list message:
http://www.spinics.net/lists/linux-media/msg94832.html
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Luis Henriques <luis.henriques@canonical.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 073931017b upstream.
When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2). Fix that.
References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
[bwh: Backported to 3.2:
- Drop changes to ceph, f2fs, hfsplus, orangefs
- Use capable() instead of capable_wrt_inode_uidgid()
- Update ext3 and generic_acl.c as well
- In gfs2, jfs, and xfs, take care to avoid leaking the allocated ACL if
posix_acl_update_mode() determines it's not needed
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 755ac67f83 upstream.
If the acl can be exactly represented in the traditional file
mode permission bits, we don't set another acl attribute.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 030b533c4f upstream.
Currently, notify_change() clears capabilities or IMA attributes by
calling security_inode_killpriv() before calling into ->setattr. Thus it
happens before any other permission checks in inode_change_ok() and user
is thus allowed to trigger clearing of capabilities or IMA attributes
for any file he can look up e.g. by calling chown for that file. This is
unexpected and can lead to user DoSing a system.
Fix the problem by calling security_inode_killpriv() at the end of
inode_change_ok() instead of from notify_change(). At that moment we are
sure user has permissions to do the requested change.
References: CVE-2015-1350
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 31051c85b5 upstream.
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2:
- Drop changes to f2fs, lustre, orangefs, overlayfs
- Adjust filenames, context
- In nfsd, pass dentry to nfsd_sanitize_attrs()
- In xfs, pass dentry to xfs_change_file_space(), xfs_set_mode(),
xfs_setattr_nonsize(), and xfs_setattr_size()
- Update ext3 as well
- Mark pohmelfs as BROKEN; it's long dead upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 667121ace9 upstream.
The IP-over-1394 driver firewire-net lacked input validation when
handling incoming fragmented datagrams. A maliciously formed fragment
with a respectively large datagram_offset would cause a memcpy past the
datagram buffer.
So, drop any packets carrying a fragment with offset + length larger
than datagram_size.
In addition, ensure that
- GASP header, unfragmented encapsulation header, or fragment
encapsulation header actually exists before we access it,
- the encapsulated datagram or fragment is of nonzero size.
Reported-by: Eyal Itkin <eyal.itkin@gmail.com>
Reviewed-by: Eyal Itkin <eyal.itkin@gmail.com>
Fixes: CVE 2016-8633
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
[bwh: Backported to 3.2: fwnet_receive_broadcast() never matches IPv6 packets]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7bc2b55a5c upstream.
We need to put an upper bound on "user_len" so the memcpy() doesn't
overflow.
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2:
- Adjust context
- Use literal 1032 insetad of ARCMSR_API_DATA_BUFLEN]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 03dab869b7 upstream.
This fixes CVE-2016-7042.
Fix a short sprintf buffer in proc_keys_show(). If the gcc stack protector
is turned on, this can cause a panic due to stack corruption.
The problem is that xbuf[] is not big enough to hold a 64-bit timeout
rendered as weeks:
(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
$2 = 30500568904943
That's 14 chars plus NUL, not 11 chars plus NUL.
Expand the buffer to 16 chars.
I think the unpatched code apparently works if the stack-protector is not
enabled because on a 32-bit machine the buffer won't be overflowed and on a
64-bit machine there's a 64-bit aligned pointer at one side and an int that
isn't checked again on the other side.
The panic incurred looks something like:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
Call Trace:
[<ffffffff813d941f>] dump_stack+0x63/0x84
[<ffffffff811b2cb6>] panic+0xde/0x22a
[<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
[<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
[<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
[<ffffffff81350410>] ? key_validate+0x50/0x50
[<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
[<ffffffff8126b31c>] seq_read+0x2cc/0x390
[<ffffffff812b6b12>] proc_reg_read+0x42/0x70
[<ffffffff81244fc7>] __vfs_read+0x37/0x150
[<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
[<ffffffff81246156>] vfs_read+0x96/0x130
[<ffffffff81247635>] SyS_read+0x55/0xc0
[<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Reported-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 951b6a0717 upstream.
addr can be NULL and it should not be dereferenced before NULL checking.
Signed-off-by: Jaganath Kanakkassery <jaganath.k@samsung.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.2:
- There's no 'chan' variable
- Keep using batostr() to log addresses
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b398e416e upstream.
I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.
Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78
The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator
ksm_do_scan
scan_get_next_rmap_item
down_read
get_next_rmap_item
alloc_rmap_item #ksmd will loop permanently.
There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel. Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.
Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.
While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.
Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1245800c0f upstream.
The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.
Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9abefcb1aa upstream.
A timer was used to restart after the bus-off state, leading to a
relatively large can_restart() executed in an interrupt context,
which in turn sets up pinctrl. When this happens during system boot,
there is a high probability of grabbing the pinctrl_list_mutex,
which is locked already by the probe() of other device, making the
kernel suspect a deadlock condition [1].
To resolve this issue, the restart_timer is replaced by a delayed
work.
[1] https://github.com/victronenergy/venus/issues/24
Signed-off-by: Sergei Miroshnichenko <sergeimir@emcraft.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 325c50e3ce upstream.
If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.
This patch ensures that the file descriptor refers to a directory.
Fixes: cb8e70901d (Btrfs: Fix subvolume creation locking rules)
Fixes: 76dda93c6a (Btrfs: add snapshot/subvolume destroy ioctl)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2:
- Open-code file_inode()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 371a015344 upstream.
the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.
there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.
At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d21c353d5e upstream.
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reported-by: Saar Maoz <saar.maoz@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e6f0c6e617 upstream.
Commit ac7cf246df ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not. This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.
In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246df ("ocfs2/dlm: fix race between convert and recovery")
Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b588479358 upstream.
commit 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
introduced aead. The function attach_aead kmemdup()s the algorithm
name during xfrm_state_construct().
However this memory is never freed.
Implementation has since been slightly modified in
commit ee5c23176f ("xfrm: Clone states properly on migration")
without resolving this leak.
This patch adds a kfree() call for the aead algorithm name.
Fixes: 1a6509d991 ("[IPSEC]: Add support for combined mode algorithms")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8e4b72054f upstream.
Since commit acb2505d01 ("openrisc: fix copy_from_user()"),
copy_from_user() returns the number of bytes requested, not the
number of bytes not copied.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: acb2505d01 ("openrisc: fix copy_from_user()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 65c0044ca8 upstream.
avr32 builds fail with:
arch/avr32/kernel/built-in.o: In function `arch_ptrace':
(.text+0x650): undefined reference to `___copy_from_user'
arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined
reference to `___copy_from_user'
kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax':
(.text+0x5dd8): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin':
sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `ptrace_has_cap':
ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user'
kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to
`___copy_from_user' follow
Fixes: 8630c32275 ("avr32: fix copy_from_user()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Havard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8ab86c00e3 upstream.
skb is not freed if newsk is NULL. Rework the error path so free_skb is
unconditionally called on function exit.
Fixes: c3ea9fa274 ("[IrDA] af_irda: IRDA_ASSERT cleanups")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 344bacca8c upstream.
This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.
The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.
[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---
Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08c5cd3748 upstream.
Some full-speed mceusb infrared transceivers contain invalid endpoint
descriptors for their interrupt endpoints, with bInterval set to 0.
In the past they have worked out okay with the mceusb driver, because
the driver sets the bInterval field in the descriptor to 1,
overwriting whatever value may have been there before. However, this
approach was never sanctioned by the USB core, and in fact it does not
work with xHCI controllers, because they use the bInterval value that
was present when the configuration was installed.
Currently usbcore uses 32 ms as the default interval if the value in
the endpoint descriptor is invalid. It turns out that these IR
transceivers don't work properly unless the interval is set to 10 ms
or below. To work around this mceusb problem, this patch changes the
endpoint-descriptor parsing routine, making the default interval value
be 10 ms rather than 32 ms.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Wade Berrier <wberrier@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8630c32275 upstream.
really ugly, but apparently avr32 compilers turns access_ok() into
something so bad that they want it in assembler. Left that way,
zeroing added in inline wrapper.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c685238922 upstream.
It could be done in exception-handling bits in __get_user_b() et.al.,
but the surgery involved would take more knowledge of sh64 details
than I have or _want_ to have.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c2f18fa4cb upstream.
* should zero on any failure
* __get_user() should use __copy_from_user(), not copy_from_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 224264657b upstream.
should clear on access_ok() failures. Also remove the useless
range truncation logics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: no calls to check_object_size()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit acb2505d01 upstream.
... that should zero on faults. Also remove the <censored> helpful
logics wrt range truncation copied from ppc32. Where it had ever
been needed only in case of copy_from_user() *and* had not been merged
into the mainline until a month after the need had disappeared.
A decade before openrisc went into mainline, I might add...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae7cc577ec upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: include <linux/string.h> to get declaration of memset()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b8767a8f0 upstream.
It should check access_ok(). Otherwise a bunch of places turn into
trivially exploitable rootholes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eb47e0293b upstream.
* copy_from_user() on access_ok() failure ought to zero the destination
* none of those primitives should skip the access_ok() check in case of
small constant size.
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ad18b75c2 upstream.
both for access_ok() failures and for faults halfway through
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit acdb04d0b3 upstream.
When we need to allocate a temporary blkcipher_walk_next and it
fails, the code is supposed to take the slow path of processing
the data block by block. However, due to an unrelated change
we instead end up dereferencing the NULL pointer.
This patch fixes it by moving the unrelated bsize setting out
of the way so that we enter the slow path as inteded.
Fixes: 7607bd8ff0 ("[CRYPTO] blkcipher: Added blkcipher_walk_virt_block")
Reported-by: xiakaixu <xiakaixu@huawei.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[bwh: Backported to 3.2: s/walk_blocksize/blocksize/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06dfe5cc0c upstream.
SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for
the PCMCIA socket class. PCMCIA used to handle suspend/resume via the
socket hosting device, which happened at normal device suspend/resume
time.
However, the referenced commit changed this: much of the resume now
happens much earlier, in the noirq resume handler of dev_pm_ops.
However, on SA1111, the PCMCIA device is not accessible as the SA1111
has not been resumed at _noirq time. It's slightly worse than that,
because the SA1111 has already been put to sleep at _noirq time, so
suspend doesn't work properly.
Fix this by converting the core SA1111 code to use dev_pm_ops as well,
and performing its own suspend/resume at noirq time.
This fixes these errors in the kernel log:
pcmcia_socket pcmcia_socket0: time out after reset
pcmcia_socket pcmcia_socket1: time out after reset
and the resulting lack of PCMCIA cards after a S2RAM cycle.
Fixes: d7646f7632 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b519d408ea upstream.
Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2:
- Add the 'out' label
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f6d7c1b559 upstream.
This fixes subpage writes when using 4-bit HW ECC.
There has been numerous reports about ECC errors with devices using this
driver for a while. Also the 4-bit ECC has been reported as broken with
subpages in [1] and with 16 bits NANDs in the driver and in mach* board
files both in mainline and in the vendor BSPs.
What I saw with 4-bit ECC on a 16bits NAND (on an LCDK) which got me to
try reinitializing the ECC engine:
- R/W on whole pages properly generates/checks RS code
- try writing the 1st subpage only of a blank page, the subpage is well
written and the RS code properly generated, re-reading the same page
the HW detects some ECC error, reading the same page again no ECC
error is detected
Note that the ECC engine is already reinitialized in the 1-bit case.
Tested on my LCDK with UBI+UBIFS using subpages.
This could potentially get rid of the issue workarounded in [1].
[1] 28c015a9da ("mtd: davinci-nand: disable subpage write for keystone-nand")
Fixes: 6a4123e581 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
Signed-off-by: Karl Beldan <kbeldan@baylibre.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2545e5da08 upstream.
... in all cases, including the failing access_ok()
Note that some architectures using asm-generic/uaccess.h have
__copy_from_user() not zeroing the tail on failure halfway
through. This variant works either way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2561d309df upstream.
it should clear the destination even when access_ok() fails.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f30ea5090 upstream.
When we fail to attach the security context in xfrm_state_construct()
we'll return 0 as error value which, in turn, will wrongly claim success
to userland when, in fact, we won't be adding / updating the XFRM state.
This is a regression introduced by commit fd21150a0f ("[XFRM] netlink:
Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()").
Fix it by propagating the error returned by security_xfrm_state_alloc()
in this case.
Fixes: fd21150a0f ("[XFRM] netlink: Inline attach_encap_tmpl()...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 816f318b23 upstream.
When a seq-virmidi driver is initialized, it registers a rawmidi
instance with its callback to create an associated seq kernel client.
Currently it's done throughly in rawmidi's register_mutex context.
Recently it was found that this may lead to a deadlock another rawmidi
device that is being attached with the sequencer is accessed, as both
open with the same register_mutex. This was actually triggered by
syzkaller, as Dmitry Vyukov reported:
======================================================
[ INFO: possible circular locking dependency detected ]
4.8.0-rc1+ #11 Not tainted
-------------------------------------------------------
syz-executor/7154 is trying to acquire lock:
(register_mutex#5){+.+.+.}, at: [<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341
but task is already holding lock:
(&grp->list_mutex){++++.+}, at: [<ffffffff850138bb>] check_and_subscribe_port+0x5b/0x5c0 sound/core/seq/seq_ports.c:495
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&grp->list_mutex){++++.+}:
[<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746
[<ffffffff863f6199>] down_read+0x49/0xc0 kernel/locking/rwsem.c:22
[< inline >] deliver_to_subscribers sound/core/seq/seq_clientmgr.c:681
[<ffffffff85005c5e>] snd_seq_deliver_event+0x35e/0x890 sound/core/seq/seq_clientmgr.c:822
[<ffffffff85006e96>] > snd_seq_kernel_client_dispatch+0x126/0x170 sound/core/seq/seq_clientmgr.c:2418
[<ffffffff85012c52>] snd_seq_system_broadcast+0xb2/0xf0 sound/core/seq/seq_system.c:101
[<ffffffff84fff70a>] snd_seq_create_kernel_client+0x24a/0x330 sound/core/seq/seq_clientmgr.c:2297
[< inline >] snd_virmidi_dev_attach_seq sound/core/seq/seq_virmidi.c:383
[<ffffffff8502d29f>] snd_virmidi_dev_register+0x29f/0x750 sound/core/seq/seq_virmidi.c:450
[<ffffffff84fd208c>] snd_rawmidi_dev_register+0x30c/0xd40 sound/core/rawmidi.c:1645
[<ffffffff84f816d3>] __snd_device_register.part.0+0x63/0xc0 sound/core/device.c:164
[< inline >] __snd_device_register sound/core/device.c:162
[<ffffffff84f8235d>] snd_device_register_all+0xad/0x110 sound/core/device.c:212
[<ffffffff84f7546f>] snd_card_register+0xef/0x6c0 sound/core/init.c:749
[<ffffffff85040b7f>] snd_virmidi_probe+0x3ef/0x590 sound/drivers/virmidi.c:123
[<ffffffff833ebf7b>] platform_drv_probe+0x8b/0x170 drivers/base/platform.c:564
......
-> #0 (register_mutex#5){+.+.+.}:
[< inline >] check_prev_add kernel/locking/lockdep.c:1829
[< inline >] check_prevs_add kernel/locking/lockdep.c:1939
[< inline >] validate_chain kernel/locking/lockdep.c:2266
[<ffffffff814791f4>] __lock_acquire+0x4d44/0x4d80 kernel/locking/lockdep.c:3335
[<ffffffff8147a3a8>] lock_acquire+0x208/0x430 kernel/locking/lockdep.c:3746
[< inline >] __mutex_lock_common kernel/locking/mutex.c:521
[<ffffffff863f0ef1>] mutex_lock_nested+0xb1/0xa20 kernel/locking/mutex.c:621
[<ffffffff84fd6d4b>] snd_rawmidi_kernel_open+0x4b/0x260 sound/core/rawmidi.c:341
[<ffffffff8502e7c7>] midisynth_subscribe+0xf7/0x350 sound/core/seq/seq_midi.c:188
[< inline >] subscribe_port sound/core/seq/seq_ports.c:427
[<ffffffff85013cc7>] check_and_subscribe_port+0x467/0x5c0 sound/core/seq/seq_ports.c:510
[<ffffffff85015da9>] snd_seq_port_connect+0x2c9/0x500 sound/core/seq/seq_ports.c:579
[<ffffffff850079b8>] snd_seq_ioctl_subscribe_port+0x1d8/0x2b0 sound/core/seq/seq_clientmgr.c:1480
[<ffffffff84ffe9e4>] snd_seq_do_ioctl+0x184/0x1e0 sound/core/seq/seq_clientmgr.c:2225
[<ffffffff84ffeae8>] snd_seq_kernel_client_ctl+0xa8/0x110 sound/core/seq/seq_clientmgr.c:2440
[<ffffffff85027664>] snd_seq_oss_midi_open+0x3b4/0x610 sound/core/seq/oss/seq_oss_midi.c:375
[<ffffffff85023d67>] snd_seq_oss_synth_setup_midi+0x107/0x4c0 sound/core/seq/oss/seq_oss_synth.c:281
[<ffffffff8501b0a8>] snd_seq_oss_open+0x748/0x8d0 sound/core/seq/oss/seq_oss_init.c:274
[<ffffffff85019d8a>] odev_open+0x6a/0x90 sound/core/seq/oss/seq_oss.c:138
[<ffffffff84f7040f>] soundcore_open+0x30f/0x640 sound/sound_core.c:639
......
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&grp->list_mutex);
lock(register_mutex#5);
lock(&grp->list_mutex);
lock(register_mutex#5);
*** DEADLOCK ***
======================================================
The fix is to simply move the registration parts in
snd_rawmidi_dev_register() to the outside of the register_mutex lock.
The lock is needed only to manage the linked list, and it's not
necessarily to cover the whole initialization process.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f8a7658bc upstream.
When a user timer instance is continued without the explicit start
beforehand, the system gets eventually zero-division error like:
divide error: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
CPU: 1 PID: 27320 Comm: syz-executor Not tainted 4.8.0-rc3-next-20160825+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88003c9b2280 task.stack: ffff880027280000
RIP: 0010:[<ffffffff858e1a6c>] [< inline >] ktime_divns include/linux/ktime.h:195
RIP: 0010:[<ffffffff858e1a6c>] [<ffffffff858e1a6c>] snd_hrtimer_callback+0x1bc/0x3c0 sound/core/hrtimer.c:62
Call Trace:
<IRQ>
[< inline >] __run_hrtimer kernel/time/hrtimer.c:1238
[<ffffffff81504335>] __hrtimer_run_queues+0x325/0xe70 kernel/time/hrtimer.c:1302
[<ffffffff81506ceb>] hrtimer_interrupt+0x18b/0x420 kernel/time/hrtimer.c:1336
[<ffffffff8126d8df>] local_apic_timer_interrupt+0x6f/0xe0 arch/x86/kernel/apic/apic.c:933
[<ffffffff86e13056>] smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:957
[<ffffffff86e1210c>] apic_timer_interrupt+0x8c/0xa0 arch/x86/entry/entry_64.S:487
<EOI>
.....
Although a similar issue was spotted and a fix patch was merged in
commit [6b760bb2c6: ALSA: timer: fix division by zero after
SNDRV_TIMER_IOCTL_CONTINUE], it seems covering only a part of
iceberg.
In this patch, we fix the issue a bit more drastically. Basically the
continue of an uninitialized timer is supposed to be a fresh start, so
we do it for user timers. For the direct snd_timer_continue() call,
there is no way to pass the initial tick value, so we kick out for the
uninitialized case.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2:
- Adjust context
- In _snd_timer_stop(), check the value of 'event' instead of 'stop']
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c3b1681375 upstream.
This is a minor code cleanup without any functional changes:
- Kill keep_flag argument from _snd_timer_stop(), as all callers pass
only it false.
- Remove redundant NULL check in _snd_timer_stop().
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust to apply after previously backported
"ALSA: timer: Fix race between stop and interrupt"]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0bd2223594 upstream.
When calling .import() on a cryptd ahash_request, the structure members
that describe the child transform in the shash_desc need to be initialized
like they are when calling .init()
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 135e8c9250 upstream.
The origin of the issue I've seen is related to
a missing memory barrier between check for task->state and
the check for task->on_rq.
The task being woken up is already awake from a schedule()
and is doing the following:
do {
schedule()
set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);
The waker, actually gets stuck doing the following in
try_to_wake_up():
while (p->on_cpu)
cpu_relax();
Analysis:
The instance I've seen involves the following race:
CPU1 CPU2
while () {
if (cond)
break;
do {
schedule();
set_current_state(TASK_UN..)
} while (!cond);
wakeup_routine()
spin_lock_irqsave(wait_lock)
raw_spin_lock_irqsave(wait_lock) wake_up_process()
} try_to_wake_up()
set_current_state(TASK_RUNNING); ..
list_del(&waiter.list);
CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:
CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
wake_up_process()
try_to_wake_up()
raw_spin_lock_irqsave(p->pi_lock)
..
if (p->on_rq && ttwu_wakeup())
..
while (p->on_cpu)
cpu_relax()
..
CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.
CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p->on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p->on_rq to be 0. This was the most confusing bit of the analysis,
but p->on_rq is changed under runqueue lock, rq_lock, the p->on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p->on_rq change is not
done uder the pi_lock.
The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely
Reproduction of the issue:
The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.
Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[ Updated comment to clarify matching barriers. Many
architectures do not have a full barrier in switch_to()
so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 307fe9dd11 upstream.
All the scaling of the KXSD9 involves multiplication with a
fraction number < 1.
However the scaling value returned from IIO_INFO_SCALE was
unpredictable as only the micros of the value was assigned, and
not the integer part, resulting in scaling like this:
$cat in_accel_scale
-1057462640.011978
Fix this by assigning zero to the integer part.
Tested-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 546481c281 upstream.
When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.
The next scenario demonstrates it:
neigh_add_path --> ipoib_cm_create_tx -->
queue_work (pointer to path is in the cm/tx struct)
#while the work is still in the queue,
#the port goes down and causes the ipoib_flush_paths:
ipoib_flush_paths --> path_free --> kfree(path)
#at this point the work scheduled starts.
ipoib_cm_tx_start --> copy from the (invalid)path pointer:
(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
-> memory corruption.
To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.
Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: s/neigh->daddr/neigh->neighbour->ha/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 68c6bcdd8b upstream.
The function send_leave sets the member: group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before the setting and it might delete the group object,
and will get a memory corruption.
Additionally, this patch gets rid of group->query_id variable which is
not used.
Fixes: faec2f7b96 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15301a5707 upstream.
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:
_paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.
This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.
In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue. But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:
mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
Modules linked in: e1000e
CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
RIP: 0010:[<ffffffff81134148>] [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
RSP: 0018:ffff8800d4aefb90 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
FS: 00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
Call Trace:
_raw_spin_lock+0x27/0x30
handle_pte_fault+0x13db/0x16b0
handle_mm_fault+0x312/0x670
__do_page_fault+0x1b1/0x4e0
do_page_fault+0x22/0x30
page_fault+0x28/0x30
__vfs_read+0x28/0xe0
vfs_read+0x86/0x130
SyS_read+0x46/0xa0
entry_SYSCALL_64_fastpath+0x1e/0xa8
Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b
Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11749e086b upstream.
I got this with syzkaller:
==================================================================
BUG: KASAN: null-ptr-deref on address 0000000000000020
Read of size 32 by task syz-executor/22519
CPU: 1 PID: 22519 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2
014
0000000000000001 ffff880111a17a00 ffffffff81f9f141 ffff880111a17a90
ffff880111a17c50 ffff880114584a58 ffff880114584a10 ffff880111a17a80
ffffffff8161fe3f ffff880100000000 ffff880118d74a48 ffff880118d74a68
Call Trace:
[<ffffffff81f9f141>] dump_stack+0x83/0xb2
[<ffffffff8161fe3f>] kasan_report_error+0x41f/0x4c0
[<ffffffff8161ff74>] kasan_report+0x34/0x40
[<ffffffff82c84b54>] ? snd_timer_user_read+0x554/0x790
[<ffffffff8161e79e>] check_memory_region+0x13e/0x1a0
[<ffffffff8161e9c1>] kasan_check_read+0x11/0x20
[<ffffffff82c84b54>] snd_timer_user_read+0x554/0x790
[<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
[<ffffffff817d0831>] ? proc_fault_inject_write+0x1c1/0x250
[<ffffffff817d0670>] ? next_tgid+0x2a0/0x2a0
[<ffffffff8127c278>] ? do_group_exit+0x108/0x330
[<ffffffff8174653a>] ? fsnotify+0x72a/0xca0
[<ffffffff81674dfe>] __vfs_read+0x10e/0x550
[<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
[<ffffffff81674cf0>] ? do_sendfile+0xc50/0xc50
[<ffffffff81745e10>] ? __fsnotify_update_child_dentry_flags+0x60/0x60
[<ffffffff8143fec6>] ? kcov_ioctl+0x56/0x190
[<ffffffff81e5ada2>] ? common_file_perm+0x2e2/0x380
[<ffffffff81746b0e>] ? __fsnotify_parent+0x5e/0x2b0
[<ffffffff81d93536>] ? security_file_permission+0x86/0x1e0
[<ffffffff816728f5>] ? rw_verify_area+0xe5/0x2b0
[<ffffffff81675355>] vfs_read+0x115/0x330
[<ffffffff81676371>] SyS_read+0xd1/0x1a0
[<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
[<ffffffff82001c2c>] ? __this_cpu_preempt_check+0x1c/0x20
[<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
[<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
[<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
[<ffffffff810052fc>] ? syscall_return_slowpath+0x16c/0x1d0
[<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
==================================================================
There are a couple of problems that I can see:
- ioctl(SNDRV_TIMER_IOCTL_SELECT), which potentially sets
tu->queue/tu->tqueue to NULL on memory allocation failure, so read()
would get a NULL pointer dereference like the above splat
- the same ioctl() can free tu->queue/to->tqueue which means read()
could potentially see (and dereference) the freed pointer
We can fix both by taking the ioctl_lock mutex when dereferencing
->queue/->tqueue, since that's always held over all the ioctl() code.
Just looking at the code I find it likely that there are more problems
here such as tu->qhead pointing outside the buffer if the size is
changed concurrently using SNDRV_TIMER_IOCTL_PARAMS.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 088bf2ff5d upstream.
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.
I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address:
ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
This seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.
2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.
Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 40d9c32525 upstream.
These product IDs are listed in Windows driver.
0x6803 corresponds to WeTelecom WM-D300.
0x6802 name is unknown.
Signed-off-by: Aleksandr Makarov <aleksandr.o.makarov@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e63ad4bd5 upstream.
native_smp_prepare_cpus
-> default_setup_apic_routing
-> enable_IR_x2apic
-> irq_remapping_prepare
-> intel_prepare_irq_remapping
-> intel_setup_irq_remapping
So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.
Prevent remap initialization when IOAPIC is disabled.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0082e985f upstream.
An assertion in layout_in_gaps() verifies that the gap_lebs pointer is
below the maximum bound. When computing this maximum bound the idx_lebs
count is multiplied by sizeof(int), while C pointers arithmetic does take
into account the size of the pointed elements implicitly already. Remove
the multiplication to fix the assertion.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 53e5f36fbd upstream.
UBSAN complains about a left shift by -1 in proc_do_submiturb(). This
can occur when an URB is submitted for a bulk or control endpoint on
a high-speed device, since the code doesn't bother to check the
endpoint type; normally only interrupt or isochronous endpoints have
a nonzero bInterval value.
Aside from the fact that the operation is illegal, it shouldn't matter
because the result isn't used. Still, in theory it could cause a
hardware exception or other problem, so we should work around it.
This patch avoids doing the left shift unless the shift amount is >= 0.
The same piece of code has another problem. When checking the device
speed (the exponential encoding for interrupt endpoints is used only
by high-speed or faster devices), we need to look for speed >=
USB_SPEED_SUPER as well as speed == USB_SPEED HIGH. The patch adds
this check.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Vittorio Zecca <zeccav@gmail.com>
Tested-by: Vittorio Zecca <zeccav@gmail.com>
Suggested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6c73358c83 upstream.
The maximum value allowed for wMaxPacketSize of a high-speed interrupt
endpoint is 1024 bytes, not 1023.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Fixes: aed9d65ac3 ("USB: validate wMaxPacketValue entries in endpoint descriptors")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f00975c61 upstream.
Somehow this one slipped through, which means drivers without modeset
support can be oopsed (since those also don't call
drm_mode_config_init, which means the crtc lookup will chase an
uninitalized idr).
Reported-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7ac61a062f upstream.
Any readings from the raw interface of the KXSD9 driver will
return an empty string, because it does not return
IIO_VAL_INT but rather some random value from the accelerometer
to the caller.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3eb53b20d7 upstream.
When building gccgo in userspace, errno.h gets parsed and the go include file
sysinfo.go is generated.
Since EREFUSED is defined to the same value as ECONNREFUSED, and ECONNREFUSED
is defined later on in errno.h, this leads to go complaining that EREFUSED
isn't defined yet.
Fix this trivial problem by moving the define of EREFUSED down after
ECONNREFUSED in errno.h (and clean up the indenting while touching this line).
Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 47af45d684 upstream.
The commit 4097461897 ("Input: i8042 - break load dependency ...")
correctly set up ps2_cmd_mutex pointer for the KBD port but forgot to do
the same for AUX port(s), which results in communication on KBD and AUX
ports to clash with each other.
Fixes: 4097461897 ("Input: i8042 - break load dependency ...")
Reported-by: Bruno Wolff III <bruno@wolff.to>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bb1fceca22 upstream.
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f1f6d9a8b5 upstream.
Remove the hcd after checking for the xhci last quirks, not before.
This caused a hang on a Alpine Ridge xhci based maching which remove
the whole xhci controller when unplugging the last usb device
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 88716a9376 upstream.
After a device is disconnected, xhci_stop_device() will be invoked
in xhci_bus_suspend().
Also the "disconnect" IRQ will have ISR to invoke
xhci_free_virt_device() in this sequence.
xhci_irq -> xhci_handle_event -> handle_cmd_completion ->
xhci_handle_cmd_disable_slot -> xhci_free_virt_device
If xhci->devs[slot_id] has been assigned to NULL in
xhci_free_virt_device(), then virt_dev->eps[i].ring in
xhci_stop_device() may point to an invlid address to cause kernel
panic.
virt_dev = xhci->devs[slot_id];
:
if (virt_dev->eps[i].ring && virt_dev->eps[i].ring->dequeue)
[] Unable to handle kernel paging request at virtual address 00001a68
[] pgd=ffffffc001430000
[] [00001a68] *pgd=000000013c807003, *pud=000000013c807003,
*pmd=000000013c808003, *pte=0000000000000000
[] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[] CPU: 0 PID: 39 Comm: kworker/0:1 Tainted: G U
[] Workqueue: pm pm_runtime_work
[] task: ffffffc0bc0e0bc0 ti: ffffffc0bc0ec000 task.ti:
ffffffc0bc0ec000
[] PC is at xhci_stop_device.constprop.11+0xb4/0x1a4
This issue is found when running with realtek ethernet device
(0bda:8153).
Signed-off-by: Jim Lin <jilin@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b7c7e52ef upstream.
There is an allocation with GFP_KERNEL flag in mos7840_write(),
while it may be called from interrupt context.
Follow-up for commit 1912528376 ("USB: kobil_sct: fix non-atomic
allocation in write path")
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a5a1d6142 upstream.
There is an allocation with GFP_KERNEL flag in mos7720_write(),
while it may be called from interrupt context.
Follow-up for commit 1912528376 ("USB: kobil_sct: fix non-atomic
allocation in write path")
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7f851684e upstream.
Found one megaraid_sas HBA probe fails,
[ 187.235190] scsi host2: Avago SAS based MegaRAID driver
[ 191.112365] megaraid_sas 0000:89:00.0: BAR 0: can't reserve [io 0x0000-0x00ff]
[ 191.120548] megaraid_sas 0000:89:00.0: IO memory region busy!
and the card has resource like,
[ 125.097714] pci 0000:89:00.0: [1000:005d] type 00 class 0x010400
[ 125.104446] pci 0000:89:00.0: reg 0x10: [io 0x0000-0x00ff]
[ 125.110686] pci 0000:89:00.0: reg 0x14: [mem 0xce400000-0xce40ffff 64bit]
[ 125.118286] pci 0000:89:00.0: reg 0x1c: [mem 0xce300000-0xce3fffff 64bit]
[ 125.125891] pci 0000:89:00.0: reg 0x30: [mem 0xce200000-0xce2fffff pref]
that does not io port resource allocated from BIOS, and kernel can not
assign one as io port shortage.
The driver is only looking for MEM, and should not fail.
It turns out megasas_init_fw() etc are using bar index as mask. index 1
is used as mask 1, so that pci_request_selected_regions() is trying to
request BAR0 instead of BAR1.
Fix all related reference.
Fixes: b6d5d8808b ("megaraid_sas: Use lowest memory bar for SR-IOV VF support")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5cf0791da5 upstream.
There's a subtle preemption race on UP kernels:
Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.
But then, there is this scenario on x86-UP:
TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:
-> mmput()
-> exit_mmap()
-> tlb_finish_mmu()
-> tlb_flush_mmu()
-> tlb_flush_mmu_tlbonly()
-> tlb_flush()
-> flush_tlb_mm_range()
-> __flush_tlb_up()
-> __flush_tlb()
-> __native_flush_tlb()
At this point current->mm is NULL but current->active_mm still points to
the "old" mm.
Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.
Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.
Now the fun part:
Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.
The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…
This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.
Fix this by disabling preemption across the critical code section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ba333dc55 upstream.
When a device is in a status where CIO has killed all I/O by itself the
interrupt for a clear request may not contain an irb to determine the
clear function. Instead it contains an error pointer -EIO.
This was ignored by the DASD int_handler leading to a hanging device
waiting for a clear interrupt.
Handle -EIO error pointer correctly for requests that are clear pending and
treat the clear as successful.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aed9d65ac3 upstream.
Erroneous or malicious endpoint descriptors may have non-zero bits in
reserved positions, or out-of-bounds values. This patch helps prevent
these from causing problems by bounds-checking the wMaxPacketValue
entries in endpoint descriptors and capping the values at the maximum
allowed.
This issue was first discovered and tests were conducted by Jake Lamberson
<jake.lamberson1@gmail.com>, an intern working for Rosie Hall.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: roswest <roswest@cisco.com>
Tested-by: roswest <roswest@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: drop the USB_SPEED_SUPER_PLUS case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00a3101f56 upstream.
Like NFQNL_MSG_VERDICT_BATCH do, we should also reject the verdict
request when the portid is not same with the initial portid(maybe
from another process).
Fixes: 97d32cf944 ("netfilter: nfnetlink_queue: batch verdict support")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa00c437ee upstream.
In aacraid's ioctl_send_fib() we do two fetches from userspace, one the
get the fib header's size and one for the fib itself. Later we use the
size field from the second fetch to further process the fib. If for some
reason the size from the second fetch is different than from the first
fix, we may encounter an out-of- bounds access in aac_fib_send(). We
also check the sender size to insure it is not out of bounds. This was
reported in https://bugzilla.kernel.org/show_bug.cgi?id=116751 and was
assigned CVE-2016-6480.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Fixes: 7c00ffa31 '[SCSI] 2.6 aacraid: Variable FIB size (updated patch)'
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e10aec652f upstream.
Bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=105331
reports that the "AEO model 0" display is driven with 8 bpc
without dithering by default, which looks bad because that
panel is apparently a 6 bpc DP panel with faulty EDID.
A fix for this was made by commit 013dd9e038
("drm/i915/dp: fall back to 18 bpp when sink capability is unknown").
That commit triggers new regressions in precision for DP->DVI and
DP->VGA displays. A patch is out to revert that commit, but it will
revert video output for the AEO model 0 panel to 8 bpc without
dithering.
The EDID 1.3 of that panel, as decoded from the xrandr output
attached to that bugzilla bug report, is somewhat faulty, and beyond
other problems also sets the "DFP 1.x compliant TMDS" bit, which
according to DFP spec means to drive the panel with 8 bpc and
no dithering in absence of other colorimetry information.
Try to make the original bug reporter happy despite the
faulty EDID by adding a quirk to mark that panel as 6 bpc,
so 6 bpc output with dithering creates a nice picture.
Tested by injecting the edid from the fdo bug into a DP connector
via drm_kms_helper.edid_firmware and verifying the 6 bpc + dithering
is selected.
This patch should be backported to stable.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae34d12cc1 upstream.
BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0
UART/FIFO IC.
To support BCM920706V2_EVAL dev board for WICED development on Linux.
Add the VID(0a5c) and PID(6422) to ftdi_sio driver to allow loading
ftdi_sio for this board.
Signed-off-by: Sheng-Hui J. Chu <s.jeffrey.chu@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6977495c06 upstream.
Ivium Technologies uses the FTDI VID with custom PIDs for their line of
electrochemical interfaces and the PalmSens they developed for PalmSens
BV.
Signed-off-by: Robert Delien <robert@delien.nl>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cf1b18030d upstream.
The device has four interfaces; the three serial ports ought to be
handled by this driver:
00 Diagnostic interface serial port
01 NMEA device serial port
02 Mass storage (sd card)
03 Modem serial port
The other product ids listed in the Windows driver are present already.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 77da160530 upstream.
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 99f3c90d0d upstream.
When the corrupt_bio_byte feature was introduced it caused READ bios to
no longer be errored with -EIO during the down_interval. This had to do
with the complexity of needing to submit READs if the corrupt_bio_byte
feature was used.
Fix it so READ bios are properly errored with -EIO; doing so early in
flakey_map() as long as there isn't a match for the corrupt_bio_byte
feature.
Fixes: a3998799fb ("dm flakey: add corrupt_bio_byte feature")
Reported-by: Akira Hayakawa <ruby.wktk@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2: in flakey_end_io(), keep using
bio_submitted_while_down instead of pb->bio_submitted]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37cf99e08c upstream.
The balloon has a special mechanism that is subscribed to the oom
notification which leads to deflation for a fixed number of pages.
The number is always fixed even when the balloon is fully deflated.
But leak_balloon did not expect that the pages to deflate will be more
than taken, and raise a "BUG" in balloon_page_dequeue when page list
will be empty.
So, the simplest solution would be to check that the number of releases
pages is less or equal to the number taken pages.
Signed-off-by: Konstantin Neumoin <kneumoin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 829fa70ddd upstream.
A number of fuzzing failures seem to be caused by allocation bitmaps
or other metadata blocks being pointed at the superblock.
This can cause kernel BUG or WARNings once the superblock is
overwritten, so validate the group descriptor blocks to make sure this
doesn't happen.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 68c5cf5a60 upstream.
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for s390 at all even though ARCH_DLINFO can contain one NEW_AUX_ENT when
VDSO is enabled.
This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which s390 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.
Fixes: b020632e40 ("[S390] introduce vdso on s390")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f626300a3e upstream.
tcp_select_initial_window() intends to advertise a window
scaling for the maximum possible window size. To do so,
it considers the maximum of net.ipv4.tcp_rmem[2] and
net.core.rmem_max as the only possible upper-bounds.
However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
to set the socket's receive buffer size to values
larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
Thus, SO_RCVBUFFORCE is effectively ignored by
tcp_select_initial_window().
To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
net.core.rmem_max and socket's initial buffer space.
Fixes: b0573dea1f ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 714fb87e8b upstream.
Install the UBI device object before we arm sysfs.
Otherwise udev tries to read sysfs attributes before UBI is ready and
udev rules will not match.
Signed-off-by: Iosif Harutyunov <iharutyunov@sonicwall.com>
[rw: massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55f1cf83d5 upstream.
The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be
>=.
Fixes: 5f97f7f940 ('[PATCH] avr32 architecture')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 955818cd5b upstream.
ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.
Fixes: 06222e491e ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[bwh: Backported to 3.2:
- We don't use vfs_setpos(); instead set ret = -EINVAL or ret = offset
directly
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4097461897 upstream.
As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we
have a hard load dependency between i8042 and atkbd which prevents
keyboard from working on Gen2 Hyper-V VMs.
> hyperv_keyboard invokes serio_interrupt(), which needs a valid serio
> driver like atkbd.c. atkbd.c depends on libps2.c because it invokes
> ps2_command(). libps2.c depends on i8042.c because it invokes
> i8042_check_port_owner(). As a result, hyperv_keyboard actually
> depends on i8042.c.
>
> For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a
> Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m
> rather than =y, atkbd.ko can't load because i8042.ko can't load(due to
> no i8042 device emulated) and finally hyperv_keyboard can't work and
> the user can't input: https://bugs.archlinux.org/task/39820
> (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y)
To break the dependency we move away from using i8042_check_port_owner()
and instead allow serio port owner specify a mutex that clients should use
to serialize PS/2 command stream.
Reported-by: Mark Laws <mdl@60hz.org>
Tested-by: Mark Laws <mdl@60hz.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ac36a4ada upstream.
If 'tunnel' is NULL we should return -EBADF but the 'end_put_sess' path
unconditionally sets 'error' back to zero. Rework the error path so it
more closely matches pppol2tp_sendmsg.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b46211d6dc upstream.
Add missing sysconfig/sysstatus information
to OMAP3 hwmod. The information has been
checked against OMAP34xx and OMAP36xx TRM.
Without this change DSI block is not reset
during boot, which is required for working
Nokia N950 display.
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 368301f2fe upstream.
With this command sequence:
modprobe plip
modprobe pps_parport
rmmod pps_parport
the partport_pps modules causes this crash:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: parport_detach+0x1d/0x60 [pps_parport]
Oops: 0000 [#1] SMP
...
Call Trace:
parport_unregister_driver+0x65/0xc0 [parport]
SyS_delete_module+0x187/0x210
The sequence that builds up to this is:
1) plip is loaded and takes the parport device for exclusive use:
plip0: Parallel port at 0x378, using IRQ 7.
2) pps_parport then fails to grab the device:
pps_parport: parallel port PPS client
parport0: cannot grant exclusive access for device pps_parport
pps_parport: couldn't register with parport0
3) rmmod of pps_parport is then killed because it tries to access
pardev->name, but pardev (taken from port->cad) is NULL.
So add a check for NULL in the test there too.
Link: http://lkml.kernel.org/r/20160714115245.12651-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Rodolfo Giometti <giometti@enneenne.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 149a4fddd0 upstream.
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 510cccb5b0 upstream.
The size of individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
which is currently 256, whereas number of keys/buttons in input device (and
therefor in key_down) is much larger - KEY_CNT - 768, and that can cause
out-of-bound access when we do
sym = U(key_maps[0][k]);
with large 'k'.
To fix it we should not attempt iterating beyond smaller of NR_KEYS and
KEY_CNT.
Also while at it let's switch to for_each_set_bit() instead of open-coding
it.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b53893aae4 upstream.
According to the datasheet you should only write 1 to this bit. If it is
not set, at least AIN3 will return bad values on newer silicon revisions.
Fixes: d84ca5b345 ("hwmon: Add driver for ADT7411 voltage and temperature sensor")
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 144f4c9839 upstream.
nand_do_write_ops() determines if it is writing a partial page with the
formula:
part_pagewr = (column || writelen < (mtd->writesize - 1))
When 'writelen' is exactly 1 byte less than the NAND page size the formula
equates to zero, so the code doesn't process it as a partial write,
although it should.
As a consequence the function remains in the while(1) loop with 'writelen'
becoming 0xffffffff and iterating endlessly.
The bug may not be easy to reproduce in Linux since user space tools
usually force the padding or round-up the write size to a page-size
multiple.
This was discovered in U-Boot where the issue can be reproduced by
writing any size that is 1 byte less than a page-size multiple.
For example, on a NAND with 2K page (0x800):
=> nand erase.part <partition>
=> nand write $loadaddr <partition> 7ff
[Editor's note: the bug was added in commit 29072b9607, but moved
around in commit 66507c7bc8 ("mtd: nand: Add support to use nand_base
poi databuf as bounce buffer")]
Fixes: 29072b9607 ("[MTD] NAND: add subpage write support")
Signed-off-by: Hector Palacios <hector.palacios@digi.com>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[bwh: Backported to 3.2: adjusted context as noted above]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f823a2aa8f upstream.
wlc_phy_txpower_get_current() does a logical OR of power->flags, which
presumes that power.flags was initiliazed earlier by the caller,
unfortunately, this is not the case, so make sure we zero out the struct
tx_power before calling into wlc_phy_txpower_get_current().
Reported-by: coverity (CID 146011)
Fixes: 5b435de0d7 ("net: wireless: add brcm80211 drivers")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9754d45e99 upstream.
Some chips incorrectly support partial reads from TPM_STS register
at non-zero offsets. Read the entire 32-bits register instead of
making two 8-bit reads to support such devices and reduce the number
of bus transactions when obtaining the burstcount from TPM_STS.
Fixes: 27084efee0 ("tpm: driver for next generation TPM chips")
Signed-off-by: Andrey Pronin <apronin@chromium.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
[bwh: Backported to 3.2:
- Use raw ioread32() instead of tpm_tis_read32()
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f070e81be upstream.
When there is more data to be processed, the current test in
scatterwalk_done may prevent us from calling pagedone even when
we should.
In particular, if we're on an SG entry spanning multiple pages
where the last page is not a full page, we will incorrectly skip
calling pagedone on the second last page.
This patch fixes this by adding a separate test for whether we've
reached the end of a page.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23bc6ab0a0 upstream.
When we retrieve imtu value from userspace we should use 16 bit pointer
cast instead of 32 as it's defined that way in headers. Fixes setsockopt
calls on big-endian platforms.
Signed-off-by: Amadeusz Sławiński <amadeusz.slawinski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c0415fa08 upstream.
This patch adds support for 0x1206 PID of Telit LE910.
Since the interfaces positions are the same than the ones for
0x1043 PID of Telit LE922, telit_le922_blacklist_usbcfg3 is used.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79ad07d457 upstream.
There is a cut and paste issue here. The bug is that we are allocating
more memory than necessary for msp_maps. We should be allocating enough
space for a map_info struct (144 bytes) but we instead allocate enough
for an mtd_info struct (1840 bytes). It's a small waste.
The other part of this is not harmful but when we allocated msp_flash
then we allocated enough space fro a map_info pointer instead of an
mtd_info pointer. But since pointers are the same size it works out
fine.
Anyway, I decided to clean up all three allocations a bit to make them
a bit more consistent and clear.
Fixes: 68aa0fa87f ('[MTD] PMC MSP71xx flash/rootfs mappings')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c65d5c6c81 upstream.
If we encounter a filesystem error during orphan cleanup, we should stop.
Otherwise, we may end up in an infinite loop where the same inode is
processed again and again.
EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended
EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters
Aborting journal on device loop0-8.
EXT4-fs (loop0): Remounting filesystem read-only
EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure
EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted
EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted
EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted
EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed!
[...]
EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed!
[...]
See-also: c9eb13a910 ("ext4: fix hang when processing corrupted orphaned inode list")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 554a5ccc4e upstream.
If we hit this error when mounted with errors=continue or
errors=remount-ro:
EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata
then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to
continue. However, ext4_mb_release_context() is the wrong thing to call
here since we are still actually using the allocation context.
Instead, just error out. We could retry the allocation, but there is a
possibility of getting stuck in an infinite loop instead, so this seems
safer.
[ Fixed up so we don't return EAGAIN to userspace. --tytso ]
Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[bwh: Backported to 3.2:
- Use EIO instead of EFSCORRUPTED
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f1fe81123 upstream.
When freeing the nested resources of a vcpu, there is an assumption that
the vcpu's vmcs01 is the current VMCS on the CPU that executes
nested_release_vmcs12(). If this assumption is violated, the vcpu's
vmcs01 may be made active on multiple CPUs at the same time, in
violation of Intel's specification. Moreover, since the vcpu's vmcs01 is
not VMCLEARed on every CPU on which it is active, it can linger in a
CPU's VMCS cache after it has been freed and potentially
repurposed. Subsequent eviction from the CPU's VMCS cache on a capacity
miss can result in memory corruption.
It is not sufficient for vmx_free_vcpu() to call vmx_load_vmcs01(). If
the vcpu in question was last loaded on a different CPU, it must be
migrated to the current CPU before calling vmx_load_vmcs01().
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: vcpu_load() returns void]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4fa7734c62 upstream.
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs. However, this is not
available anymore after commit 26a865f4aa (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).
Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 386512d18b upstream.
In case any operation fails before we can successfully go the point
where we would register a MDIO bus, we would be going to an error label
which involves unregistering then freeing this yet to be created MDIO
bus. Update all error paths to go to label free which is the only one
valid until either the clock is enabled, or the MDIO bus is allocated
and registered. This fixes kernel oops observed while trying to
dereference the MDIO bus structure which is not yet allocated.
Fixes: a170285772 ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4e187d83d upstream.
Before commit 778be232a2 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a2 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0533b13072 upstream.
If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway containing a single
word containing the value RPC_DROP_REPLY (in network byte-order, of
course). This is a nonsense RPC message.
Fixes: 9e701c6109 ("svcrpc: simpler request dropping")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit abb2bafd29 upstream.
The EFI firmware on Macs contains a full-fledged network stack for
downloading OS X images from osrecovery.apple.com. Unfortunately
on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
wireless card on every boot and leaves it enabled even after
ExitBootServices has been called. The card continues to assert its IRQ
line, causing spurious interrupts if the IRQ is shared. It also corrupts
memory by DMAing received packets, allowing for remote code execution
over the air. This only stops when a driver is loaded for the wireless
card, which may be never if the driver is not installed or blacklisted.
The issue seems to be constrained to the Broadcom 4331. Chris Milsted
has verified that the newer Broadcom 4360 built into the MacBookPro11,3
(2013/2014) does not exhibit this behaviour. The chances that Apple will
ever supply a firmware fix for the older machines appear to be zero.
The solution is to reset the card on boot by writing to a reset bit in
its mmio space. This must be done as an early quirk and not as a plain
vanilla PCI quirk to successfully combat memory corruption by DMAed
packets: Matthew Garrett found out in 2012 that the packets are written
to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
This type of memory is made available to the page allocator by
efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
subsys initcall level. In-between a time window would be open for memory
corruption. Random crashes occurring in this time window and attributed
to DMAed packets have indeed been observed in the wild by Chris
Bainbridge.
When Matthew Garrett analyzed the memory corruption issue in 2012, he
sought to fix it with a grub quirk which transitions the card to D3hot:
http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
This approach does not help users with other bootloaders and while it
may prevent DMAed packets, it does not cure the spurious interrupts
emanating from the card. Unfortunately the card's mmio space is
inaccessible in D3hot, so to reset it, we have to undo the effect of
Matthew's grub patch and transition the card back to D0.
Note that the quirk takes a few shortcuts to reduce the amount of code:
The size of BAR 0 and the location of the PM capability is identical
on all affected machines and therefore hardcoded. Only the address of
BAR 0 differs between models. Also, it is assumed that the BCMA core
currently mapped is the 802.11 core. The EFI driver seems to always take
care of this.
Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
towards finding the best solution to this problem.
The following should be a comprehensive list of affected models:
iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16]
iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16]
Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12]
Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12]
Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12]
MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12]
MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12]
MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12]
For posterity, spurious interrupts caused by the Broadcom 4331 wireless
card resulted in splats like this (stacktrace omitted):
irq 17: nobody cared (try booting with the "irqpoll" option)
handlers:
[<ffffffff81374370>] pcie_isr
[<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
[<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
Disabling IRQ #17
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1]
Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1]
Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2]
Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1]
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris Milsted <cmilsted@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Michael Buesch <m@bues.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: b43-dev@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
[ Did minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- early_ioremap() is declared in <asm/io.h> not <asm/early_ioremap.h>
- Add definition of BCMA_RESET_ST
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 850c321027 upstream.
We used to scan secondary buses until the following commit that
was applied in 2009:
8659c406ad ("x86: only scan the root bus in early PCI quirks")
which commit constrained early quirks to the root bus only. Its
motivation was to prevent application of the nvidia_bugs quirk
on secondary buses.
We're about to add a quirk to reset the Broadcom 4331 wireless card on
2011/2012 Macs, which is located on a secondary bus behind a PCIe root
port. To facilitate that, reintroduce scanning of secondary buses.
The commit message of 8659c406ad notes that scanning only the root bus
"saves quite some unnecessary scanning work". The algorithm used prior
to 8659c406ad was particularly time consuming because it scanned
buses 0 to 31 brute force. To avoid lengthening boot time, employ a
recursive strategy which only scans buses that are actually reachable
from the root bus.
Yinghai Lu pointed out that the secondary bus number read from a
bridge's config space may be invalid, in particular a value of 0 would
cause an infinite loop. The PCI core goes beyond that and recurses to a
child bus only if its bus number is greater than the parent bus number
(see pci_scan_bridge()). Since the root bus is numbered 0, this implies
that secondary buses may not be 0. Do the same on early scanning.
If this algorithm is found to significantly impact boot time or cause
infinite loops on broken hardware, it would be possible to limit its
recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
at depth 0, so the bus need not be scanned deeper than that for now. An
alternative approach would be to revert to scanning only the root bus,
and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
and 8086:1e16. Apple always positioned the card behind either of these
three ports. The quirk would then check presence of the card in slot 0
below the root port and do its deed.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 205e1e255c upstream.
Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.
This is due to that a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel(), during
which pch->chan_net set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the life time of the channel, therefore we should
release this reference when we destroy it.
Fixes: 1f461dcdd2 ("ppp: take reference on channels netns")
Reported-by: Matt Bennett <Matt.Bennett@alliedtelesis.co.nz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f388cdcdd1 upstream.
snd_ctl_remove() has a notification for the removal event. It's
superfluous when done during the device got disconnected. Although
the notification itself is mostly harmless, it may potentially be
harmful, and should be suppressed. Actually some components PCM may
free ctl elements during the disconnect or free callbacks, thus it's
no theoretical issue.
This patch adds the check of card->shutdown flag for avoiding
unnecessary notifications after (or during) the disconnect.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 14ff8d48f2 upstream.
DRM_CONNECTOR_POLL_CONNECT only enables polling for connections, not
disconnections. Because of this, we end up losing hotplug polling for
analog connectors once they get connected.
Easy way to reproduce:
- Grab a machine with a radeon GPU and a VGA port
- Plug a monitor into the VGA port, wait for it to update the connector
from disconnected to connected
- Disconnect the monitor on VGA, a hotplug event is never sent for the
removal of the connector.
Originally, only using DRM_CONNECTOR_POLL_CONNECT might have been a good
idea since doing VGA polling can sometimes result in having to mess with
the DAC voltages to figure out whether or not there's actually something
there since VGA doesn't have HPD. Doing this would have the potential of
showing visible artifacts on the screen every time we ran a poll while a
VGA display was connected. Luckily, radeon_vga_detect() only resorts to
this sort of polling if the poll is forced, and DRM's polling helper
doesn't force it's polls.
Additionally, this removes some assignments to connector->polled that
weren't actually doing anything.
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b9554dc5b upstream.
If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory. Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- Drop the second check in ext4_init_block_bitmap() since it can't return
an error code
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6a7fd522a7 upstream.
If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up. In that case, the iput() on the journal inode can
end up causing a BUG().
Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.
Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f70749ca42 upstream.
An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:
ext4_lblk_t last = lblock + len - 1;
if (len == 0 || lblock > last)
return 0;
since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().
We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).
Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865f ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15e4292a2d upstream.
This patch fixes an issue that the CFIFOSEL register value is possible
to be changed by usbhsg_ep_enable() wrongly. And then, a data transfer
using CFIFO may not work correctly.
For example:
# modprobe g_multi file=usb-storage.bin
# ifconfig usb0 192.168.1.1 up
(During the USB host is sending file to the mass storage)
# ifconfig usb0 down
In this case, since the u_ether.c may call usb_ep_enable() in
eth_stop(), if the renesas_usbhs driver is also using CFIFO for
mass storage, the mass storage may not work correctly.
So, this patch adds usbhs_lock() and usbhs_unlock() calling in
usbhsg_ep_enable() to protect CFIFOSEL register. This is because:
- CFIFOSEL.CURPIPE = 0 is also needed for the pipe configuration
- The CFIFOSEL (fifo->sel) is already protected by usbhs_lock()
Fixes: 97664a207b ("usb: renesas_usbhs: shrink spin lock area")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4fdef69838 upstream.
This patch fixes an issue that the xfer_work() is possible to cause
NULL pointer dereference if the usb cable is disconnected while data
transfer is running.
In such case, a gadget driver may call usb_ep_disable()) before
xfer_work() is actually called. In this case, the usbhs_pkt_pop()
will call usbhsf_fifo_unselect(), and then usbhs_pipe_to_fifo()
in xfer_work() will return NULL.
Fixes: e73a989 ("usb: renesas_usbhs: add DMAEngine support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 926b7b5122 upstream.
On non-DeviceTree platforms, the index of serial device is a static
variable incremented on each probe. It is incremented even if deferred
probe happens when getting the clock in s3c24xx_serial_init_port().
This index is used for referencing elements of statically allocated
s3c24xx_serial_ports array. In case of re-probe, the index will point
outside of this array leading to memory corruption.
Increment the index only on successful probe.
Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Fixes: b497549a03 ("[ARM] S3C24XX: Split serial driver into core and per-cpu drivers")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b30bdfa864 upstream.
As it is if you ask for a sync gcm you may actually end up with
an async one because it does not filter out async implementations
of ghash.
This patch fixes this by adding the necessary filter when looking
for ghash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d89e5478b upstream.
Commit:
e9532e69b8 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
... set rq->prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq->prev_steal_time.
However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch. Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq->prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.
Since the root cause has been fixed, we can just revert commit e9532e69b8.
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 'commit e9532e69b8 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 45b64ee649 upstream.
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
addressable memory by referring to ibm,dyanamic-memory property. There
are three problems with the current approach:
1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
all the LMBs of the guest, but that is not true for PowerKVM which
populates only DR LMBs (LMBs that can be hotplugged/removed) in that
property.
2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
at the max possible address. Since ibm,dynamic-memory doesn't include
RMA LMBs, the address thus obtained will be less than the actual max
address. For example, if max possible memory size is 32G, with lmb-size
of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
which won't be present here). hot_add_drconf_memory_max() would then
return the max addressable memory as 127 * 256MB = 31.75GB, the max
address should have been 32G which is what ibm,lrdr-capacity shows.
3 In PowerKVM, there can be a gap between the end of boot time RAM and
beginning of hotplug RAM area. So just multiplying lmb-count with
lmb-size will not provide the correct max possible address for PowerKVM.
This patch fixes 1 by using ibm,lrdr-capacity property to return the max
addressable memory whenever the property is present. Then it fixes 2 & 3
by fetching the address of the last LMB in ibm,dynamic-memory property.
Fixes: cd34206e94 ("powerpc: Add memory_hotplug_max()")
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0e0e367740 upstream.
It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.
Reported-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 19be0eaffa upstream.
faultin_page drops FOLL_WRITE after the page fault handler did the CoW
and then we retry follow_page_mask to get our CoWed page. This is racy,
however because the page might have been unmapped by that time and so
we would have to do a page fault again, this time without CoW. This
would cause the page cache corruption for FOLL_FORCE on MAP_PRIVATE
read only mappings with obvious consequences.
This is an ancient bug that was actually already fixed once by Linus
eleven years ago in commit 4ceb5db975 ("Fix get_user_pages() race
for write access") but that was then undone due to problems on s390
by commit f33ea7f404 ("fix get_user_pages bug") because s390 didn't
have proper dirty pte tracking until abf09bed3c ("s390/mm: implement
software dirty bits"). This wasn't a problem at the time as pointed out
by Hugh Dickins because madvise relied on mmap_sem for write up until
0a27a14a62 ("mm: madvise avoid exclusive mmap_sem") but since then we
can race with madvise which can unmap the fresh COWed page or with KSM
and corrupt the content of the shared page.
This patch is based on the Linus' approach to not clear FOLL_WRITE after
the CoW page fault (aka VM_FAULT_WRITE) but instead introduces FOLL_COW
to note this fact. The flag is then rechecked during follow_pfn_pte to
enforce the page fault again if we do not see the CoWed page. Linus was
suggesting to check pte_dirty again as s390 is OK now. But that would
make backporting to some old kernels harder. So instead let's just make
sure that vm_normal_page sees a pure anonymous page.
This would guarantee we are seeing a real CoW page. Introduce
can_follow_write_pte which checks both pte_write and falls back to
PageAnon on forced write faults which passed CoW already. Thanks to Hugh
to point out that a special care has to be taken for KSM pages because
our COWed page might have been merged with a KSM one and keep its
PageAnon flag.
Fixes: 0a27a14a62 ("mm: madvise avoid exclusive mmap_sem")
Reported-by: Phil "not Paul" Oester <kernel@linuxace.com>
Disclosed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
[bwh: Backported to 3.2:
- Adjust filename, context, indentation
- The 'no_page' exit path in follow_page() is different, so open-code the
cleanup
- Delete a now-unused label]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43761473c2 upstream.
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) argumnets for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
[bwh: Backported to 3.2:
- In audit_log_execve_info() various information is retrieved via
the extra parameter struct audit_aux_data_execve *axi
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 75ff39ccc1 upstream.
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()
- Open-code prandom_u32_max()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4116def233 upstream.
The last field "flags" of object "minfo" is not initialized.
Copying this object out may leak kernel stack data.
Assign 0 to it to avoid leak.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d2be1422e upstream.
link_info.str is a char array of size 60. Memory after the NULL
byte is not initialized. Sending the whole object out can cause
a leak.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: the unpadded strcpy() is in tipc_node_get_links()
and no nlattr is involved, so use strncpy()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4ec8cc803 upstream.
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9a47e9cff9 upstream.
The stack object “r1” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cec8f96e49 upstream.
The stack object “tread” has a total size of 32 bytes. Its field
“event” and “val” both contain 4 bytes padding. These 8 bytes
padding bytes are sent to user without being initialized.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 681fef8380 upstream.
The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
are padding bytes which are not initialized and leaked to userland
via “copy_to_user”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e54ad7f1ee upstream.
This prevents stacking filesystems (ecryptfs and overlayfs) from using
procfs as lower filesystem. There is too much magic going on inside
procfs, and there is no good reason to stack stuff on top of procfs.
(For example, procfs does access checks in VFS open handlers, and
ecryptfs by design calls open handlers from a kernel thread that doesn't
drop privileges or so.)
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 69c433ed2e upstream.
Add a simple read-only counter to super_block that indicates how deep this
is in the stack of filesystems. Previously ecryptfs was the only stackable
filesystem and it explicitly disallowed multiple layers of itself.
Overlayfs, however, can be stacked recursively and also may be stacked
on top of ecryptfs or vice versa.
To limit the kernel stack usage we must limit the depth of the
filesystem stack. Initially the limit is set to 2.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.2:
- Drop changes to overlayfs
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8da344b74 upstream.
In sess_auth_rawntlmssp_authenticate(), the ntlmssp blob is allocated
statically and its size is an "empirical" 5*sizeof(struct
_AUTHENTICATE_MESSAGE) (320B on x86_64). I don't know where this value
comes from or if it was ever appropriate, but it is currently
insufficient: the user and domain name in UTF16 could take 1kB by
themselves. Because of that, build_ntlmssp_auth_blob() might corrupt
memory (out-of-bounds write). The size of ntlmssp_blob in
SMB2_sess_setup() is too small too (sizeof(struct _NEGOTIATE_MESSAGE)
+ 500).
This patch allocates the blob dynamically in
build_ntlmssp_auth_blob().
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- Adjust context, indentation
- build_ntlmssp_auth_blob() is static
- Drop changes to smb2pdu.c
- Use cERROR() instead of cifs_dbg(VFS, ...)
- Use MAX_USERNAME_SIZE instead of CIFS_MAX_USERNAME_LEN]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0fe970df3 upstream.
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7831b4ff0d upstream.
A qeth_card contains a napi_struct linked to the net_device during
device probing. This struct must be deleted when removing the qeth
device, otherwise Panic on oops can occur when qeth devices are
repeatedly removed and added.
Fixes: a1c3ed4c9c ("qeth: NAPI support for l2 and l3 discipline")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Tested-by: Alexander Klein <ALKL@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3fa6993fef upstream.
The user timer tu->qused counter may go to a negative value when
multiple concurrent reads are performed since both the check and the
decrement of tu->qused are done in two individual locked contexts.
This results in bogus read outs, and the endless loop in the
user-space side.
The fix is to move the decrement of the tu->qused counter into the
same spinlock context as the zero-check of the counter.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f87fda00b6 upstream.
ether_addr_equal_64bits() requires some care about its arguments,
namely that 8 bytes might be read, even if last 2 byte values are not
used.
KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
in bond_3ad.c
Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
Although the 8-byte alignment was there, KASan would detect out
of bound accesses.
Fixes: 815117adaf ("bonding: use ether_addr_equal_unaligned for bond addr compare")
Fixes: bb54e58929 ("bonding: Verify RX LACPDU has proper dest mac-addr")
Fixes: 885a136c52 ("bonding: use compare_ether_addr_64bits() in ALB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust filename
- Drop change to bond_params::ad_actor_system
- Fix one more copy of null_mac_addr to use eth_zero_addr()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6d57e9078e upstream.
a lot of code has either the memset or an inefficient copy
from a static array that contains the all-zeros Ethernet address.
Introduce help function eth_zero_addr() to fill an address with
all zeros, making the code clearer and allowing us to get rid of
some constant arrays.
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1ead852dd8 upstream.
Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.
AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.
At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.
Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.
Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 62db7152c9 upstream.
vortex_wtdma_bufshift() function does calculate the page index
wrongly, first masking then shift, which always results in zero.
The proper computation is to first shift, then mask.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c4604a298 upstream.
The tt_req_node is added and removed from a list inside a spinlock. But the
locking is sometimes removed even when the object is still referenced and
will be used later via this reference. For example batadv_send_tt_request
can create a new tt_req_node (including add to a list) and later
re-acquires the lock to remove it from the list and to free it. But at this
time another context could have already removed this tt_req_node from the
list and freed it.
CPU#0
batadv_batman_skb_recv from net_device 0
-> batadv_iv_ogm_receive
-> batadv_iv_ogm_process
-> batadv_iv_ogm_process_per_outif
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_ogm_receive
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_ogm_handler_v1
-> batadv_tt_update_orig
-> batadv_send_tt_request
-> batadv_tt_req_node_new
spin_lock(...)
allocates new tt_req_node and adds it to list
spin_unlock(...)
return tt_req_node
CPU#1
batadv_batman_skb_recv from net_device 1
-> batadv_recv_unicast_tvlv
-> batadv_tvlv_containers_process
-> batadv_tvlv_call_handler
-> batadv_tt_tvlv_unicast_handler_v1
-> batadv_handle_tt_response
spin_lock(...)
tt_req_node gets removed from list and is freed
spin_unlock(...)
CPU#0
<- returned to batadv_send_tt_request
spin_lock(...)
tt_req_node gets removed from list and is freed
MEMORY CORRUPTION/SEGFAULT/...
spin_unlock(...)
This can only be solved via reference counting to allow multiple contexts
to handle the list manipulation while making sure that only the last
context holding a reference will free the object.
Fixes: a73105b8d4 ("batman-adv: improved client announcement mechanism")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Tested-by: Martin Weinelt <martin@darmstadt.freifunk.net>
Tested-by: Amadeus Alfa <amadeus@chemnitz.freifunk.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Use struct tt_req_node instead of struct batadv_tt_req_node
- Use list_empty() instead of hlist_unhashed()
- Drop kernel-doc change]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e547f26283 upstream.
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c6795a9b3 upstream.
'commpage_bak' is allocated with 'sizeof(struct echoaudio)' bytes.
We then copy 'sizeof(struct comm_page)' bytes in it.
On my system, smatch complains because one is 2960 and the other is 3072.
This would result in memory corruption or a oops.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0c1f91b985 upstream.
These two spi_w8r8() calls return a value with is used by the code
following the error check. The dubious use was caused by a cleanup
patch.
Fixes: d34dbee8ac ("staging:iio:accel:kxsd9 cleanup and conversion to iio_chan_spec.")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef3149eb3d upstream.
sca3000_read_ctrl_reg() returns a negative number on failure, check for
this instead of zero.
Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 63d2f95d63 upstream.
The value `bytes' comes from the filesystem which is about to be
mounted. We cannot trust that the value is always in the range we
expect it to be.
Check its value before using it to calculate the length for the crc32_le
call. It value must be larger (or equal) sumoff + 4.
This fixes a kernel bug when accidentially mounting an image file which
had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
s_bytes value of 1. This caused an underflow when substracting sumoff +
4 (20) in the call to crc32_le.
BUG: unable to handle kernel paging request at ffff88021e600000
IP: crc32_le+0x36/0x100
...
Call Trace:
nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
nilfs_load_super_block+0x142/0x300 [nilfs2]
init_nilfs+0x60/0x390 [nilfs2]
nilfs_mount+0x302/0x520 [nilfs2]
mount_fs+0x38/0x160
vfs_kern_mount+0x67/0x110
do_mount+0x269/0xe00
SyS_mount+0x9f/0x100
entry_SYSCALL_64_fastpath+0x16/0x71
Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 93a2001bdf upstream.
This patch validates the num_values parameter from userland during the
HIDIOCGUSAGES and HIDIOCSUSAGES commands. Previously, if the report id was set
to HID_REPORT_ID_UNKNOWN, we would fail to validate the num_values parameter
leading to a heap overflow.
Signed-off-by: Scott Bauer <sbauer@plzdonthack.me>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2940e2c76 upstream.
When calculating the required size of an RC QP send queue, leave
enough space for masked atomic operations, which require more space than
"regular" atomic operation.
Fixes: 6fa8f71984 ("IB/mlx4: Add support for masked atomic operations")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 02ef871eca upstream.
Current overlap check is evaluating to false a case where a filter
field is fully contained (proper subset) of a r/w request. This
change applies classical overlap check instead to include all the
scenarios.
More specifically, for (Hilscher GmbH CIFX 50E-DP(M/S)) device driver
the logic is such that the entire confspace is read and written in 4
byte chunks. In this case as an example, CACHE_LINE_SIZE,
LATENCY_TIMER and PCI_BIST are arriving together in one call to
xen_pcibk_config_write() with offset == 0xc and size == 4. With the
exsisting overlap check the LATENCY_TIMER field (offset == 0xd, length
== 1) is fully contained in the write request and hence is excluded
from write, which is incorrect.
Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 25e1ed6e64 upstream.
For 'real' hardware CAN devices the netlink interface is used to set CAN
specific communication parameters. Real CAN hardware can not be created nor
removed with the ip tool ...
This patch adds a private dellink function for the CAN device driver interface
that does just nothing.
It's a follow up to commit 993e6f2fd ("can: fix oops caused by wrong rtnl
newlink usage") but for dellink.
Reported-by: ajneu <ajneu1@gmail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ac1c17b20 upstream.
During page migrations UBIFS might get confused
and the following assert triggers:
[ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436)
[ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008
[ 213.490000] Hardware name: Allwinner sun4i/sun5i Families
[ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14)
[ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0)
[ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50)
[ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8)
[ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290)
[ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80)
[ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0)
[ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4)
[ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0)
[ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8)
[ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274)
[ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c)
[ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0)
[ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8)
[ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48)
[ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444)
[ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614)
[ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c)
[ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34)
UBIFS is using PagePrivate() which can have different meanings across
filesystems. Therefore the generic page migration code cannot handle this
case correctly.
We have to implement our own migration function which basically does a
plain copy but also duplicates the page private flag.
UBIFS is not a block device filesystem and cannot use buffer_migrate_page().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[rw: Massaged changelog, build fixes, etc...]
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Christoph Hellwig <hch@lst.de>
[bwh: Backported to 3.2:
- migrate_page_move_mapping() doesn't take an extra_count parameter
- Use literal 0 instead of MIGRATEPAGE_SUCCESS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1118dce773 upstream.
Export these symbols such that UBIFS can implement
->migratepage.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Christoph Hellwig <hch@lst.de>
[bwh: Backported to 3.2: also change migrate_page_move_mapping() from
static to extern, done as part of an earlier commit upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 972228d874 upstream.
recover_peb() was never power cut aware,
if a power cut happened right after writing the VID header
upon next attach UBI would blindly use the new partial written
PEB and all data from the old PEB is lost.
In order to make recover_peb() power cut aware, write the new
VID with a proper crc and copy_flag set such that the UBI attach
process will detect whether the new PEB is completely written
or not.
We cannot directly use ubi_eba_atomic_leb_change() since we'd
have to unlock the LEB which is facing a write error.
Reported-by: Jörg Pfähler <pfaehler@isse.de>
Reviewed-by: Jörg Pfähler <pfaehler@isse.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2:
- Adjust context
- Use next_sqnum() instead of ubi_next_sqnum()
- Use ubi_device::peb_buf1 instead of ubi_device::peb_buf
- No need to unlock ubi->fm_eba_sem on error]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43200a4480 upstream.
At high bus load it could happen that "at91_poll()" enters with all RX
message boxes filled up. If then at the end the "quota" is exceeded as
well, "rx_next" will not be reset to the first RX mailbox and hence the
interrupts remain disabled.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Tested-by: Amr Bekhit <amrbekhit@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7613663cc1 upstream.
For security reasons ordinary user must not be able to control fan speed
via /proc/i8k by default. Some malicious software running under "nobody"
user could be able to turn fan off and cause HW problems. So this patch
changes default value of "restricted" parameter to 1.
Also restrict reading of DMI_PRODUCT_SERIAL from /proc/i8k via "restricted"
parameter. It is because non root user cannot read DMI_PRODUCT_SERIAL from
sysfs file /sys/class/dmi/id/product_serial.
Old non secure behaviour of file /proc/i8k can be achieved by loading this
module with "restricted" parameter set to 0.
Note that this patch has effects only for kernels compiled with CONFIG_I8K
and only for file /proc/i8k. Hwmon interface provided by this driver was
not changed and root access for setting fan speed was needed also before.
Reported-by: Mario Limonciello <Mario_Limonciello@dell.com>
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 32a5a0c047 upstream.
The isa_bus_init function must be called before drivers which utilize
the ISA bus driver are registered. A race condition for initilization
exists if device_initcall is used (the isa_bus_init callback is placed
in the same initcall level as dependent drivers which use module_init).
This patch ensures that isa_bus_init is called first by utilizing
postcore_initcall in favor of device_initcall.
Fixes: a5117ba7da ("[PATCH] Driver model: add ISA bus")
Cc: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c5122e45a upstream.
When this code was reworked for IBoE support the order of assignments
for the sl_tclass_flowlabel got flipped around resulting in
TClass & FlowLabel being permanently set to 0 in the packet headers.
This breaks IB routers that rely on these headers, but only affects
kernel users - libmlx4 does this properly for user space.
Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 38327424b4 upstream.
If __key_link_begin() failed then "edit" would be uninitialized. I've
added a check to fix that.
This allows a random user to crash the kernel, though it's quite
difficult to achieve. There are three ways it can be done as the user
would have to cause an error to occur in __key_link():
(1) Cause the kernel to run out of memory. In practice, this is difficult
to achieve without ENOMEM cropping up elsewhere and aborting the
attempt.
(2) Revoke the destination keyring between the keyring ID being looked up
and it being tested for revocation. In practice, this is difficult to
time correctly because the KEYCTL_REJECT function can only be used
from the request-key upcall process. Further, users can only make use
of what's in /sbin/request-key.conf, though this does including a
rejection debugging test - which means that the destination keyring
has to be the caller's session keyring in practice.
(3) Have just enough key quota available to create a key, a new session
keyring for the upcall and a link in the session keyring, but not then
sufficient quota to create a link in the nominated destination keyring
so that it fails with EDQUOT.
The bug can be triggered using option (3) above using something like the
following:
echo 80 >/proc/sys/kernel/keys/root_maxbytes
keyctl request2 user debug:fred negate @t
The above sets the quota to something much lower (80) to make the bug
easier to trigger, but this is dependent on the system. Note also that
the name of the keyring created contains a random number that may be
between 1 and 10 characters in size, so may throw the test off by
changing the amount of quota used.
Assuming the failure occurs, something like the following will be seen:
kfree_debugcheck: out of range ptr 6b6b6b6b6b6b6b68h
------------[ cut here ]------------
kernel BUG at ../mm/slab.c:2821!
...
RIP: 0010:[<ffffffff811600f9>] kfree_debugcheck+0x20/0x25
RSP: 0018:ffff8804014a7de8 EFLAGS: 00010092
RAX: 0000000000000034 RBX: 6b6b6b6b6b6b6b68 RCX: 0000000000000000
RDX: 0000000000040001 RSI: 00000000000000f6 RDI: 0000000000000300
RBP: ffff8804014a7df0 R08: 0000000000000001 R09: 0000000000000000
R10: ffff8804014a7e68 R11: 0000000000000054 R12: 0000000000000202
R13: ffffffff81318a66 R14: 0000000000000000 R15: 0000000000000001
...
Call Trace:
kfree+0xde/0x1bc
assoc_array_cancel_edit+0x1f/0x36
__key_link_end+0x55/0x63
key_reject_and_link+0x124/0x155
keyctl_reject_key+0xb6/0xe0
keyctl_negate_key+0x10/0x12
SyS_keyctl+0x9f/0xe7
do_syscall_64+0x63/0x13a
entry_SYSCALL64_slow_path+0x25/0x25
Fixes: f70e2e0619 ('KEYS: Do preallocation for __key_link()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit caf1ff26e1 upstream.
These days, we experienced one guest crash with 8 cores and 3 disks,
with qemu error logs as bellow:
qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
And then we found one patch(bdf026317d) in qemu tree, which said
could fix this bug.
Execute the following script will reproduce the BUG quickly:
irq_affinity.sh
========================================================================
vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
for irq in {1,2,4,8,10,20,40,80}
do
echo $irq > /proc/irq/$vda_irq_num/smp_affinity
echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
done
done
========================================================================
The following qemu log is added in the qemu code and is displayed when
this bug reproduced:
kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
irq_routes->nr: 1024, gsi_count: 1024.
That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
but in the kernel code when routes->nr >= 1024, will just return -EINVAL;
The nr is the number of the routing entries which is in of
[1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].
This patch fix the BUG above.
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com>
Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e1b1fc4da upstream.
Modules which register drivers via standard path (driver_register) in
parallel can cause a warning:
WARNING: CPU: 2 PID: 3492 at ../fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
sysfs: cannot create duplicate filename '/module/saa7146/drivers'
Modules linked in: hexium_gemini(+) mxb(+) ...
...
Call Trace:
...
[<ffffffff812e63a2>] sysfs_warn_dup+0x62/0x80
[<ffffffff812e6487>] sysfs_create_dir_ns+0x77/0x90
[<ffffffff8140f2c4>] kobject_add_internal+0xb4/0x340
[<ffffffff8140f5b8>] kobject_add+0x68/0xb0
[<ffffffff8140f631>] kobject_create_and_add+0x31/0x70
[<ffffffff8157a703>] module_add_driver+0xc3/0xd0
[<ffffffff8155e5d4>] bus_add_driver+0x154/0x280
[<ffffffff815604c0>] driver_register+0x60/0xe0
[<ffffffff8145bed0>] __pci_register_driver+0x60/0x70
[<ffffffffa0273e14>] saa7146_register_extension+0x64/0x90 [saa7146]
[<ffffffffa0033011>] hexium_init_module+0x11/0x1000 [hexium_gemini]
...
As can be (mostly) seen, driver_register causes this call sequence:
-> bus_add_driver
-> module_add_driver
-> module_create_drivers_dir
The last one creates "drivers" directory in /sys/module/<...>. When
this is done in parallel, the directory is attempted to be created
twice at the same time.
This can be easily reproduced by loading mxb and hexium_gemini in
parallel:
while :; do
modprobe mxb &
modprobe hexium_gemini
wait
rmmod mxb hexium_gemini saa7146_vv saa7146
done
saa7146 calls pci_register_driver for both mxb and hexium_gemini,
which means /sys/module/saa7146/drivers is to be created for both of
them.
Fix this by a new mutex in module_create_drivers_dir which makes the
test-and-create "drivers" dir atomic.
I inverted the condition and removed 'return' to avoid multiple
unlocks or a goto.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: fe480a2675 (Modules: only add drivers/ direcory if needed)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57675cb976 upstream.
Lengthy output of sysrq-w may take a lot of time on slow serial console.
Currently we reset NMI-watchdog on the current CPU to avoid spurious
lockup messages. Sometimes this doesn't work since softlockup watchdog
might trigger on another CPU which is waiting for an IPI to proceed.
We reset softlockup watchdogs on all CPUs, but we do this only after
listing all tasks, and this may be too late on a busy system.
So, reset watchdogs CPUs earlier, in for_each_process_thread() loop.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1465474805-14641-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dcfc47248d upstream.
Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
the flags register in the case of a fault fixup on single-stepping.
If we put a kprobe on the instruction which caused a
page fault (e.g. actual mov instructions in copy_user_*),
that fault happens on the single-stepping buffer. In this
case, kprobes resets running instance so that the CPU can
retry execution on the original ip address.
However, current code forgets to reset the TF bit. Since this
fault happens with TF bit set for enabling single-stepping,
when it retries, it causes a debug exception and kprobes
can not handle it because it already reset itself.
On the most of x86-64 platform, it can be easily reproduced
by using kprobe tracer. E.g.
# cd /sys/kernel/debug/tracing
# echo p copy_user_enhanced_fast_string+5 > kprobe_events
# echo 1 > events/kprobes/enable
And you'll see a kernel panic on do_debug(), since the debug
trap is not handled by kprobes.
To fix this problem, we just need to clear the TF bit when
resetting running kprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
[ Updated the comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 05082b8bbd upstream.
When executing in a PCI passthrough based virtuzliation environment, the
hypervisor will usually attempt to send a PCIe bus reset signal to the
ASIC when the VM reboots. In this scenario, the card is not correctly
initialized, but we still consider it to be posted. Therefore, in a
passthrough based environemnt we should always post the card to guarantee
it is in a good state for driver initialization.
Ported from amdgpu commit:
amdgpu: fix asic initialization for virtualized environments
Cc: Andres Rodriguez <andres.rodriguez@amd.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9954382335 upstream.
When attaching a pollfunc iio_trigger_attach_poll_func will allocate a
virtual irq and call the driver's set_trigger_state function. Fix error
handling to undo previous steps if any fails.
In particular this fixes handling errors from a driver's
set_trigger_state function. When using triggered buffers a failure to
enable the trigger used to make the buffer unusable.
Signed-off-by: Crestez Dan Leonard <leonard.crestez@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5dd72ecb01 upstream.
Both of these are decidedly silly bugs show up whilst testing
completely different code paths.
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7b2c17f829 upstream.
Ensure that the endpoint is stopped by clearing REQPKT before
clearing DATAERR_NAKTIMEOUT before rotating the queue on the
dedicated bulk endpoint.
This addresses an issue where a race could result in the endpoint
receiving data before it was reprogrammed resulting in a warning
about such data from musb_rx_reinit before it was thrown away.
The data thrown away was a valid packet that had been correctly
ACKed which meant the host and device got out of sync.
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3eec0cf78 upstream.
shared_fifo endpoints would only get a previous tx state cleared
out, the rx state was only cleared for non shared_fifo endpoints
Change this so that the rx state is cleared for all endpoints.
This addresses an issue that resulted in rx packets being dropped
silently.
Signed-off-by: Andrew Goodbody <andrew.goodbody@cambrionix.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0015f91560 upstream.
This loop is supposed to set all the .num[] values to -1 but it's off by
one so it skips the first element and sets one element past the end of
the array.
I've cleaned up the loop a little as well.
Fixes: ddf8abd259 ('USB: f_fs: the FunctionFS driver')
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust filename, context
- Add 'i' for iteration but don't bother with 'eps_ptr' as the calculation is
simpler here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d56c25e3b upstream.
Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay. Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.
Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover. Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- Adjust context
- Also set the flag in __d_materialise_dentry())]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c77679cad upstream.
For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b78f26088 upstream.
One of the debian buildd servers had this crash in the syslog without
any other information:
Unaligned handler failed, ret = -2
clock_adjtime (pid 22578): Unaligned data reference (code 28)
CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G E 4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111100000001111 Tainted: G E
r00-03 000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
r04-07 00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
r08-11 0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
r12-15 000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
r16-19 0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
r20-23 0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
r24-27 0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
r28-31 0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
sr00-03 0000000001200000 0000000001200000 0000000000000000 0000000001200000
sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
IIR: 0ca0d089 ISR: 0000000001200000 IOR: 00000000fa6f7fff
CPU: 1 CR30: 00000001bde7c000 CR31: ffffffffffffffff
ORIG_R28: 00000002369fe628
IAOQ[0]: compat_get_timex+0x2dc/0x3c0
IAOQ[1]: compat_get_timex+0x2e0/0x3c0
RP(r2): compat_get_timex+0x40/0x3c0
Backtrace:
[<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
[<0000000040205024>] syscall_exit+0x0/0x14
This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.
The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in. The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.
The following program reproduces the problem:
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
int main(void) {
/* allocate 8k */
char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
/* free second half (upper 4k) and make it invalid. */
munmap(ptr+4096, 4096);
/* syscall where first int is unaligned and clobbers into invalid memory region */
/* syscall should return EFAULT */
return syscall(__NR_clock_adjtime, 0, ptr+4095);
}
To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.
While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e2dfb4b880 upstream.
PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().
Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.
Fix this by reverting the previous change.
Fixes: 8130b9d7b9 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Simon Marchi <simon.marchi@ericsson.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 62397da50b upstream.
A wmediumd that does not send this attribute causes a NULL pointer
dereference, as the attribute is accessed even if it does not exist.
The attribute was required but never checked ever since userspace frame
forwarding has been introduced. The issue gets more problematic once we
allow wmediumd registration from user namespaces.
Fixes: 7882513bac ("mac80211_hwsim driver support userspace frame tx/rx")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f49cf3b8b4 upstream.
Pwm config may sleep so defer it using a worker.
On a Freescale i.MX53 based board we ran into "BUG: scheduling while
atomic" because input_inject_event locks interrupts, but
imx_pwm_config_v2 sleeps.
Tested on Freescale i.MX53 SoC with 4.6.0.
Signed-off-by: Manfred Schlaegl <manfred.schlaegl@gmx.at>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1b12075ff upstream.
Calling pwm_config() with a period equal to zero always results in
error (-EINVAL) and pwm chip config method is never called.
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1ff5fa3c67 upstream.
After initially connecting a wired Xbox 360 controller or sending it
a command to change LEDs, a status/response packet is interpreted as
controller input. This causes the state of buttons represented in
byte 2 of the controller data packet to be incorrect until the next
valid input packet. Wireless Xbox 360 controllers are not affected.
Writing a new value to the LED device while holding the Start button
and running jstest is sufficient to reproduce this bug. An event will
come through with the Start button released.
Xboxdrv also won't attempt to read controller input from a packet
where byte 0 is non-zero. It also checks that byte 1 is 0x14, but
that value differs between wired and wireless controllers and this
code is shared by both. I think just checking byte 0 is enough to
eliminate unwanted packets.
The following are some examples of 3-byte status packets I saw:
01 03 02
02 03 00
03 03 03
08 03 00
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bf959931dd upstream.
The following program (simplified version of generated by syzkaller)
#include <pthread.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <stdio.h>
#include <signal.h>
void *thread_func(void *arg)
{
ptrace(PTRACE_TRACEME, 0,0,0);
return 0;
}
int main(void)
{
pthread_t thread;
if (fork())
return 0;
while (getppid() != 1)
;
pthread_create(&thread, NULL, thread_func, NULL);
pthread_join(thread, NULL);
return 0;
}
creates an unreapable zombie if /sbin/init doesn't use __WALL.
This is not a kernel bug, at least in a sense that everything works as
expected: debugger should reap a traced sub-thread before it can reap the
leader, but without __WALL/__WCLONE do_wait() ignores sub-threads.
Unfortunately, it seems that /sbin/init in most (all?) distributions
doesn't use it and we have to change the kernel to avoid the problem.
Note also that most init's use sys_waitid() which doesn't allow __WALL, so
the necessary user-space fix is not that trivial.
This patch just adds the "ptrace" check into eligible_child(). To some
degree this matches the "tsk->ptrace" in exit_notify(), ->exit_signal is
mostly ignored when the tracee reports to debugger. Or WSTOPPED, the
tracer doesn't need to set this flag to wait for the stopped tracee.
This obviously means the user-visible change: __WCLONE and __WALL no
longer have any meaning for debugger. And I can only hope that this won't
break something, but at least strace/gdb won't suffer.
We could make a more conservative change. Say, we can take __WCLONE into
account, or !thread_group_leader(). But it would be nice to not
complicate these historical/confusing checks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit affa80bd97 upstream.
When running a 32-bit userspace on a 64-bit kernel, the UI_SET_PHYS
ioctl needs to be treated with special care, as it has the pointer
size encoded in the command.
Signed-off-by: Ricky Liang <jcliang@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4b9c7f9db9 upstream.
Commit 176e21ee2e ("SUNRPC: Support for RPC over AF_LOCAL
transports") added a 5-character netid, but did not bump
RPCBIND_MAXNETIDLEN from 4 to 5.
Fixes: 176e21ee2e ("SUNRPC: Support for RPC over AF_LOCAL ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a967d6c9b upstream.
Only server which map unknown users to guest will allow
access using a non-null NTLMv2_Response.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- Adjust context, indentation
- Keep using cERROR()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 777f69b8d2 upstream.
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- Adjust context, indentation
- Keep using cERROR()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa8f3a354b upstream.
Only server which map unknown users to guest will allow
access using a non-null LMChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- Adjust context, indentation
- Keep ses->flags assignment out of the new if-statement]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cfda35d982 upstream.
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client:
...
Set NullSession to FALSE
If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND
AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND
(AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1)
OR
AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0))
-- Special case: client requested anonymous authentication
Set NullSession to TRUE
...
Only server which map unknown users to guest will allow
access using a non-null NTChallengeResponse.
For Samba it's the "map to guest = bad user" option.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: keep using cERROR()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad67b437f1 upstream.
b84106b4e2 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec. But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:
pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]
Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.
Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check. We haven't had reports of devices with broken SR-IOV
BARs yet.
[bhelgaas: changelog]
Fixes: b84106b4e2 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da77b67195 upstream.
Commit b894157145 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
non-compliant BARs") marked Home Agent 0 & PCU has having non-compliant
BARs. Home Agent 1 also has non-compliant BARs.
Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
touch them.
The problem with these devices is documented in the Xeon v4 specification
update:
BDF2 PCI BARs in the Home Agent Will Return Non-Zero Values
During Enumeration
Problem: During system initialization the Operating System may access
the standard PCI BARs (Base Address Registers). Due to
this erratum, accesses to the Home Agent BAR registers (Bus
1; Device 18; Function 0,4; Offsets (0x14-0x24) will return
non-zero values.
Implication: The operating system may issue a warning. Intel has not
observed any functional failures due to this erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
Fixes: b894157145 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bb07e155b upstream.
Prevent using uninitialized or negative index when handling
steering entries.
Fixes: b12d93d63c ('mlx4: Add support for promiscuous mode in the new steering model.')
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c447116d0 upstream.
Some eMMCs set the partition switch timeout too low.
Now typically eMMCs are considered a critical component (e.g. because
they store the root file system) and consequently are expected to be
reliable. Thus we can neglect the use case where eMMCs can't switch
reliably and we might want a lower timeout to facilitate speedy
recovery.
Although we could employ a quirk for the cards that are affected (if
we could identify them all), as described above, there is little
benefit to having a low timeout, so instead simply set a minimum
timeout.
The minimum is set to 300ms somewhat arbitrarily - the examples that
have been seen had a timeout of 10ms but were sometimes taking 60-70ms.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 20878232c5 upstream.
Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
have no load at all.
Uptime and /proc/loadavg on all systems with kernels released during the
last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
idle systems, but the way the kernel calculates this value prevents it
from getting lower than the mentioned values.
Likewise but not as obviously noticeable, a fully loaded system with no
processes waiting, shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
(multiplied by number of cores).
Once the (old) load becomes 93 or higher, it mathematically can never
get lower than 93, even when the active (load) remains 0 forever.
This results in the strange 0.00, 0.01, 0.05 uptime values on idle
systems. Note: 93/2048 = 0.0454..., which rounds up to 0.05.
It is not correct to add a 0.5 rounding (=1024/2048) here, since the
result from this function is fed back into the next iteration again,
so the result of that +0.5 rounding value then gets multiplied by
(2048-2037), and then rounded again, so there is a virtual "ghost"
load created, next to the old and active load terms.
By changing the way the internally kept value is rounded, that internal
value equivalent now can reach 0.00 on idle, and 1.00 on full load. Upon
increasing load, the internally kept load value is rounded up, when the
load is decreasing, the load value is rounded down.
The modified code was tested on nohz=off and nohz kernels. It was tested
on vanilla kernel 4.6-rc5 and on centos 7.1 kernel 3.10.0-327. It was
tested on single, dual, and octal cores system. It was tested on virtual
hosts and bare hardware. No unwanted effects have been observed, and the
problems that the patch intended to fix were indeed gone.
Tested-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Doug Smythies <dsmythies@telus.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0f004f5a69 ("sched: Cure more NO_HZ load average woes")
Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aac55d7573 upstream.
With Linux page size of 64K and hardware only supporting 4K HPTE, if we
use subpage protection, we always fail for the subpage 0 as shown
below (using the selftest subpage_prot test):
520175565: (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !
This is because hash preload wrongly inserts the HPTE entry for subpage
0 without looking at the subpage protection information.
Fix it by teaching should_hash_preload() not to preload if we have
subpage protection configured for that range.
It appears this has been broken since it was introduced in 2008.
Fixes: fa28237cfc ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8bbc9b7b00 upstream.
Currently we have a check in hash_preload() against the psize, which is
only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
this check in a subsequent patch, so factor it out to allow that. As a
bonus it removes the #ifdef in the C code.
Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
block because it would require a forward declaration.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c87bf43144 upstream.
Enabling CONFIG_GCOV_PROFILE_ALL produces us a lot of warnings like
lib/lz4/lz4hc_compress.c: In function 'lz4_compresshcctx':
lib/lz4/lz4hc_compress.c:514:1: warning: the frame size of 1504 bytes is larger than 1024 bytes [-Wframe-larger-than=]
After some investigation, I found that this behavior started with gcc-4.9,
and opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702.
A suggested workaround for it is to use the -fno-tree-loop-im
flag that turns off one of the optimization stages in gcc, so the
code runs a little slower but does not use excessive amounts
of stack.
We could make this conditional on the gcc version, but I could not
find an easy way to do this in Kbuild and the benefit would be
fairly small, given that most of the gcc version in production are
affected now.
I'm marking this for 'stable' backports because it addresses a bug
with code generation in gcc that exists in all kernel versions
with the affected gcc releases.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 128639395b upstream.
Update the recent changes to set_pte() that were added in 46011e6ea3
to handle R10000_LLSC_WAR, and format the assembly to match other areas
of the MIPS tree using the same WAR.
This also incorporates a patch recently sent in my Markos Chandras,
"Remove local LL/SC preprocessor variants", so that patch doesn't need
to be applied if this one is accepted.
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Fixes: 46011e6ea3 ("MIPS: Make set_pte() SMP safe.)
Cc: David Daney <david.daney@cavium.com>
Cc: Linux/MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/11103/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2:
- Use {LL,SC}_INSN not __{LL,SC}
- Use literal arch=r4000 instead of MIPS_ISA_ARCH_LEVEL since R6 is not
supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f18ebc211e upstream.
The problem with ornamental, do-nothing gotos is that they lead to
"forgot to set the error code" bugs. We should be returning -EINVAL
here but we don't. It leads to an uninitalized variable in
counter_show():
drivers/acpi/sysfs.c:603 counter_show()
error: uninitialized symbol 'status'.
Fixes: 1c8fce27e2 (ACPI: introduce drivers/acpi/sysfs.c)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6798df4c5f upstream.
When csw->con_startup() fails in do_register_con_driver, we return no
error (i.e. 0). This was changed back in 2006 by commit 3e795de763.
Before that we used to return -ENODEV.
So fix the return value to be -ENODEV in that case again.
Fixes: 3e795de763 ("VT binding: Add binding/unbinding support for the VT console")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 309124e264 upstream.
According to full-history-linux commit d3794f4fa7c3edc3 ("[PATCH] M68k
update (part 25)"), port operations are allowed on m68k if CONFIG_ISA is
defined.
However, commit 153dcc54df ("[PATCH] mem driver: fix conditional
on isa i/o support") accidentally changed an "||" into an "&&",
disabling it completely on m68k. This logic was retained when
introducing the DEVPORT symbol in commit 4f911d64e0 ("Make
/dev/port conditional on config symbol").
Drop the bogus dependency on !M68K to fix this.
Fixes: 153dcc54df ("[PATCH] mem driver: fix conditional on isa i/o support")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Al Stone <ahs3@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9eb13a910 upstream.
If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly). Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.
This can be reproduced via:
mke2fs -t ext4 /tmp/foo.img 100
debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
mount -o loop /tmp/foo.img /mnt
(But don't do this if you are using an unpatched kernel if you care
about the system staying functional. :-)
This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1]. (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)
[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
Reported by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc4bf75ea3 upstream.
Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.
In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.
The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.
Fixed by: Adding another kthread_should_stop() before schedule_timeout()
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4c63c2454e upstream.
32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.
Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1497977fe upstream.
sg_dma_len() macro can be used only on scattelists which are mapped, so
all calls to it before dma_map_sg() are invalid. Replace them by proper
check for direct sg segment length read.
Fixes: a49e490c7a ("crypto: s5p-sss - add S5PV210 advanced crypto engine support")
Fixes: 9e4a1100a4 ("crypto: s5p-sss - Handle unaligned buffers")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: unaligned DMA is unsupported so there is a different
set of calls to replace]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c20e128030 upstream.
The alpha pci_mmap_resource() is used for both IORESOURCE_MEM and
IORESOURCE_IO resources, but iomem_is_exclusive() is only applicable for
IORESOURCE_MEM.
Call iomem_is_exclusive() only for IORESOURCE_MEM resources, and do it
earlier to match the generic version of pci_mmap_resource().
Fixes: 10a0ef39fb ("PCI/alpha: pci sysfs resources")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca620723d4 upstream.
iomem_is_exclusive() requires a CPU physical address, but on some arches we
supplied a PCI bus address instead.
On most arches, pci_resource_to_user(res) returns "res->start", which is a
CPU physical address. But on microblaze, mips, powerpc, and sparc, it
returns the PCI bus address corresponding to "res->start".
The result is that pci_mmap_resource() may fail when it shouldn't (if the
bus address happens to match an existing resource), or it may succeed when
it should fail (if the resource is exclusive but the bus address doesn't
match it).
Call iomem_is_exclusive() with "res->start", which is always a CPU physical
address, not the result of pci_resource_to_user().
Fixes: e8de1481fd ("resource: allow MMIO exclusivity for device drivers")
Suggested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79152e8d08 upstream.
The tcrypt testing module on Exynos5422-based Odroid XU3/4 board failed on
testing 8 kB size blocks:
$ sudo modprobe tcrypt sec=1 mode=500
testing speed of async ecb(aes) (ecb-aes-s5p) encryption
test 0 (128 bit key, 16 byte blocks): 21971 operations in 1 seconds (351536 bytes)
test 1 (128 bit key, 64 byte blocks): 21731 operations in 1 seconds (1390784 bytes)
test 2 (128 bit key, 256 byte blocks): 21932 operations in 1 seconds (5614592 bytes)
test 3 (128 bit key, 1024 byte blocks): 21685 operations in 1 seconds (22205440 bytes)
test 4 (128 bit key, 8192 byte blocks):
This was caused by a race issue of missed BRDMA_DONE ("Block cipher
Receiving DMA") interrupt. Device starts processing the data in DMA mode
immediately after setting length of DMA block: receiving (FCBRDMAL) or
transmitting (FCBTDMAL). The driver sets these lengths from interrupt
handler through s5p_set_dma_indata() function (or xxx_setdata()).
However the interrupt handler was first dealing with receive buffer
(dma-unmap old, dma-map new, set receive block length which starts the
operation), then with transmit buffer and finally was clearing pending
interrupts (FCINTPEND). Because of the time window between setting
receive buffer length and clearing pending interrupts, the operation on
receive buffer could end already and driver would miss new interrupt.
User manual for Exynos5422 confirms in example code that setting DMA
block lengths should be the last operation.
The tcrypt hang could be also observed in following blocked-task dmesg:
INFO: task modprobe:258 blocked for more than 120 seconds.
Not tainted 4.6.0-rc4-next-20160419-00005-g9eac8b7b7753-dirty #42
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe D c06b09d8 0 258 256 0x00000000
[<c06b09d8>] (__schedule) from [<c06b0f24>] (schedule+0x40/0xac)
[<c06b0f24>] (schedule) from [<c06b49f8>] (schedule_timeout+0x124/0x178)
[<c06b49f8>] (schedule_timeout) from [<c06b17fc>] (wait_for_common+0xb8/0x144)
[<c06b17fc>] (wait_for_common) from [<bf0013b8>] (test_acipher_speed+0x49c/0x740 [tcrypt])
[<bf0013b8>] (test_acipher_speed [tcrypt]) from [<bf003e8c>] (do_test+0x2240/0x30ec [tcrypt])
[<bf003e8c>] (do_test [tcrypt]) from [<bf008048>] (tcrypt_mod_init+0x48/0xa4 [tcrypt])
[<bf008048>] (tcrypt_mod_init [tcrypt]) from [<c010177c>] (do_one_initcall+0x3c/0x16c)
[<c010177c>] (do_one_initcall) from [<c0191ff0>] (do_init_module+0x5c/0x1ac)
[<c0191ff0>] (do_init_module) from [<c0185610>] (load_module+0x1a30/0x1d08)
[<c0185610>] (load_module) from [<c0185ab0>] (SyS_finit_module+0x8c/0x98)
[<c0185ab0>] (SyS_finit_module) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c)
Fixes: a49e490c7a ("crypto: s5p-sss - add S5PV210 advanced crypto engine support")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7b9bc799a4 upstream.
BugLink: http://bugs.launchpad.net/bugs/972604
Commit 09c9bae26b ("ath5k: add led pin
configuration for compaq c700 laptop") added a pin configuration for the Compaq
c700 laptop. However, the polarity of the led pin is reversed. It should be
red for wifi off and blue for wifi on, but it is the opposite. This bug was
reported in the following bug report:
http://pad.lv/972604
Fixes: 09c9bae26b ("ath5k: add led pin configuration for compaq c700 laptop")
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 267c858603 upstream.
Setting the flag 'cache_bypass' will bypass the cache not the hardware.
Fix this comment here.
Fixes: 0eef6b0415 ("regmap: Fix doc comment")
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79e4865032 upstream.
Stack object "dte_facilities" is allocated in x25_rx_call_request(),
which is supposed to be initialized in x25_negotiate_facilities.
However, 5 fields (8 bytes in total) are not initialized. This
object is then copied to userland via copy_to_user, thus infoleak
occurs.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f8e44741f upstream.
The stack object “map” has a total size of 32 bytes. Its last 4
bytes are padding generated by compiler. These padding bytes are
not initialized and sent out via “nla_put”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8670c09f3 upstream.
The stack object “info” has a total size of 12 bytes. Its last byte
is padding which is not initialized and leaked via “put_cmsg”.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 31b0b385f6 upstream.
The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.
Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.
This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.
Fixes: 5b3501faa8 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 99d825822e upstream.
Payloads of NM entries are not supposed to contain NUL. When we run
into such, only the part prior to the first NUL goes into the
concatenation (i.e. the directory entry name being encoded by a bunch
of NM entries). We do stop when the amount collected so far + the
claimed amount in the current NM entry exceed 254. So far, so good,
but what we return as the total length is the sum of *claimed*
sizes, not the actual amount collected. And that can grow pretty
large - not unlimited, since you'd need to put CE entries in
between to be able to get more than the maximum that could be
contained in one isofs directory entry / continuation chunk and
we are stop once we'd encountered 32 CEs, but you can get about 8Kb
easily. And that's what will be passed to readdir callback as the
name length. 8Kb __copy_to_user() from a buffer allocated by
__get_free_page()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0b22d1bb2 upstream.
Do not load one entry beyond the end of the syscall table when the
syscall number of a traced process equals to __NR_Linux_syscalls.
Similar bug with regular processes was fixed by commit 3bb457af4f
("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8148a73c99 upstream.
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13f4bb78cf upstream.
The crypto hash walk code is broken when supplied with an offset
greater than or equal to PAGE_SIZE. This patch fixes it by adjusting
walk->pg and walk->offset when this happens.
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 689de1d6ca upstream.
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.
In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much. In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.
There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem. It
simply picks a much better constant multiplier, spreading the bits out a
lot better.
NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive. The bigger hashing
cleanup will replace the 32-bit case with something better.
The new constants were picked by George Spelvin who wrote that bigger
cleanup series. I just picked out the constants and part of the comment
from that series.
Cc: George Spelvin <linux@horizon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23d0db76ff upstream.
The hash_64() function historically does the multiply by the
GOLDEN_RATIO_PRIME_64 number with explicit shifts and adds, because
unlike the 32-bit case, gcc seems unable to turn the constant multiply
into the more appropriate shift and adds when required.
However, that means that we generate those shifts and adds even when the
architecture has a fast multiplier, and could just do it better in
hardware.
Use the now-cleaned-up CONFIG_ARCH_HAS_FAST_MULTIPLIER (together with
"is it a 64-bit architecture") to decide whether to use an integer
multiply or the explicit sequence of shift/add instructions.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[bwh: This has no immediate effect in 3.2 because nothing defines
CONFIG_ARCH_HAS_FAST_MULTIPLIER. However the following fix removes
that condition.]
commit e6bd18f57a upstream.
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl(). This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.
For the immediate repair, detect and deny suspicious accesses to
the write API.
For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).
The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2:
- Drop changes to hfi1
- include/rdma/ib.h didn't exist, so create it with the usual header guard
and include it in drivers/infiniband/core/ucma.c
- ipath_write() has the same problem, so add the same restriction there]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dab9a2663f upstream.
During system resume we depended on pci_enable_device() also putting the
device into PCI D0 state. This won't work if the PCI device was already
enabled but still in D3 state. This is because pci_enable_device() is
refcounted and will not change the HW state if called with a non-zero
refcount. Leaving the device in D3 will make all subsequent device
accesses fail.
This didn't cause a problem most of the time, since we resumed with an
enable refcount of 0. But it fails at least after module reload because
after that we also happen to leak a PCI device enable reference: During
probing we call drm_get_pci_dev() which will enable the PCI device, but
during device removal drm_put_dev() won't disable it. This is a bug of
its own in DRM core, but without much harm as it only leaves the PCI
device enabled. Fixing it is also a bit more involved, due to DRM
mid-layering and because it affects non-i915 drivers too. The fix in
this patch is valid regardless of the problem in DRM core.
v2:
- Add a code comment about the relation of this fix to the freeze/thaw
vs. the suspend/resume phases. (Ville)
- Add a code comment about the inconsistent ordering of set power state
and device enable calls. (Chris)
CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460979954-14503-1-git-send-email-imre.deak@intel.com
(cherry picked from commit 44410cd0bf)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[bwh: Backported to 3.2:
- Return error code directly
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d377f4d69 upstream.
The Link ECU is an aftermarket ECU computer for vehicles that provides
full tuning abilities as well as datalogging and displaying capabilities
via the USB to Serial adapter built into the device.
Signed-off-by: Mike Manning <michael@bsch.com.au>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c4fdb6cff2 upstream.
When removing a single interface while a broadcast or ogm packet is
still pending then we will free the forward packet without releasing the
queue slots again.
This patch is supposed to fix this issue.
Fixes: 6d5808d4ae ("batman-adv: Add missing hardif_free_ref in forw_packet_free")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
[sven@narfation.org: fix conflicts with current version]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1a65f1741 upstream.
_batadv_update_route rcu_derefences orig_ifinfo->router outside of a
spinlock protected region to print some information messages to the debug
log. But this pointer is not checked again when the new pointer is assigned
in the spinlock protected region. Thus is can happen that the value of
orig_ifinfo->router changed in the meantime and thus the reference counter
of the wrong router gets reduced after the spinlock protected region.
Just rcu_dereferencing the value of orig_ifinfo->router inside the spinlock
protected region (which also set the new pointer) is enough to get the
correct old router object.
Fixes: e1a5382f97 ("batman-adv: Make orig_node->router an rcu protected pointer")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
[bwh: Backported to 3.2: s/orig_ifinfo/orig_node/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c78296665c upstream.
The encapsulated ethernet and VLAN header may be outside the received
ethernet frame. Thus the skb buffer size has to be checked before it can be
parsed to find out if it encapsulates another batman-adv packet.
Fixes: 420193573f ("batman-adv: softif bridge loop avoidance")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 103f6112f2 upstream.
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:
kernel BUG at .../fs/hugetlbfs/inode.c:428!
invalid opcode: 0000 [#1] SMP
...
RIP: e030:[<ffffffff811c333b>] [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
...
Call Trace:
[<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
[<ffffffff81167b3d>] evict+0xbd/0x1b0
[<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
[<ffffffff81165b0e>] dput+0x1fe/0x220
[<ffffffff81150535>] __fput+0x155/0x200
[<ffffffff81079fc0>] task_work_run+0x60/0xa0
[<ffffffff81063510>] do_exit+0x160/0x400
[<ffffffff810637eb>] do_group_exit+0x3b/0xa0
[<ffffffff8106e8bd>] get_signal+0x1ed/0x470
[<ffffffff8100f854>] do_signal+0x14/0x110
[<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
[<ffffffff814178a5>] retint_user+0x8/0x13
This is CVE-2016-3961 / XSA-174.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 457c1b27ed upstream.
Currently, I am seeing the following when I `mount -t hugetlbfs /none
/dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's
related to the fact that hugetlbfs is properly not correctly setting
itself up in this state?:
Unable to handle kernel paging request for data at address 0x00000031
Faulting instruction address: 0xc000000000245710
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
....
In KVM guests on Power, in a guest not backed by hugepages, we see the
following:
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 64 kB
HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
are not supported at boot-time, but this is only checked in
hugetlb_init(). Extract the check to a helper function, and use it in a
few relevant places.
This does make hugetlbfs not supported (not registered at all) in this
environment. I believe this is fine, as there are no valid hugepages
and that won't change at runtime.
[akpm@linux-foundation.org: use pr_info(), per Mel]
[akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Drop changes to hugetlb_show_meminfo()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f43bfaeddc upstream.
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f061564 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
commit 6997e57d69 upstream.
The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.
This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).
Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.
This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.
We're also incorrectly setting MMU feature bit 1, which is:
#define MMU_FTR_TYPE_8xx 0x00000002
Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.
Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:
#define PPC_FEATURE_PPC_LE 0x00000001
Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.
Fix the code by adding the missing initialisation of the MMU feature.
Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.
Fixes: 44ae3ab335 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Flesh out changelog, add comment reserving 0x4]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eda5ecc0a6 upstream.
The trigger delay algorithm that converts from microseconds to
the register value looks incorrect. According to most of the PMIC
documentation, the equation is
delay (Seconds) = (1 / 1024) * 2 ^ (x + 4)
except for one case where the documentation looks to have a
formatting issue and the equation looks like
delay (Seconds) = (1 / 1024) * 2 x + 4
Most likely this driver was written with the improper
documentation to begin with. According to the downstream sources
the valid delays are from 2 seconds to 1/64 second, and the
latter equation just doesn't make sense for that. Let's fix the
algorithm and the range check to match the documentation and the
downstream sources.
Reported-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Fixes: 92d57a73e4 ("input: Add support for Qualcomm PMIC8XXX power key")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: use pdata->kpd_trigger_delay_us not kpd_delay]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e86103a757 upstream.
On BXT platform Host Controller and Device Controller figure as
same PCI device but with different device function. HCD should
not pass data to Device Controller but only to Host Controllers.
Checking if companion device is Host Controller, otherwise skip.
Signed-off-by: Robert Dobrowolski <robert.dobrowolski@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1363074667 upstream.
Add a new NO_REPORT_LUNS quirk and set it for Seagate drives with
an usb-id of: 0bc2:331a, as these will fail to respond to a
REPORT_LUNS command.
Reported-and-tested-by: David Webb <djw@noc.ac.uk>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Drop the UAS changes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71504062a7 upstream.
This patch fixes some wild pointers produced by xhci_mem_cleanup.
These wild pointers will cause system crash if xhci_mem_cleanup()
is called twice.
Reported-and-tested-by: Pengcheng Li <lpc.li@hisilicon.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: there's no xhci_hcd::ext_caps field to clear]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f815cdde3 upstream.
A non-privileged user can create a netlink socket with the same port_id as
used by an existing open nl80211 netlink socket (e.g. as used by a hostapd
process) with a different protocol number.
Closing this socket will then lead to the notification going to nl80211's
socket release notification handler, and possibly cause an action such as
removing a virtual interface.
Fix this issue by checking that the netlink protocol is NETLINK_GENERIC.
Since generic netlink has no notifier chain of its own, we can't fix the
problem more generically.
Fixes: 026331c4d9 ("cfg80211/mac80211: allow registering for and sending action frames")
Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc5b7f3bf1 upstream.
An interrupt handler that uses the fpu can kill a KVM VM, if it runs
under the following conditions:
- the guest's xcr0 register is loaded on the cpu
- the guest's fpu context is not loaded
- the host is using eagerfpu
Note that the guest's xcr0 register and fpu context are not loaded as
part of the atomic world switch into "guest mode". They are loaded by
KVM while the cpu is still in "host mode".
Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
interrupt handler will look something like this:
if (irq_fpu_usable()) {
kernel_fpu_begin();
[... code that uses the fpu ...]
kernel_fpu_end();
}
As long as the guest's fpu is not loaded and the host is using eager
fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
returns true). The interrupt handler proceeds to use the fpu with
the guest's xcr0 live.
kernel_fpu_begin() saves the current fpu context. If this uses
XSAVE[OPT], it may leave the xsave area in an undesirable state.
According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
xcr0[i] == 0 following an XSAVE.
kernel_fpu_end() restores the fpu context. Now if any bit i in
XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
fault is trapped and SIGSEGV is delivered to the current process.
Only pre-4.2 kernels appear to be vulnerable to this sequence of
events. Commit 653f52c ("kvm,x86: load guest FPU context more eagerly")
from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
This patch fixes the bug by keeping the host's xcr0 loaded outside
of the interrupts-disabled region where KVM switches into guest mode.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Matlack <dmatlack@google.com>
[Move load after goto cancel_injection. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Drop change in__kvm_set_xcr()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fd0f46cb1 upstream.
In usecases where force_port_map is used saved_port_map is never set,
resulting in not programming the PORTS_IMPL register as part of initial
config. This patch fixes this by setting it to port_map even in case
where force_port_map is used, making it more inline with other parts of
the code.
Fixes: 566d1827df ("libata: disable forced PORTS_IMPL for >= AHCI 1.3")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andy Gross <andy.gross@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 162f98dea4 upstream.
The gtco driver expects at least one valid endpoint. If given malicious
descriptors that specify 0 for the number of endpoints, it will crash in
the probe function. Ensure there is at least one endpoint on the interface
before using it.
Also let's fix a minor coding style issue.
The full correct report of this issue can be found in the public
Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1283385
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e21404dc0a upstream.
Loading ipmi_si module while bmc is disconnected, we found the timeout
is longer than 5 secs. Actually it takes about 3 mins and 20
secs.(HZ=250)
error message as below:
Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
[...]
Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]
Function wait_for_msg_done() use schedule_timeout_uninterruptible(1) to
sleep 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
usecs from timeout.
Reported-by: Hu Shiyuan <hushiyuan@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: sebastian.riemer@profitbricks.com
Cc: cminyard@mvista.com
Cc: openipmi-developer@lists.sourceforge.net
commit df90ca9690 upstream.
Commit ff47ab4ff3 "x86: Add 1/2/4/8 byte optimization to 64bit
__copy_{from,to}_user_inatomic" added a "_nocheck" call in between
the copy_to/from_user() and copy_user_generic(). As both the
normal and nocheck versions of theses calls use the proper __user
annotation, a typecast to remove it should not be added.
This causes sparse to spin out the following warnings:
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident>
arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src
arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo@redhat.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: h.zuidam@computer.org
commit ff47ab4ff3 upstream.
The 64bit __copy_{from,to}_user_inatomic always called
copy_from_user_generic, but skipped the special optimizations for 1/2/4/8
byte accesses.
This especially hurts the futex call, which accesses the 4 byte futex
user value with a complicated fast string operation in a function call,
instead of a single movl.
Use __copy_{from,to}_user for _inatomic instead to get the same
optimizations. The only problem was the might_fault() in those functions.
So move that to new wrapper and call __copy_{f,t}_user_nocheck()
from *_inatomic directly.
32bit already did this correctly by duplicating the code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mingo@redhat.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: h.zuidam@computer.org
This bug has already bee fixed upstream since 4.2. However, it
was fixed during the AEAD conversion so no fix was backported to
the older kernels.
When we do an RFC 4543 decryption, we will end up writing the
ICV beyond the end of the dst buffer. This should lead to a
crash but for some reason it was never noticed.
This patch fixes it by only writing back the ICV for encryption.
Fixes: d733ac90f9 ("crypto: gcm - fix rfc4543 to handle async...")
Reported-by: Patrick Meyer <patrick.meyer@vasgard.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d733ac90f9 upstream.
If the gcm cipher used by rfc4543 does not complete request immediately,
the authentication tag is not copied to destination buffer. Patch adds
correct async logic for this case.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2eff73c0a1 upstream.
Pave the way for checking the current patch level of the
microcode in a core. We want to be able to do stuff depending on
the patch level - in this case decide whether to update or not.
But that will be added in a later patch.
Drop unused local var uci assignment, while at it.
Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai:
Use native_rdmsr() in check_current_patch_level() because with
CONFIG_PARAVIRT enabled and on 32-bit, where we run before
paging has been enabled, we cannot deref pv_info yet. Or we
could, but we'd need to access its physical address. This way of
fixing it is simpler. See:
https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.com>:
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
This reverts commit b5518429e7, which was
commit 2793a23aac upstream. It is
pointless unless af_packet calls the new function.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 0954b59d9f, which was
commit ea47781c26 upstream. It is
pointless unless af_packet calls the new function.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 54d83fc74a upstream.
Ben Hawkes says:
In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
is possible for a user-supplied ipt_entry structure to have a large
next_offset field. This field is not bounds checked prior to writing a
counter value at the supplied offset.
Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so its supposed to return
an absolute verdict of either ACCEPT or DROP.
However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.
However, an unconditional rule must also not be using any matches
(no -m args).
The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeeded to the next (not-existent) rule.
Unify this so that all the callers have same idea of 'unconditional rule'.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 81422e672f upstream.
According to Documentation/power/devices.txt
The driver should not use device_set_wakeup_enable() which is the policy
for user to decide.
Using device_init_wakeup() to initialize dev->power.should_wakeup and
dev->power.can_wakeup on driver initialization.
And use device_may_wakeup() on suspend to decide if WoL function should
be enabled on NIC.
Reported-by: Diego Viola <diego.viola@gmail.com>
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3ba3458fb9 ]
When sending a UDPv6 message longer than MTU, account for the length
of fragmentable IPv6 extension headers in skb->network_header offset.
Same as we do in alloc_new_skb path in __ip6_append_data().
This ensures that later on __ip6_make_skb() will make space in
headroom for fragmentable extension headers:
/* move skb->data to ip header from ext header */
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
Prevents a splat due to skb_under_panic:
skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \
head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] KASAN
CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
[...]
Call Trace:
[<ffffffff813eb7b9>] skb_push+0x79/0x80
[<ffffffff8143397b>] eth_header+0x2b/0x100
[<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
[<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
[<ffffffff814efe3a>] ip6_output+0x16a/0x280
[<ffffffff815440c1>] ip6_local_out+0xb1/0xf0
[<ffffffff814f1115>] ip6_send_skb+0x45/0xd0
[<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
[<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
[...]
Reported-by: Ji Jianwen <jiji@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 5745b8232e ]
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the
right place.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 071d36bf21 ]
A crash is observed when a decrypted packet is processed in receive
path. get_rps_cpus() tries to dereference the skb->dev fields but it
appears that the device is freed from the poison pattern.
[<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0
[<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc
[<ffffffc000af6094>] netif_rx+0x74/0x94
[<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0
[<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c
[<ffffffc000ba6eb8>] esp_input_done+0x20/0x30
[<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
[<ffffffc0000b7324>] worker_thread+0x2f8/0x418
[<ffffffc0000bb40c>] kthread+0xe0/0xec
-013|get_rps_cpu(
| dev = 0xFFFFFFC08B688000,
| skb = 0xFFFFFFC0C76AAC00 -> (
| dev = 0xFFFFFFC08B688000 -> (
| name =
"......................................................
| name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev =
0xAAAAAAAAAAA
Following are the sequence of events observed -
- Encrypted packet in receive path from netdevice is queued
- Encrypted packet queued for decryption (asynchronous)
- Netdevice brought down and freed
- Packet is decrypted and returned through callback in esp_input_done
- Packet is queued again for process in network stack using netif_rx
Since the device appears to have been freed, the dereference of
skb->dev in get_rps_cpus() leads to an unhandled page fault
exception.
Fix this by holding on to device reference when queueing packets
asynchronously and releasing the reference on call back return.
v2: Make the change generic to xfrm as mentioned by Steffen and
update the title to xfrm
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2c9a266afe ]
When running small packets [length < 256 bytes] traffic, packets were
being dropped due to invalid data in those packets which were
delivered by the driver upto the stack. Using pci_dma_sync_single_for_cpu
ensures copying latest and updated data into skb from the receive buffer.
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e725a66c02 ]
gcc-6 finds an out of bounds access in the fst_add_one function
when calculating the end of the mmio area:
drivers/net/wan/farsync.c: In function 'fst_add_one':
drivers/net/wan/farsync.c:418:53: error: index 2 denotes an offset greater than size of 'u8[2][8192] {aka unsigned char[2][8192]}' [-Werror=array-bounds]
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^
include/linux/compiler-gcc.h:158:21: note: in definition of macro '__compiler_offsetof'
__builtin_offsetof(a, b)
^
drivers/net/wan/farsync.c:418:37: note: in expansion of macro 'offsetof'
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^~~~~~~~
drivers/net/wan/farsync.c:2519:36: note: in expansion of macro 'BUF_OFFSET'
+ BUF_OFFSET ( txBuffer[i][NUM_TX_BUFFER][0]);
^~~~~~~~~~
The warning is correct, but not critical because this appears
to be a write-only variable that is set by each WAN driver but
never accessed afterwards.
I'm taking the minimal fix here, using the correct pointer by
pointing 'mem_end' to the last byte inside of the register area
as all other WAN drivers do, rather than the first byte outside of
it. An alternative would be to just remove the mem_end member
entirely.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8e2ad4113c ]
The stack expects link layer headers in the skb linear section.
Macvtap can create skbs with llheader in frags in edge cases:
when (IFF_VNET_HDR is off or vnet_hdr.hdr_len < ETH_HLEN) and
prepad + len > PAGE_SIZE and vnet_hdr.flags has no or bad csum.
Add checks to ensure linear is always at least ETH_HLEN.
At this point, len is already ensured to be >= ETH_HLEN.
For backwards compatiblity, rounds up short vnet_hdr.hdr_len.
This differs from tap and packet, which return an error.
Fixes b9fb9ee07e ("macvtap: add GSO/csum offload support")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: don't use macvtap16_to_cpu()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit c1b7fca650 ]
In a low memory situation, if netdev_alloc_skb() fails on a first RX ring
loop iteration in sh_eth_ring_format(), 'rxdesc' is still NULL. Avoid
kernel oops by adding the 'rxdesc' check after the loop.
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ea47781c26 ]
As variable length protocol, AX25 fails link layer header validation
tests based on a minimum length. header_ops.validate allows protocols
to validate headers that are shorter than hard_header_len. Implement
this callback for AX25.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2793a23aac ]
Netdevice parameter hard_header_len is variously interpreted both as
an upper and lower bound on link layer header length. The field is
used as upper bound when reserving room at allocation, as lower bound
when validating user input in PF_PACKET.
Clarify the definition to be maximum header length. For validation
of untrusted headers, add an optional validate member to header_ops.
Allow bypassing of validation by passing CAP_SYS_RAWIO, for instance
for deliberate testing of corrupt input. In this case, pad trailing
bytes, as some device drivers expect completely initialized headers.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: net_device has inline comments instead of kernel-doc]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 48906f62c9 ]
Some devices will silently fail setup unless they are reset first.
This is necessary even if the data interface is already in
altsetting 0, which it will be when the device is probed for the
first time. Briefly toggling the altsetting forces a function
reset regardless of the initial state.
This fixes a setup problem observed on a number of Huawei devices,
appearing to operate in NTB-32 mode even if we explicitly set them
to NTB-16 mode.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: hard-code 1 for data_altsetting]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 40b4f0fd74 ]
As the member .cmp_addr of sctp_af_inet6, sctp_v6_cmp_addr should also check
the port of addresses, just like sctp_v4_cmp_addr, cause it's invoked by
sctp_cmp_addr_exact().
Now sctp_v6_cmp_addr just check the port when two addresses have different
family, and lack the port check for two ipv6 addresses. that will make
sctp_hash_cmp() cannot work well.
so fix it by adding ports comparison in sctp_v6_cmp_addr().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ee50c130c8 ]
The JMC260 network card fails to suspend/resume because the call to
jme_start_irq() was too early, moving the call to jme_start_irq() after
the call to jme_reset_link() makes it work.
Prior this change suspend/resume would fail unless /sys/power/pm_async=0
was explicitly specified.
Relevant bug report: https://bugzilla.kernel.org/show_bug.cgi?id=112351
Signed-off-by: Diego Viola <diego.viola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff1cab374a upstream.
The BSP team noticed that there is spin/mutex lock issue on sh-sci when
CPUFREQ is used. The issue is that the notifier function may call
mutex_lock() while the spinlock is held, which can lead to a BUG().
This may happen if CPUFREQ is changed while another CPU calls
clk_get_rate().
Taking the spinlock was added to the notifier function in commit
e552de2413 ("sh-sci: add platform device private data"), to
protect the list of serial ports against modification during traversal.
At that time the Common Clock Framework didn't exist yet, and
clk_get_rate() just returned clk->rate without taking a mutex.
Note that since commit d535a2305f ("serial: sh-sci: Require a
device per port mapping."), there's no longer a list of serial ports to
traverse, and taking the spinlock became superfluous.
To fix the issue, just remove the cpufreq notifier:
1. The notifier doesn't work correctly: all it does is update the
stored clock rate; it does not update the divider in the hardware.
The divider will only be updated when calling sci_set_termios().
I believe this was broken back in 2004, when the old
drivers/char/sh-sci.c driver (where the notifier did update the
divider) was replaced by drivers/serial/sh-sci.c (where the
notifier just updated port->uartclk).
Cfr. full-history-linux commits 6f8deaef2e9675d9 ("[PATCH] sh: port
sh-sci driver to the new API") and 3f73fe878dc9210a ("[PATCH]
Remove old sh-sci driver").
2. On modern SoCs, the sh-sci parent clock rate is no longer related
to the CPU clock rate anyway, so using a cpufreq notifier is
futile.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d99b55d37 upstream.
Commit 35dc248383 introduced a check for
current->mm to see if we have a user space context and only copies data
if we do. Now if an IO gets interrupted by a signal data isn't copied
into user space any more (as we don't have a user space context) but
user space isn't notified about it.
This patch modifies the behaviour to return -EINTR from bio_uncopy_user()
to notify userland that a signal has interrupted the syscall, otherwise
it could lead to a situation where the caller may get a buffer with
no data returned.
This can be reproduced by issuing SG_IO ioctl()s in one thread while
constantly sending signals to it.
Fixes: 35dc248 [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2:
- Adjust filename
- Put the assignment in the existing 'else' block]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d9749fb594 ]
Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires. The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage requries, as passed to __get_free_pages, and two the
number of entries for the hash table). Both need to be ^2, but for
different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).
To fix this, we change the logic slightly. We start by computing a goal
allocation order (which is limited by the maximum size hash table we want
to support. Then we attempt to allocate that size table, decreasing the
order until a successful allocation is made. Then, with the resultant
successful order we compute the number of buckets that hash table supports,
which we then round down to the nearest power of two, giving us the number
of entries the table actually supports.
I've tested this locally here, using non-debug and spinlock-debug kernels,
and the number of entries in the hashtable consistently work out to be
powers of two in all cases.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Dmitry Vyukov <dvyukov@google.com>
CC: Vladislav Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 29e73269aa ]
Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 919483096b ]
Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.
Callers are responsible for the freeing.
Many thanks to Dmitry for the report and diagnostic.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8013d1d7ea ]
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust filename, context
- Number DEVCONF enumerators explicitly to match upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1cdda91871 ]
Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 81e8f2e930 ]
PHY status frames are not reliable, the PHY may not be able to send them
during heavy receive traffic. This overflow condition is signaled by the
PHY in the next status frame, but the driver did not make use of it.
Instead it always reported wrong tx timestamps to user space after an
overflow happened because it assigned newly received tx timestamps to old
packets in the queue.
This commit fixes this issue by clearing the tx timestamp queue every time
an overflow happens, so that no timestamps are delivered for overflow
packets. This way time stamping will continue correctly after an overflow.
Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 34ae6a1aa0 ]
When a tunnel decapsulates the outer header, it has to comply
with RFC 6080 and eventually propagate CE mark into inner header.
It turns out IP6_ECN_set_ce() does not correctly update skb->csum
for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure"
messages and stack traces.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Add skb argument to other callers of IP6_ECN_set_ce()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7aaed57c5c ]
Ivaylo Dimitrov reported a regression caused by commit 7866a62104
("dev: add per net_device packet type chains").
skb->dev becomes NULL and we crash in __netif_receive_skb_core().
Before above commit, different kind of bugs or corruptions could happen
without major crash.
But the root cause is that phonet_rcv() can queue skb without checking
if skb is shared or not.
Many thanks to Ivaylo Dimitrov for his help, diagnosis and tests.
Reported-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Tested-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 83d15e70c4 ]
For tcp_yeah, use an ssthresh floor of 2, the same floor used by Reno
and CUBIC, per RFC 5681 (equation 4).
tcp_yeah_ssthresh() was sometimes returning a 0 or negative ssthresh
value if the intended reduction is as big or bigger than the current
cwnd. Congestion control modules should never return a zero or
negative ssthresh. A zero ssthresh generally results in a zero cwnd,
causing the connection to stall. A negative ssthresh value will be
interpreted as a u32 and will set a target cwnd for PRR near 4
billion.
Oleksandr Natalenko reported that a system using tcp_yeah with ECN
could see a warning about a prior_cwnd of 0 in
tcp_cwnd_reduction(). Testing verified that this was due to
tcp_yeah_ssthresh() misbehaving in this way.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ff62198553 ]
[I stole this patch from Eric Biederman. He wrote:]
> There is no defined mechanism to pass network namespace information
> into /sbin/bridge-stp therefore don't even try to invoke it except
> for bridge devices in the initial network namespace.
>
> It is possible for unprivileged users to cause /sbin/bridge-stp to be
> invoked for any network device name which if /sbin/bridge-stp does not
> guard against unreasonable arguments or being invoked twice on the
> same network device could cause problems.
[Hannes: changed patch using netns_eq]
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 55285bf094 ]
Dmitry reports memleak with syskaller program.
Problem is that connector bumps skb usecount but might not invoke callback.
So move skb_get to where we invoke the callback.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 068d8bd338 ]
In sctp_close, sctp_make_abort_user may return NULL because of memory
allocation failure. If this happens, it will bypass any state change
and never free the assoc. The assoc has no chance to be freed and it
will be kept in memory with the state it had even after the socket is
closed by sctp_close().
So if sctp_make_abort_user fails to allocate memory, we should abort
the asoc via sctp_primitive_ABORT as well. Just like the annotation in
sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
"Even if we can't send the ABORT due to low memory delete the TCB.
This is a departure from our typical NOMEM handling".
But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
dereference the chunk pointer, and system crash. So we should add
SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
places where it adds SCTP_CMD_REPLY cmd.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e1021f2b6 upstream.
ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
ignored in the following "if" condition and ext4_expand_extra_isize()
might be called with NULL iloc.bh set, which triggers NULL pointer
dereference.
This is uncovered by commit 8b4953e13f ("ext4: reserve code points for
the project quota feature"), which enlarges the ext4_inode size, and
run the following script on new kernel but with old mke2fs:
#/bin/bash
mnt=/mnt/ext4
devname=ext4-error
dev=/dev/mapper/$devname
fsimg=/home/fs.img
trap cleanup 0 1 2 3 9 15
cleanup()
{
umount $mnt >/dev/null 2>&1
dmsetup remove $devname
losetup -d $backend_dev
rm -f $fsimg
exit 0
}
rm -f $fsimg
fallocate -l 1g $fsimg
backend_dev=`losetup -f --show $fsimg`
devsize=`blockdev --getsz $backend_dev`
good_tab="0 $devsize linear $backend_dev 0"
error_tab="0 $devsize error $backend_dev 0"
dmsetup create $devname --table "$good_tab"
mkfs -t ext4 $dev
mount -t ext4 -o errors=continue,strictatime $dev $mnt
dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
echo 3 > /proc/sys/vm/drop_caches
ls -l $mnt
exit 0
[ Patch changed to simplify the function a tiny bit. -- Ted ]
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fbd40ea018 upstream.
When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:
1) Address promotion. We are deleting all addresses, so there is no
point in doing this.
2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing
masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b348d7dddb upstream.
Fix potential out-of-bounds write to urb->transfer_buffer
usbip handles network communication directly in the kernel. When receiving a
packet from its peer, usbip code parses headers according to protocol. As
part of this parsing urb->actual_length is filled. Since the input for
urb->actual_length comes from the network, it should be treated as untrusted.
Any entity controlling the network may put any value in the input and the
preallocated urb->transfer_buffer may not be large enough to hold the data.
Thus, the malicious entity is able to write arbitrary data to kernel memory.
Signed-off-by: Ignat Korchagin <ignat.korchagin@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1666984c86 upstream.
In case bind() works, but a later error forces bailing
in probe() in error cases work and a timer may be scheduled.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b8addf891 upstream.
Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only
the stack and the executable are randomized but not other mmapped files
(libraries, vDSO, etc.). This patch enables randomization for the
libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode.
By default on i386 there are 8 bits for the randomization of the libraries,
vDSO and mmaps which only uses 1MB of VA.
This patch preserves the original randomness, using 1MB of VA out of 3GB or
4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR.
The first obvious security benefit is that all objects are randomized (not
only the stack and the executable) in legacy mode which highly increases
the ASLR effectiveness, otherwise the attackers may use these
non-randomized areas. But also sensitive setuid/setgid applications are
more secure because currently, attackers can disable the randomization of
these applications by setting the ulimit stack to "unlimited". This is a
very old and widely known trick to disable the ASLR in i386 which has been
allowed for too long.
Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE
personality flag, but fortunately this doesn't work on setuid/setgid
applications because there is security checks which clear Security-relevant
flags.
This patch always randomizes the mmap_legacy_base address, removing the
possibility to disable the ASLR by setting the stack to "unlimited".
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Acked-by: Ismael Ripoll Ripoll <iripoll@upv.es>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e94e0cfb0 upstream.
Otherwise this function may read data beyond the ruleset blob.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bdf533de69 upstream.
We should check that e->target_offset is sane before
mark_source_chains gets called since it will fetch the target entry
for loop detection.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ef4dfd9d9 upstream.
Handling exceptions from modules never worked on parisc.
It was just masked by the fact that exceptions from modules
don't happen during normal use.
When a module triggers an exception in get_user() we need to load the
main kernel dp value before accessing the exception_data structure, and
afterwards restore the original dp value of the module on exit.
Noticed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef72f3110d upstream.
The kernel module testcase (lib/test_user_copy.c) exhibited a kernel
crash on parisc if the parameters for copy_from_user were reversed
("illegal reversed copy_to_user" testcase).
Fix this potential crash by checking the fault handler if the faulting
address is in the exception table.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3893027a3 upstream.
We want to avoid the kernel module loader to create function pointers
for the kernel fixup routines of get_user() and put_user(). Changing
the external reference from function type to int type fixes this.
This unbreaks exception handling for get_user() and put_user() when
called from a kernel module.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cddc9434e3 upstream.
The CP2105 is used in the GE Healthcare Remote Alarm Box, with the
Manufacturer ID of 0x1901 and Product ID of 0x0194.
Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea6db90e75 upstream.
A Fedora user reports that the ftdi_sio driver works properly for the
ICP DAS I-7561U device. Further, the user manual for these devices
instructs users to load the driver and add the ids using the sysfs
interface.
Add support for these in the driver directly so that the devices work
out of the box instead of needing manual configuration.
Reported-by: <thesource@mail.ru>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff1e22e7a6 upstream.
Moving an unmasked irq may result in irq handler being invoked on both
source and target CPUs.
With 2-level this can happen as follows:
On source CPU:
evtchn_2l_handle_events() ->
generic_handle_irq() ->
handle_edge_irq() ->
eoi_pirq():
irq_move_irq(data);
/***** WE ARE HERE *****/
if (VALID_EVTCHN(evtchn))
clear_evtchn(evtchn);
If at this moment target processor is handling an unrelated event in
evtchn_2l_handle_events()'s loop it may pick up our event since target's
cpu_evtchn_mask claims that this event belongs to it *and* the event is
unmasked and still pending. At the same time, source CPU will continue
executing its own handle_edge_irq().
With FIFO interrupt the scenario is similar: irq_move_irq() may result
in a EVTCHNOP_unmask hypercall which, in turn, may make the event
pending on the target CPU.
We can avoid this situation by moving and clearing the event while
keeping event masked.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[bwh: Backported to 3.2:
- Adjust filename
- Add a suitable definition of test_and_set_mask()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4a07083ed6 upstream.
ALSA system timer backend stops the timer via del_timer() without sync
and leaves del_timer_sync() at the close instead. This is because of
the restriction by the design of ALSA timer: namely, the stop callback
may be called from the timer handler, and calling the sync shall lead
to a hangup. However, this also triggers a kernel BUG() when the
timer is rearmed immediately after stopping without sync:
kernel BUG at kernel/time/timer.c:966!
Call Trace:
<IRQ>
[<ffffffff8239c94e>] snd_timer_s_start+0x13e/0x1a0
[<ffffffff8239e1f4>] snd_timer_interrupt+0x504/0xec0
[<ffffffff8122fca0>] ? debug_check_no_locks_freed+0x290/0x290
[<ffffffff8239ec64>] snd_timer_s_function+0xb4/0x120
[<ffffffff81296b72>] call_timer_fn+0x162/0x520
[<ffffffff81296add>] ? call_timer_fn+0xcd/0x520
[<ffffffff8239ebb0>] ? snd_timer_interrupt+0xec0/0xec0
....
It's the place where add_timer() checks the pending timer. It's clear
that this may happen after the immediate restart without sync in our
cases.
So, the workaround here is just to use mod_timer() instead of
add_timer(). This looks like a band-aid fix, but it's a right move,
as snd_timer_interrupt() takes care of the continuous rearm of timer.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 321c5658c5 upstream.
Non maskable interrupts (NMI) are preferred to interrupts in current
implementation. If a NMI is pending and NMI is blocked by the result
of nmi_allowed(), pending interrupt is not injected and
enable_irq_window() is not executed, even if interrupts injection is
allowed.
In old kernel (e.g. 2.6.32), schedule() is often called in NMI context.
In this case, interrupts are needed to execute iret that intends end
of NMI. The flag of blocking new NMI is not cleared until the guest
execute the iret, and interrupts are blocked by pending NMI. Due to
this, iret can't be invoked in the guest, and the guest is starved
until block is cleared by some events (e.g. canceling injection).
This patch injects pending interrupts, when it's allowed, even if NMI
is blocked. And, If an interrupts is pending after executing
inject_pending_event(), enable_irq_window() is executed regardless of
NMI pending counter.
Signed-off-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- vcpu_enter_guest() is simpler because inject_pending_event() can't fail
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f08bb1e0db upstream.
During revalidate we check whether device capacity has changed before we
decide whether to output disk information or not.
The check for old capacity failed to take into account that we scaled
sdkp->capacity based on the reported logical block size. And therefore
the capacity test would always fail for devices with sectors bigger than
512 bytes and we would print several copies of the same discovery
information.
Avoid scaling sdkp->capacity and instead adjust the value on the fly
when setting the block device capacity and generating fake C/H/S
geometry.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinicke <hare@suse.de>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2:
- logical_to_sectors() is a new function
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e9a0b0525 upstream.
An attack using the lack of sanity checking in probe is known. This
patch checks for the existence of a second port.
CVE-2016-3136
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
[johan: add error message ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: put the check in mct_u232_startup(), which already
has a 'serial' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 836b34a935 upstream.
create_fixed_stream_quirk(), snd_usb_parse_audio_interface() and
create_uaxx_quirk() functions allocate the audioformat object by themselves
and free it upon error before returning. However, once the object is linked
to a stream, it's freed again in snd_usb_audio_pcm_free(), thus it'll be
double-freed, eventually resulting in a memory corruption.
This patch fixes these failures in the error paths by unlinking the audioformat
object before freeing it.
Based on a patch by Takashi Iwai <tiwai@suse.de>
[Note for stable backports:
this patch requires the commit 902eb7fd1e ('ALSA: usb-audio: Minor
code cleanup in create_fixed_stream_quirk()')]
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1283358
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 902eb7fd1e upstream.
Just a minor code cleanup: unify the error paths.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6490865c67 upstream.
This patch adds a code to surely disable TX IRQ of the pipe before
starting TX DMAC transfer. Otherwise, a lot of unnecessary TX IRQs
may happen in rare cases when DMAC is used.
Fixes: e73a989 ("usb: renesas_usbhs: add DMAEngine support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c2e2266a5 upstream.
arm:pxa_defconfig can result in the following crash if the max1111 driver
is not instantiated.
Unhandled fault: page domain fault (0x01b) at 0x00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: : 1b [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 300 Comm: kworker/0:1 Not tainted 4.5.0-01301-g1701f680407c #10
Hardware name: SHARP Akita
Workqueue: events sharpsl_charge_toggle
task: c390a000 ti: c391e000 task.ti: c391e000
PC is at max1111_read_channel+0x20/0x30
LR is at sharpsl_pm_pxa_read_max1111+0x2c/0x3c
pc : [<c03aaab0>] lr : [<c0024b50>] psr: 20000013
...
[<c03aaab0>] (max1111_read_channel) from [<c0024b50>]
(sharpsl_pm_pxa_read_max1111+0x2c/0x3c)
[<c0024b50>] (sharpsl_pm_pxa_read_max1111) from [<c00262e0>]
(spitzpm_read_devdata+0x5c/0xc4)
[<c00262e0>] (spitzpm_read_devdata) from [<c0024094>]
(sharpsl_check_battery_temp+0x78/0x110)
[<c0024094>] (sharpsl_check_battery_temp) from [<c0024f9c>]
(sharpsl_charge_toggle+0x48/0x110)
[<c0024f9c>] (sharpsl_charge_toggle) from [<c004429c>]
(process_one_work+0x14c/0x48c)
[<c004429c>] (process_one_work) from [<c0044618>] (worker_thread+0x3c/0x5d4)
[<c0044618>] (worker_thread) from [<c004a238>] (kthread+0xd0/0xec)
[<c004a238>] (kthread) from [<c000a670>] (ret_from_fork+0x14/0x24)
This can occur because the SPI controller driver (SPI_PXA2XX) is built as
module and thus not necessarily loaded. While building SPI_PXA2XX into the
kernel would make the problem disappear, it appears prudent to ensure that
the driver is instantiated before accessing its data structures.
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit be12b299a8 upstream.
When master handles convert request, it queues ast first and then
returns status. This may happen that the ast is sent before the request
status because the above two messages are sent by two threads. And
right after the ast is sent, if master down, it may trigger BUG in
dlm_move_lockres_to_recovery_list in the requested node because ast
handler moves it to grant list without clear lock->convert_pending. So
remove BUG_ON statement and check if the ast is processed in
dlmconvert_remote.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ac7cf246df upstream.
There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list, which will cause a lock with
OCFS2_LOCK_BUSY in grant list, thus system hangs.
dlmconvert_remote
{
spin_lock(&res->spinlock);
list_move_tail(&lock->list, &res->converting);
lock->convert_pending = 1;
spin_unlock(&res->spinlock);
status = dlm_send_remote_convert_request();
>>>>>> race window, master has queued ast and return DLM_NORMAL,
and then down before sending ast.
this node detects master down and calls
dlm_move_lockres_to_recovery_list, which will revert the
lock to grant list.
Then OCFS2_LOCK_BUSY won't be cleared as new master won't
send ast any more because it thinks already be authorized.
spin_lock(&res->spinlock);
lock->convert_pending = 0;
if (status != DLM_NORMAL)
dlm_revert_pending_convert(res, lock);
spin_unlock(&res->spinlock);
}
In this case, check if res->state has DLM_LOCK_RES_RECOVERING bit set
(res is still in recovering) or res master changed (new master has
finished recovery), reset the status to DLM_RECOVERING, then it will
retry convert.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 968ce1b1f4 upstream.
The old web page for the hwmon subsystem is no longer operational,
and the mailing list has become unreliable. Move both to kernel.org.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.2: the set of hwmon drivers is different, so do a
search-and-replace for the same addresses]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 950336ba3e upstream.
The ati_remote2 driver expects at least two interfaces with one
endpoint each. If given malicious descriptor that specify one
interface or no endpoints, it will crash in the probe function.
Ensure there is at least two interfaces and one endpoint for each
interface before using it.
The full disclosure: http://seclists.org/bugtraq/2016/Mar/90
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 378c6520e7 upstream.
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3af0d554c1 upstream.
There were two issues here:
1) dma_mapping_error() return true/false but we want to return -ENOMEM
2) If dmaengine_prep_slave_sg() failed then "err" wasn't set but
presumably that should be -ENOMEM as well.
I changed the success path to "return 0;" instead of "return ret;" for
clarity.
Fixes: 94fe8c683c ('ks8842: Support DMA when accessed via timberdale')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6785d9152 upstream.
Running the following command:
busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null
with any tracing enabled pretty very quickly leads to various NULL
pointer dereferences and VM BUG_ON()s, such as these:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40
Call Trace:
[<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
[<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
[<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
[<ffffffff81196869>] do_sendfile+0x199/0x380
[<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
[<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d
page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
kernel BUG at include/linux/mm.h:367!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40
Call Trace:
[<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
[<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
[<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
[<ffffffff81196869>] do_sendfile+0x199/0x380
[<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
[<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89
(busybox's cat uses sendfile(2), unlike the coreutils version)
This is because tracing_splice_read_pipe() can call splice_to_pipe()
with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and
we fill the page pointers and the other fields of the pipe_buffers with
garbage.
All other callers of splice_to_pipe() avoid calling it when nr_pages ==
0, and we could make tracing_splice_read_pipe() do that too, but it
seems reasonable to have splice_to_page() handle this condition
gracefully.
Signed-off-by: Rabin Vincent <rabin@rab.in>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aeb6641f8e upstream.
gcc-6 complains about the indentation of the lpfc_destroy_vport_work_array()
call in lpfc_online(), which clearly doesn't look right:
drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_online':
drivers/scsi/lpfc/lpfc_init.c:2880:3: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
lpfc_destroy_vport_work_array(phba, vports);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/lpfc/lpfc_init.c:2863:2: note: ...this 'if' clause, but it is not
if (vports != NULL)
^~
Looking at the patch that introduced this code, it's clear that the
behavior is correct and the indentation is wrong.
This fixes the indentation and adds curly braces around the previous
if() block for clarity, as that is most likely what caused the code
to be misindented in the first place.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 549e55cd2a ("[SCSI] lpfc 8.2.2 : Fix locking around HBA's port_list")
Reviewed-by: Sebastian Herbszt <herbszt@gmx.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb86e05390 upstream.
Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.
Joel explained that he wanted to see funcitons where interrupts are enabled
but preemption was disabled. The expected output he wanted:
<...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit
<...>-2265 1d..1 3419us : __do_softirq <-irq_exit
<...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq
<...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq
<...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq
<...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq
<...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
<...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
<...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
<...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
<...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
<...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq
<...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq
<...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq
There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com
Fixes: 5e6d2b9cfa "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Cc: stable@vget.kernel.org # 2.6.37+
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8835ba4a39 upstream.
An attack has become available which pretends to be a quirky
device circumventing normal sanity checks and crashes the kernel
by an insufficient number of interfaces. This patch adds a check
to the code path for quirky devices.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0b818e3956 upstream.
Attacks that trick drivers into passing a NULL pointer
to usb_driver_claim_interface() using forged descriptors are
known. This thwarts them by sanity checking.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ec0ef3a82 upstream.
The iowarrior driver expects at least one valid endpoint. If given
malicious descriptors that specify 0 for the number of endpoints,
it will crash in the probe function. Ensure there is at least
one endpoint on the interface before using it.
The full report of this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/87
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23ddba80eb upstream.
This is the raid10 counterpart of the bug fixed by Nate
(raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang)
Fixes: 95af587e95(md/raid10: ensure device failure recorded before write request returns)
Cc: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccfc7bf1f0 upstream.
If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.
This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.
If a later entry on the retry_list triggers a call to handle_read_error,
freeze array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.
To prevent the hang, include bio_end_io_list writes in nr_queued.
There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.
I'm happy to supply the script I used to repro this hang.
Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.)
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8e56ffd9d upstream.
Locking ppp_mutex must be done before dereferencing file->private_data,
otherwise it could be modified before ppp_unattached_ioctl() takes the
lock. This could lead ppp_unattached_ioctl() to override ->private_data,
thus leaking reference to the ppp_file previously pointed to.
v2: lock all ppp_ioctl() instead of just checking private_data in
ppp_unattached_ioctl(), to avoid ambiguous behaviour.
Fixes: f3ff8a4d80 ("ppp: push BKL down into the driver")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f6fc056e8 upstream.
nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.
A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 447d6275f0 upstream.
Add some sanity check codes before actually accessing the endpoint via
get_endpoint() in order to avoid the invalid access through a
malformed USB descriptor. Mostly just checking bNumEndpoints, but in
one place (snd_microii_spdif_default_get()), the validity of iface and
altsetting index is checked as well.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=971125
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: drop changes to code we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0f886ca127 upstream.
create_fixed_stream_quirk() may cause a NULL-pointer dereference by
accessing the non-existing endpoint when a USB device with a malformed
USB descriptor is used.
This patch avoids it simply by adding a sanity check of bNumEndpoints
before the accesses.
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=971125
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2:
- There's no altsd variable
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 83d6f1f15f upstream.
Code that was added back in 2.6.38 has an obvious overflow
when accessing a static array, and at the time it was added
only a code comment was put in front of it as a reminder
to have it reviewed properly.
This has not happened, but gcc-6 now points to the specific
overflow:
drivers/net/wireless/ath/ath9k/eeprom.c: In function 'ath9k_hw_get_gain_boundaries_pdadcs':
drivers/net/wireless/ath/ath9k/eeprom.c:483:44: error: array subscript is above array bounds [-Werror=array-bounds]
maxPwrT4[i] = data_9287[idxL].pwrPdg[i][4];
~~~~~~~~~~~~~~~~~~~~~~~~~^~~
It turns out that the correct array length exists in the local
'intercepts' variable of this function, so we can just use that
instead of hardcoding '4', so this patch changes all three
instances to use that variable. The other two instances were
already correct, but it's more consistent this way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 940cd2c12e ("ath9k_hw: merge the ar9287 version of ath9k_hw_get_gain_boundaries_pdadcs")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34b88a68f2 upstream.
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541 ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c6ba45671 upstream.
The powermate driver expects at least one valid USB endpoint in its
probe function. If given malicious descriptors that specify 0 for
the number of endpoints, it will crash. Validate the number of
endpoints on the interface before using them.
The full report for this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/85
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a25f4a95ec upstream.
drivers/rtc/rtc-vr41xx.c:229: warning: ‘vr41xx_rtc_alarm_irq_enable’ defined but not used
Apparently the conversion to alarm_irq_enable forgot to wire up the
callback.
Fixes: 16380c153a ("RTC: Convert rtc drivers to use the alarm_irq_enable method")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0a2ad9b50 upstream.
On umount path, jbd2_journal_destroy() writes latest transaction ID
(->j_tail_sequence) to be used at next mount.
The bug is that ->j_tail_sequence is not holding latest transaction ID
in some cases. So, at next mount, there is chance to conflict with
remaining (not overwritten yet) transactions.
mount (id=10)
write transaction (id=11)
write transaction (id=12)
umount (id=10) <= the bug doesn't write latest ID
mount (id=10)
write transaction (id=11)
crash
mount
[recovery process]
transaction (id=11)
transaction (id=12) <= valid transaction ID, but old commit
must not replay
Like above, this bug become the cause of recovery failure, or FS
corruption.
So why ->j_tail_sequence doesn't point latest ID?
Because if checkpoint transactions was reclaimed by memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(And another case is, __jbd2_journal_clean_checkpoint_list() is called
with empty transaction.)
So in above cases, ->j_tail_sequence is not pointing latest
transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not
done too.
So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes,
some optimizations would be possible to avoid unnecessary REQ_FLUSH
for example though.)
BTW,
journal->j_tail_sequence =
++journal->j_transaction_sequence;
Increment of ->j_transaction_sequence seems to be unnecessary, but
ext3 does this.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5ecee0a3ee upstream.
One of the strange things that the original sg driver did was let the
user provide both a data-out buffer (it followed the sg_header+cdb)
_and_ specify a reply length greater than zero. What happened was that
the user data-out buffer was copied into some kernel buffers and then
the mid level was told a read type operation would take place with the
data from the device overwriting the same kernel buffers. The user would
then read those kernel buffers back into the user space.
From what I can tell, the above action was broken by commit fad7f01e61
("sg: set dxferp to NULL for READ with the older SG interface") in 2008
and syzkaller found that out recently.
Make sure that a user space pointer is passed through when data follows
the sg_header structure and command. Fix the abnormal case when a
non-zero reply_len is also given.
Fixes: fad7f01e61
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 459ee1c3fd upstream.
As observed on Apple iMac10,1, DCE-3.2, RV-730,
link rate of 2.7 Ghz is not selected, because
the args.v1.ucConfig flag setting for 2.7 Ghz
gets overwritten by a following assignment of
the transmitter to use.
Move link rate setup a few lines down to fix this.
In practice this didn't have any positive or
negative effect on display setup on the tested
iMac10,1 so i don't know if backporting to stable
makes sense or not.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b894157145 upstream.
The Home Agent and PCU PCI devices in Broadwell-EP have a non-BAR register
where a BAR should be. We don't know what the side effects of sizing the
"BAR" would be, and we don't know what address space the "BAR" might appear
to describe.
Mark these devices as having non-compliant BARs so the PCI core doesn't
touch them.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7617a24f83 upstream.
The IPVS SIP persistence engine is not able to parse the SIP header
"Call-ID" when such header is inserted in the first positions of
the SIP message.
When IPVS is configured with "--pe sip" option, like for example:
ipvsadm -A -u 1.2.3.4:5060 -s rr --pe sip -p 120 -o
some particular messages (see below for details) do not create entries
in the connection template table, which can be listed with:
ipvsadm -Lcn --persistent-conn
Problematic SIP messages are SIP responses having "Call-ID" header
positioned just after message first line:
SIP/2.0 200 OK
[Call-ID header here]
[rest of the headers]
When "Call-ID" header is positioned down (after a few other headers)
it is correctly recognized.
This is due to the data offset used in get_callid function call inside
ip_vs_pe_sip.c file: since dptr already points to the start of the
SIP message, the value of dataoff should be initially 0.
Otherwise the header is searched starting from some bytes after the
first character of the SIP message.
Fixes: 758ff03387 ("IPVS: sip persistence engine")
Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9532e69b8 upstream.
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:
u64 steal = paravirt_steal_clock(smp_processor_id());
steal -= this_rq()->prev_steal_time;
So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.
Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.
None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.
Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa48380851 "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685acc "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7dd0fdff14 upstream.
Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.
This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is in ISR.
Old implementation would drop the interrupt, but proposed one injects to
IRR, like real hardware would.
The old policy breaks legacy NMI watchdogs, where PIT is used through
virtual wire (LVT0): PIT never sends an interrupt before receiving EOI,
thus a guest deadlock with disabled interrupts will stop NMIs.
Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)
Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- s/ps->reinject/ps->pit_timer.reinject/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0d5ce778c4 upstream.
A typo of j for i led to a logic bug. To rule out future
confusion, the variable names are made meaningful.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e83b79b2d upstream.
This plugs 2 trivial leaks in xfs_attr_shortform_list and
xfs_attr3_leaf_list_int.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4aed9c46af upstream.
A number of spots in the xdr decoding follow a pattern like
n = be32_to_cpup(p++);
READ_BUF(n + 4);
where n is a u32. The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher. I'm not sure exactly what the consequences
are, but we've seen crashes soon after.
Instead, just break these up into two READ_BUF()s.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 10e7ac22cd upstream.
Calling return copy_to_user(...) in an ioctl will not do the right thing
if there's a pagefault: copy_to_user returns the number of bytes not
copied in this case.
Fix up watchdog/rc32434_wdt to do
return copy_to_user(...)) ? -EFAULT : 0;
instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c915c6876 upstream.
On my bttv card "Hauppauge WinTV [card=10]" capturing in YV12 fmt at max
size results in a solid green rectangle being captured (all colors 0 in
YUV).
This turns out to be caused by max-width (924) not being a multiple of 16.
We've likely never hit this problem before since normally xawtv / tvtime,
etc. will prefer packed pixel formats. But when using a video card which
is using xf86-video-modesetting + glamor, only planar XVideo fmts are
available, and xawtv will chose a matching capture format to avoid needing
to do conversion, triggering the solid green window problem.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b84106b4e2 upstream.
The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.
Some non-spec-compliant devices put registers other than BARs where the
BARs should be. When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.
Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea32f065bd upstream.
On error we jumped to the error label and returned the error code but we
missed releasing sinfo.
Fixes: 5fe74014172d ("mac80211: avoid excessive stack usage in sta_info")
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: there's no out_err label but there is another
error case that would leak sinfo]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ef049dc11 upstream.
When CONFIG_OPTIMIZE_INLINING is set, the sta_info_insert_finish
function consumes more stack than normally, exceeding the
1024 byte limit on ARM:
net/mac80211/sta_info.c: In function 'sta_info_insert_finish':
net/mac80211/sta_info.c:561:1: error: the frame size of 1080 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
It turns out that there are two functions that put a 'struct station_info'
on the stack: __sta_info_destroy_part2 and sta_info_insert_finish, and
this structure alone requires up to 792 bytes.
Hoping that both are called rarely enough, this replaces the
on-stack structure with a dynamic allocation, which unfortunately
requires some suboptimal error handling for out-of-memory.
The __sta_info_destroy_part2 function is actually affected by the
stack usage twice because it calls cfg80211_del_sta_sinfo(), which
has another instance of struct station_info on its stack.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 98b6218388 ("mac80211/cfg80211: add station events")
Fixes: 6f7a8d26e2 ("mac80211: send statistics with delete station event")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2:
- There's only one instance to fix
- Adjust context,indentation
- Use 'return' instead of 'goto out_err']
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f88fa79a61 upstream.
aac_fib_map_free() calls pci_free_consistent() without checking that
dev->hw_fib_va is not NULL and dev->max_fib_size is not zero.If they are
indeed NULL/0, this will result in a hang as pci_free_consistent() will
attempt to invalidate cache for the entire 64-bit address space
(which would take a very long time).
Fixed by adding a check to make sure that dev->hw_fib_va and
dev->max_fib_size are not NULL and 0 respectively.
Fixes: 9ad5204d6 - "[SCSI]aacraid: incorrect dma mapping mask during blinked recover or user initiated reset"
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@pmcs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 264904ccc3 upstream.
Some devices I got show an inability to operate right after
power on if they are already connected. They are beyond recovery
if the descriptors are requested multiple times. So in case of
a timeout we rather bail early and reset again. But it must be
done only on the first loop lest we get into a reset/time out
spiral that can be overcome with a retry.
This patch is a rework of a patch that fell through the cracks.
http://www.spinics.net/lists/linux-usb/msg103263.html
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 401879c57f upstream.
The N_IRDA line discipline may access the previous line discipline's closed
and already-fre private data on open [1].
The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).
[1]
==================================================================
BUG: KASAN: use-after-free in irtty_open+0x422/0x550 at addr ffff8800331dd068
Read of size 4 by task a.out/13960
=============================================================================
BUG kmalloc-512 (Tainted: G B ): kasan: bad access detected
-----------------------------------------------------------------------------
...
Call Trace:
[<ffffffff815fa2ae>] __asan_report_load4_noabort+0x3e/0x40 mm/kasan/report.c:279
[<ffffffff836938a2>] irtty_open+0x422/0x550 drivers/net/irda/irtty-sir.c:436
[<ffffffff829f1b80>] tty_ldisc_open.isra.2+0x60/0xa0 drivers/tty/tty_ldisc.c:447
[<ffffffff829f21c0>] tty_set_ldisc+0x1a0/0x940 drivers/tty/tty_ldisc.c:567
[< inline >] tiocsetd drivers/tty/tty_io.c:2650
[<ffffffff829da49e>] tty_ioctl+0xace/0x1fd0 drivers/tty/tty_io.c:2883
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff816708ac>] do_vfs_ioctl+0x57c/0xe60 fs/ioctl.c:607
[< inline >] SYSC_ioctl fs/ioctl.c:622
[<ffffffff81671204>] SyS_ioctl+0x74/0x80 fs/ioctl.c:613
[<ffffffff852a7876>] entry_SYSCALL_64_fastpath+0x16/0x7a
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0b41ce9910 upstream.
Some UART HW has a single register combining UART_DLL/UART_DLM
(this was probably forgotten in the change that introduced the
callbacks, commit b32b19b8ff)
Fixes: b32b19b8ff ("[SERIAL] 8250: set divisor register correctly ...")
Signed-off-by: Sebastian Frias <sf84@laposte.net>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust filename
- We're using serial_{in,out}p for 8-bit I/O]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e470127e96 upstream.
The critical section protected by usbhid->lock in hid_ctrl() is too
big and because of this it causes a recursive deadlock. "Too big" means
the case statement and the call to hid_input_report() do not need to be
protected by the spinlock (no URB operations are done inside them).
The deadlock happens because in certain rare cases drivers try to grab
the lock while handling the ctrl irq which grabs the lock before them
as described above. For example newer wacom tablets like 056a:033c try
to reschedule proximity reads from wacom_intuos_schedule_prox_event()
calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
which tries to grab the usbhid lock already held by hid_ctrl().
There are two ways to get out of this deadlock:
1. Make the drivers work "around" the ctrl critical region, in the
wacom case for ex. by delaying the scheduling of the proximity read
request itself to a workqueue.
2. Shrink the critical region so the usbhid lock protects only the
instructions which modify usbhid state, calling hid_input_report()
with the spinlock unlocked, allowing the device driver to grab the
lock first, finish and then grab the lock afterwards in hid_ctrl().
This patch implements the 2nd solution.
Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2: adjust context]
Cc: Jason Gerecke <jason.gerecke@wacom.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8a5e5e02fc upstream.
Poison pointer values should be small enough to find a room in
non-mmap'able/hardly-mmap'able space. E.g. on x86 "poison pointer space"
is located starting from 0x0. Given unprivileged users cannot mmap
anything below mmap_min_addr, it should be safe to use poison pointers
lower than mmap_min_addr.
The current poison pointer values of LIST_POISON{1,2} might be too big for
mmap_min_addr values equal or less than 1 MB (common case, e.g. Ubuntu
uses only 0x10000). There is little point to use such a big value given
the "poison pointer space" below 1 MB is not yet exhausted. Changing it
to a smaller value solves the problem for small mmap_min_addr setups.
The values are suggested by Solar Designer:
http://www.openwall.com/lists/oss-security/2015/05/02/6
Signed-off-by: Vasily Kulikov <segoon@openwall.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8e20cf2bce upstream.
The aiptek driver crashes in aiptek_probe() when a specially crafted USB
device without endpoints is detected. This fix adds a check that the device
has proper configuration expected by the driver. Also an error return value
is changed to more matching one in one of the error paths.
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3446c13b26 upstream.
The fork of a process with four page table levels is broken since
git commit 6252d702c5 "[S390] dynamic page tables."
All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.
The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.
This fixes CVE-2016-2143.
Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2:
- 31-bit s390 is still supported so keep the #ifdef CONFIG_64BIT conditions
- Split page table locks are not implemented for PMDs so don't call
pgtable_pmd_page_{ctor,dtor}()
- PMDs are not accounted so don't call mm_inc_nr_pmds()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 256faedcfd upstream.
This reverts commit dbb17a21c1.
It turns out that commit can cause problems for systems with multiple
GPUs, and causes X to hang on at least a HP Pavilion dv7 with hybrid
graphics.
This got noticed originally in 4.4.4, where this patch had already
gotten back-ported, but 4.5-rc7 was verified to have the same problem.
Alexander Deucher says:
"It looks like you have a muxed system so I suspect what's happening is
that one of the display is being reported as connected for both the
IGP and the dGPU and then the desktop environment gets confused or
there some sort problem in the detect functions since the mux is not
switched to the dGPU. I don't see an easy fix unless Dave has any
ideas. I'd say just revert for now"
Reported-by: Jörg-Volker Peetz <jvpeetz@web.de>
Acked-by: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4f6daac20 upstream.
ubi_start_leb_change() allocates too few bytes.
ubi_more_leb_change_data() will write up to req->upd_bytes +
ubi->min_io_size bytes.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e723e3f7f9 upstream.
Avoid sending a partially initialised `siginfo_t' structure along SIGFPE
signals issued from `do_ov' and `do_trap_or_bp', leading to information
leaking from the kernel stack.
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1837b2e2bc upstream.
The current reserved_tailroom calculation fails to take hlen and tlen into
account.
skb:
[__hlen__|__data____________|__tlen___|__extra__]
^ ^
head skb_end_offset
In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:
[__hlen__|__data____________|__extra__|__tlen___]
^ ^
head skb_end_offset
The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.
The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
reserved_tailroom = skb_tailroom - mtu
else:
reserved_tailroom = tlen
Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11d8d64534 upstream.
According to IBTA spec v1.3 section 12.7.19, QPs should use GRH when
the path returned by the SA has hop-limit > 0. Currently, we do that
only for the > 1 case, fix that.
Fixes: 6d969a471b ('IB/sa: Add ib_init_ah_from_path()')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 92f9e179a7 upstream.
Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.
To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 197b958c1e upstream.
The OSS sequencer client tries to drain the pending events at
releasing. Unfortunately, as spotted by syzkaller fuzzer, this may
lead to an unkillable process state when the event has been queued at
the far future. Since the process being released can't be signaled
any longer, it remains and waits for the echo-back event in that far
future.
Back to history, the draining feature was implemented at the time we
misinterpreted POSIX definition for blocking file operation.
Actually, such a behavior is superfluous at release, and we should
just release the device as is instead of keeping it up forever.
This patch just removes the draining call that may block the release
for too long time unexpectedly.
BugLink: http://lkml.kernel.org/r/CACT4Y+Y4kD-aBGj37rf-xBw9bH3GMU6P+MYg4W1e-s-paVD2pg@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: snd_seq_oss_drain_write() has an extra log statement
to be deleted]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8019c0b37c upstream.
The DRC Mode like "AIF1DRC1 Mode" and EQ Mode like "AIF1.1 EQ Mode" in
wm8994 codec driver are enum ctls, while the current driver accesses
wrongly via value.integer.value[]. They have to be via
value.enumerated.item[] instead.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d0784829ae upstream.
"MBC Mode", "VSS Mode", "VSS HPF Mode" and "Enhanced EQ Mode" ctls in
wm8958 codec driver are enum, while the current driver accesses
wrongly via value.integer.value[]. They have to be via
value.enumerated.item[] instead.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c4c615d70 upstream.
The Parrot NMEA GPS Flight Recorder is a USB composite device
consisting of hub, flash storage, and cp210x usb to serial chip.
It is an accessory to the mass-produced Parrot AR Drone 2.
The device emits standard NMEA messages which make the it compatible
with NMEA compatible software. It was tested using gpsd version 3.11-3
as an NMEA interpreter and using the official Parrot Flight Recorder.
Signed-off-by: Vittorio Alfieri <vittorio88@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 537e481362 upstream.
snd-hdspm driver accesses enum item values (int) instead of boolean
values (long) wrongly for some ctl elements. This patch fixes them.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: drop change to snd_hdspm_put_system_sample_rate()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3a72494ac2 upstream.
The timer user status compat ioctl returned the bogus struct used for
64bit architectures instead of the 32bit one. This patch addresses
it to return the proper struct.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6853f78e7 upstream.
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.
If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit
7dd29d8d86 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").
This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: deleted code is slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad33bb04b2 upstream.
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 21b81716c6 upstream.
Commit d63c7dd5bc ("ipr: Fix out-of-bounds null overwrite") removed
the end of line handling when storing the update_fw sysfs attribute.
This changed the userpace API because it started refusing writes
terminated by a line feed, which broke the update tools we already have.
This patch re-adds that handling, so both a write terminated by a line
feed or not can make it through with the update.
Fixes: d63c7dd5bc ("ipr: Fix out-of-bounds null overwrite")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Insu Yun <wuninsu@gmail.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ee34ea3a1 upstream.
The id buffer in ata_device is a DMA target, but it isn't explicitly
cacheline aligned. Due to this, adjacent fields can be overwritten with
stale data from memory on non coherent architectures. As a result, the
kernel is sometimes unable to communicate with an ATA device.
Fix this by ensuring that the id buffer is cacheline aligned.
This issue is similar to that fixed by Commit 84bda12af3
("libata: align ap->sector_buf").
Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit be629c62a6 upstream.
When a directory is deleted, we don't take too much care about killing off
all the dirents that belong to it — on the basis that on remount, the scan
will conclude that the directory is dead anyway.
This doesn't work though, when the deleted directory contained a child
directory which was moved *out*. In the early stages of the fs build
we can then end up with an apparent hard link, with the child directory
appearing both in its true location, and as a child of the original
directory which are this stage of the mount process we don't *yet* know
is defunct.
To resolve this, take out the early special-casing of the "directories
shall not have hard links" rule in jffs2_build_inode_pass1(), and let the
normal nlink processing happen for directories as well as other inodes.
Then later in the build process we can set ic->pino_nlink to the parent
inode#, as is required for directories during normal operaton, instead
of the nlink. And complain only *then* about hard links which are still
in evidence even after killing off all the unreachable paths.
Reported-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 157078f64b upstream.
This reverts commit 5ffd3412ae
("jffs2: Fix lock acquisition order bug in jffs2_write_begin").
The commit modified jffs2_write_begin() to remove a deadlock with
jffs2_garbage_collect_live(), but this introduced new deadlocks found
by multiple users. page_lock() actually has to be called before
mutex_lock(&c->alloc_sem) or mutex_lock(&f->sem) because
jffs2_write_end() and jffs2_readpage() are called with the page locked,
and they acquire c->alloc_sem and f->sem, resp.
In other words, the lock order in jffs2_write_begin() was correct, and
it is the jffs2_garbage_collect_live() path that has to be changed.
Revert the commit to get rid of the new deadlocks, and to clear the way
for a better fix of the original deadlock.
Reported-by: Deng Chao <deng.chao1@zte.com.cn>
Reported-by: Ming Liu <liu.ming50@gmail.com>
Reported-by: wangzaiwei <wangzaiwei@top-vision.cn>
Signed-off-by: Thomas Betker <thomas.betker@rohde-schwarz.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2: use D1(printk(KERN_DEBUG ...)) instead of
jffs2_dbg(1, ...)]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b7052cd7bc upstream.
The qword_get() function NUL-terminates its output buffer. If the input
string is in hex format \xXXXX... and the same length as the output
buffer, there is an off-by-one:
int qword_get(char **bpp, char *dest, int bufsize)
{
...
while (len < bufsize) {
...
*dest++ = (h << 4) | l;
len++;
}
...
*dest = '\0';
return len;
}
This patch ensures the NUL terminator doesn't fall outside the output
buffer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7a36b930e6 upstream.
The value 5000 was put here with the addition of the timeout field to
ieee80211_start_tx_ba_session. It was originally added in mac80211 to
save resources for drivers like iwlwifi, which only supports a limited
number of concurrent aggregation sessions.
Since iwlwifi does not use minstrel_ht and other drivers don't need
this, 0 is a better default - especially since there have been
recent reports of aggregation setup related issues reproduced with
ath9k. This should improve stability without causing any adverse
effects.
Acked-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90cfde4658 upstream.
This patch fixes the problem that more CAN messages could be sent to the
interface as could be send on the CAN bus. This was more likely for slow baud
rates. The sleeping _start_xmit was woken up in the _write_bulk_callback. Under
heavy TX load this produced another bulk transfer without checking the
free_slots variable and hence caused the overflow in the interface.
Signed-off-by: Gerhard Uttenthaler <uttenthaler@ems-wuensche.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 59ceeaaf35 upstream.
In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released. A pointer on the conflicting resource is kept. At wake-up
this pointer is used as a parent to retry to request the region.
A first problem is that this pointer might well be invalid (if for
example the conflicting resource have already been freed). Another
problem is that the next call to __request_region() fails to detect a
remaining conflict. The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself. It is likely
to succeed anyway, even if there is still a conflict.
Instead, the parent of the conflicting resource should be passed to
__request_region().
As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.
Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed8ad83808 upstream.
ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
Fix the problem by always updating bh->b_state bits atomically.
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- s/READ_ONCE/ACCESS_ONCE/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 281e8b2fdf upstream.
RdropOvflw counts overrun of HW buffer, therefore should
be used for rx_fifo_errors only.
Currently RdropOvflw counter is mistakenly also set into
rx_missed_errors and rx_over_errors too, which makes the
device total dropped packets accounting to show wrong results.
Fix that. Use it for rx_fifo_errors only.
Fixes: c27a02cd94 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c450960187 upstream.
The assignement of EP transfer resources was not handled properly in the
dwc3 driver. Commit aebda61871 ("usb: dwc3: Reset the transfer
resource index on SET_INTERFACE") previously fixed one aspect of this
where resources may be exhausted with multiple calls to SET_INTERFACE.
However, it introduced an issue where composite devices with multiple
interfaces can be assigned the same transfer resources for different
endpoints. This patch solves both issues.
The assignment of transfer resources cannot perfectly follow the data
book due to the fact that the controller driver does not have all
knowledge of the configuration in advance. It is given this information
piecemeal by the composite gadget framework after every
SET_CONFIGURATION and SET_INTERFACE. Trying to follow the databook
programming model in this scenario can cause errors. For two reasons:
1) The databook says to do DEPSTARTCFG for every SET_CONFIGURATION and
SET_INTERFACE (8.1.5). This is incorrect in the scenario of multiple
interfaces.
2) The databook does not mention doing more DEPXFERCFG for new endpoint
on alt setting (8.1.6).
The following simplified method is used instead:
All hardware endpoints can be assigned a transfer resource and this
setting will stay persistent until either a core reset or hibernation.
So whenever we do a DEPSTARTCFG(0) we can go ahead and do DEPXFERCFG for
every hardware endpoint as well. We are guaranteed that there are as
many transfer resources as endpoints.
This patch triggers off of the calling dwc3_gadget_start_config() for
EP0-out, which always happens first, and which should only happen in one
of the above conditions.
Fixes: aebda61871 ("usb: dwc3: Reset the transfer resource index on SET_INTERFACE")
Reported-by: Ravi Babu <ravibabu@ti.com>
Signed-off-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a5527dda34 upstream.
The unix_dgram_sendmsg routine use the following test
if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.
Fixes: 7d267278a9 ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b92ee3d03 upstream.
The present unix_stream_read_generic contains various code sequences of
the form
err = -EDISASTER;
if (<test>)
goto out;
This has the unfortunate side effect of possibly causing the error code
to bleed through to the final
out:
return copied ? : err;
and then to be wrongly returned if no data was copied because the caller
didn't supply a data buffer, as demonstrated by the program available at
http://pad.lv/1540731
Change it such that err is only set if an error condition was detected.
Fixes: 3822b5c2fc ("af_unix: Revert 'lock_interruptible' in stream receive code")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13d5e5d472 upstream.
The commit [7f0973e973: ALSA: seq: Fix lockdep warnings due to
double mutex locks] split the management of two linked lists (source
and destination) into two individual calls for avoiding the AB/BA
deadlock. However, this may leave the possible double deletion of one
of two lists when the counterpart is being deleted concurrently.
It ends up with a list corruption, as revealed by syzkaller fuzzer.
This patch fixes it by checking the list emptiness and skipping the
deletion and the following process.
BugLink: http://lkml.kernel.org/r/CACT4Y+bay9qsrz6dQu31EcGaH9XwfW7o3oBzSQUG9fMszoh=Sg@mail.gmail.com
Fixes: 7f0973e973 ('ALSA: seq: Fix lockdep warnings due to 'double mutex locks)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b33c8ff443 upstream.
In my randconfig tests, I came across a bug that involves several
components:
* gcc-4.9 through at least 5.3
* CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files
* CONFIG_PROFILE_ALL_BRANCHES overriding every if()
* The optimized implementation of do_div() that tries to
replace a library call with an division by multiplication
* code in drivers/media/dvb-frontends/zl10353.c doing
u32 adc_clock = 450560; /* 45.056 MHz */
if (state->config.adc_clock)
adc_clock = state->config.adc_clock;
do_div(value, adc_clock);
In this case, gcc fails to determine whether the divisor
in do_div() is __builtin_constant_p(). In particular, it
concludes that __builtin_constant_p(adc_clock) is false, while
__builtin_constant_p(!!adc_clock) is true.
That in turn throws off the logic in do_div() that also uses
__builtin_constant_p(), and instead of picking either the
constant- optimized division, and the code in ilog2() that uses
__builtin_constant_p() to figure out whether it knows the answer at
compile time. The result is a link error from failing to find
multiple symbols that should never have been called based on
the __builtin_constant_p():
dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN'
dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod'
ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined!
ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined!
This patch avoids the problem by changing __trace_if() to check
whether the condition is known at compile-time to be nonzero, rather
than checking whether it is actually a constant.
I see this one link error in roughly one out of 1600 randconfig builds
on ARM, and the patch fixes all known instances.
Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de
Acked-by: Nicolas Pitre <nico@linaro.org>
Fixes: ab3c9c686e ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f37755490f upstream.
The tracepoint infrastructure uses RCU sched protection to enable and
disable tracepoints safely. There are some instances where tracepoints are
used in infrastructure code (like kfree()) that get called after a CPU is
going offline, and perhaps when it is coming back online but hasn't been
registered yet.
This can probuce the following warning:
[ INFO: suspicious RCU usage. ]
4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
-------------------------------
include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34
Call Trace:
[c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
[c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
[c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
[c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
[c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
[c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
[c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
[c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
[c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
[c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
[c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
[c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
This warning is not a false positive either. RCU is not protecting code that
is being executed while the CPU is offline.
Instead of playing "whack-a-mole(TM)" and adding conditional statements to
the tracepoints we find that are used in this instance, simply add a
cpu_online() test to the tracepoint code where the tracepoint will be
ignored if the CPU is offline.
Use of raw_smp_processor_id() is fine, as there should never be a case where
the tracepoint code goes from running on a CPU that is online and suddenly
gets migrated to a CPU that is offline.
Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Fixes: 97e1c18e8d ("tracing: Kernel Tracepoints")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d99a36f472 upstream.
When multiple concurrent writes happen on the ALSA sequencer device
right after the open, it may try to allocate vmalloc buffer for each
write and leak some of them. It's because the presence check and the
assignment of the buffer is done outside the spinlock for the pool.
The fix is to move the check and the assignment into the spinlock.
(The current implementation is suboptimal, as there can be multiple
unnecessary vmallocs because the allocation is done before the check
in the spinlock. But the pool size is already checked beforehand, so
this isn't a big problem; that is, the only possible path is the
multiple writes before any pool assignment, and practically seen, the
current coverage should be "good enough".)
The issue was triggered by syzkaller fuzzer.
BugLink: http://lkml.kernel.org/r/CACT4Y+bSzazpXNvtAr=WXaL8hptqjHwqEyFA+VN2AWEx=aurkg@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4d8c8bd6f2 upstream.
Occasionaly PV guests would crash with:
pciback 0000:00:00.1: Xen PCI mapped GSI0 to IRQ16
BUG: unable to handle kernel paging request at 0000000d1a8c0be0
.. snip..
<ffffffff8139ce1b>] find_next_bit+0xb/0x10
[<ffffffff81387f22>] cpumask_next_and+0x22/0x40
[<ffffffff813c1ef8>] pci_device_probe+0xb8/0x120
[<ffffffff81529097>] ? driver_sysfs_add+0x77/0xa0
[<ffffffff815293e4>] driver_probe_device+0x1a4/0x2d0
[<ffffffff813c1ddd>] ? pci_match_device+0xdd/0x110
[<ffffffff81529657>] __device_attach_driver+0xa7/0xb0
[<ffffffff815295b0>] ? __driver_attach+0xa0/0xa0
[<ffffffff81527622>] bus_for_each_drv+0x62/0x90
[<ffffffff8152978d>] __device_attach+0xbd/0x110
[<ffffffff815297fb>] device_attach+0xb/0x10
[<ffffffff813b75ac>] pci_bus_add_device+0x3c/0x70
[<ffffffff813b7618>] pci_bus_add_devices+0x38/0x80
[<ffffffff813dc34e>] pcifront_scan_root+0x13e/0x1a0
[<ffffffff817a0692>] pcifront_backend_changed+0x262/0x60b
[<ffffffff814644c6>] ? xenbus_gather+0xd6/0x160
[<ffffffff8120900f>] ? put_object+0x2f/0x50
[<ffffffff81465c1d>] xenbus_otherend_changed+0x9d/0xa0
[<ffffffff814678ee>] backend_changed+0xe/0x10
[<ffffffff81463a28>] xenwatch_thread+0xc8/0x190
[<ffffffff810f22f0>] ? woken_wake_function+0x10/0x10
which was the result of two things:
When we call pci_scan_root_bus we would pass in 'sd' (sysdata)
pointer which was an 'pcifront_sd' structure. However in the
pci_device_add it expects that the 'sd' is 'struct sysdata' and
sets the dev->node to what is in sd->node (offset 4):
set_dev_node(&dev->dev, pcibus_to_node(bus));
__pcibus_to_node(const struct pci_bus *bus)
{
const struct pci_sysdata *sd = bus->sysdata;
return sd->node;
}
However our structure was pcifront_sd which had nothing at that
offset:
struct pcifront_sd {
int domain; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct pcifront_device * pdev; /* 8 8 */
}
That is an hole - filled with garbage as we used kmalloc instead of
kzalloc (the second problem).
This patch fixes the issue by:
1) Use kzalloc to initialize to a well known state.
2) Put 'struct pci_sysdata' at the start of 'pcifront_sd'. That
way access to the 'node' will access the right offset.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d159457b84 upstream.
Commit 8135cf8b09 (xen/pciback: Save
xen_pci_op commands before processing it) broke enabling MSI-X because
it would never copy the resulting vectors into the response. The
number of vectors requested was being overwritten by the return value
(typically zero for success).
Save the number of vectors before processing the op, so the correct
number of vectors are copied afterwards.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8d47065f7d upstream.
Commit 408fb0e5aa (xen/pciback: Don't
allow MSI-X ops if PCI_COMMAND_MEMORY is not set) prevented enabling
MSI-X on passed-through virtual functions, because it checked the VF
for PCI_COMMAND_MEMORY but this is not a valid bit for VFs.
Instead, check the physical function for PCI_COMMAND_MEMORY.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 287e6611ab upstream.
As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not
work correctly in compat mode with libata.
I have investigated the issue further and found multiple problems
that all appeared with the same commit that originally introduced
HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably
also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy
a 'long' variable containing either 0 or 1 to user space.
The problems with this are:
* On big-endian machines, this will always write a zero because it
stores the wrong byte into user space.
* In compat mode, the upper three bytes of the variable are updated
by the compat_hdio_ioctl() function, but they now contain
uninitialized stack data.
* The hdparm tool calling this ioctl uses a 'static long' variable
to store the result. This means at least the upper bytes are
initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT
would fill them with data that remains stale when the low byte
is overwritten. Fortunately libata doesn't implement any of the
affected ioctl commands, so this would only happen when we query
both an IDE and an ATA device in the same command such as
"hdparm -N -c /dev/hda /dev/sda"
* The libata code for unknown reasons started using ATA_IOC_GET_IO32
and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT,
while the ioctl commands that were added later use the normal
HDIO_* names. This is harmless but rather confusing.
This addresses all four issues by changing the code to use put_user()
on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem
does, and by clarifying the names of the ioctl commands.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Soohoon Lee <Soohoon.Lee@f5.com>
Tested-by: Soohoon Lee <Soohoon.Lee@f5.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9d862ababb upstream.
Add refcount to the DASD device when a summary unit check worker is
scheduled. This prevents that the device is set offline with worker
in place.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 020bf042e5 upstream.
The channel checks the specified length and the provided amount of
data for CCWs and provides an incorrect length error if the size does
not match. Under z/VM with simulation activated the length may get
changed. Having the suppress length indication bit set is stated as
good CCW coding practice and avoids errors under z/VM.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4b550af519 upstream.
The setup_ntlmv2_rsp() function may return positive value ENOMEM instead
of -ENOMEM in case of kmalloc failure.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 50ab8ec74a upstream.
See http: //www.infradead.org/rpr.html
X-Evolution-Source: 1451162204.2173.11@leira.trondhjem.org
Content-Transfer-Encoding: 8bit
Mime-Version: 1.0
We support OFFSET_MAX just fine, so don't round down below it. Also
switch to using min_t to make the helper more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: 433c92379d ("NFS: Clean up nfs_size_to_loff_t()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f39ea2690b upstream.
Use kzalloc instead of kmalloc for struct tid_ampdu_rx to
initialize the "removed" field (all others are initialized
manually). That fixes:
UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29
load of value 2 is not a valid value for type '_Bool'
CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265
Workqueue: phy0 rt2x00usb_work_rxdone
0000000000000004 ffff880254a7ba50 ffffffff8181d866 0000000000000007
ffff880254a7ba78 ffff880254a7ba68 ffffffff8188422d ffffffff8379b500
ffff880254a7bab8 ffffffff81884747 0000000000000202 0000000348620032
Call Trace:
[<ffffffff8181d866>] dump_stack+0x45/0x5f
[<ffffffff8188422d>] ubsan_epilogue+0xd/0x40
[<ffffffff81884747>] __ubsan_handle_load_invalid_value+0x67/0x70
[<ffffffff82227b4d>] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730
[<ffffffff8222ca14>] ieee80211_prepare_and_rx_handle+0xd04/0x1c00
[<ffffffff8222db03>] __ieee80211_rx_handle_packet+0x1f3/0x750
[<ffffffff8222e4a7>] ieee80211_rx_napi+0x447/0x990
While at it, convert to use sizeof(*tid_agg_rx) instead.
Fixes: 788211d81b ("mac80211: fix RX A-MPDU session reorder timer deletion")
Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com>
[reword commit message, use sizeof(*tid_agg_rx)]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb150b9d23 upstream.
Since cfg80211 frequently takes actions from its netdev notifier
call, wireless extensions messages could still be ordered badly
since the wext netdev notifier, since wext is built into the
kernel, runs before the cfg80211 netdev notifier. For example,
the following can happen:
5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
5: wlan1: <BROADCAST,MULTICAST,UP>
link/ether
when setting the interface down causes the wext message.
To also fix this, export the wireless_nlevent_flush() function
and also call it from the cfg80211 notifier.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2:
- Add default case in cfg80211_netdev_notifier_call() which bypasses
the added wireless_nlevent_flush() (added upstream by commit
6784c7db8d "cfg80211: change return value of notifier function")
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8bf862739a upstream.
Beniamino reported that he was getting an RTM_NEWLINK message for a
given interface, after the RTM_DELLINK for it. It turns out that the
message is a wireless extensions message, which was sent because the
interface had been connected and disconnection while it was deleted
caused a wext message.
For its netlink messages, wext uses RTM_NEWLINK, but the message is
without all the regular rtnetlink attributes, so "ip monitor link"
prints just rudimentary information:
5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default
link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
Deleted 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff
5: wlan1: <BROADCAST,MULTICAST,UP>
link/ether
(from my hwsim reproduction)
This can cause userspace to get confused since it doesn't expect an
RTM_NEWLINK message after RTM_DELLINK.
The reason for this is that wext schedules a worker to send out the
messages, and the scheduling delay can cause the messages to get out
to userspace in different order.
To fix this, have wext register a netdevice notifier and flush out
any pending messages when netdevice state changes. This fixes any
ordering whenever the original message wasn't sent by a notifier
itself.
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fda3bec12d upstream.
This is a 32-bit register. Apparently harmless on real hardware, but
causing justified warnings in simulation.
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit a1383cd86a ("crypto: skcipher - Add crypto_skcipher_has_setkey")
was incorrectly backported to the 3.2.y and 3.16.y stable branches.
We need to set ablkcipher_tfm::has_setkey in the
crypto_init_blkcipher_ops_async() and crypto_init_givcipher_ops()
functions as well as crypto_init_ablkcipher_ops().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit c54ddfbb1b, which
was a poorly backported version of commit
6454c2b83f upstream. The small part I
was able to backport makes no sense by itself.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
__sched_setscheduler() may release rq->lock in pull_rt_task() as a task is
being changed rt -> fair class. load balancing may sneak in, move the task
behind __sched_setscheduler()'s back, which explodes in switched_to_fair()
when the passed but no longer valid rq is used. Tell can_migrate_task() to
say no if ->pi_lock is held.
@stable: Kernels that predate SCHED_DEADLINE can use this simple (and tested)
check in lieu of backport of the full 18 patch mainline treatment.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bwh: Backported to 3.2:
- Adjust numbering in the comment
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Willy Tarreau <w@1wt.eu>
Quoting the RHEL advisory:
> It was found that the fix for CVE-2015-1805 incorrectly kept buffer
> offset and buffer length in sync on a failed atomic read, potentially
> resulting in a pipe buffer state corruption. A local, unprivileged user
> could use this flaw to crash the system or leak kernel memory to user
> space. (CVE-2016-0774, Moderate)
The same flawed fix was applied to stable branches from 2.6.32.y to
3.14.y inclusive, and I was able to reproduce the issue on 3.2.y.
We need to give pipe_iov_copy_to_user() a separate offset variable
and only update the buffer offset if it succeeds.
References: https://rhn.redhat.com/errata/RHSA-2016-0103.html
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 759c01142a upstream.
On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.
This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.
The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).
Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 415e3d3e90 upstream.
The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.
To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.
Fixes: 712f4aad40 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 712f4aad40 upstream.
It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.
This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.
Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
[carnil: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07d86ca93d upstream.
The 'umidi' object will be free'd on the error path by snd_usbmidi_free()
when tearing down the rawmidi interface. So we shouldn't try to free it
in snd_usbmidi_create() after having registered the rawmidi interface.
Found by KASAN.
Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bc4ef7592f upstream.
The value of ctx->pos in the last readdir call is supposed to be set to
INT_MAX due to 32bit compatibility, unless 'pos' is intentially set to a
larger value, then it's LLONG_MAX.
There's a report from PaX SIZE_OVERFLOW plugin that "ctx->pos++"
overflows (https://forums.grsecurity.net/viewtopic.php?f=1&t=4284), on a
64bit arch, where the value is 0x7fffffffffffffff ie. LLONG_MAX before
the increment.
We can get to that situation like that:
* emit all regular readdir entries
* still in the same call to readdir, bump the last pos to INT_MAX
* next call to readdir will not emit any entries, but will reach the
bump code again, finds pos to be INT_MAX and sets it to LLONG_MAX
Normally this is not a problem, but if we call readdir again, we'll find
'pos' set to LLONG_MAX and the unconditional increment will overflow.
The report from Victor at
(http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500) with debugging
print shows that pattern:
Overflow: e
Overflow: 7fffffff
Overflow: 7fffffffffffffff
PAX: size overflow detected in function btrfs_real_readdir
fs/btrfs/inode.c:5760 cicus.935_282 max, count: 9, decl: pos; num: 0;
context: dir_context;
CPU: 0 PID: 2630 Comm: polkitd Not tainted 4.2.3-grsec #1
Hardware name: Gigabyte Technology Co., Ltd. H81ND2H/H81ND2H, BIOS F3 08/11/2015
ffffffff81901608 0000000000000000 ffffffff819015e6 ffffc90004973d48
ffffffff81742f0f 0000000000000007 ffffffff81901608 ffffc90004973d78
ffffffff811cb706 0000000000000000 ffff8800d47359e0 ffffc90004973ed8
Call Trace:
[<ffffffff81742f0f>] dump_stack+0x4c/0x7f
[<ffffffff811cb706>] report_size_overflow+0x36/0x40
[<ffffffff812ef0bc>] btrfs_real_readdir+0x69c/0x6d0
[<ffffffff811dafc8>] iterate_dir+0xa8/0x150
[<ffffffff811e6d8d>] ? __fget_light+0x2d/0x70
[<ffffffff811dba3a>] SyS_getdents+0xba/0x1c0
Overflow: 1a
[<ffffffff811db070>] ? iterate_dir+0x150/0x150
[<ffffffff81749b69>] entry_SYSCALL_64_fastpath+0x12/0x83
The jump from 7fffffff to 7fffffffffffffff happens when new dir entries
are not yet synced and are processed from the delayed list. Then the code
could go to the bump section again even though it might not emit any new
dir entries from the delayed list.
The fix avoids entering the "bump" section again once we've finished
emitting the entries, both for synced and delayed entries.
References: https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
Reported-by: Victor <services@swwu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2:
- s/ctx->pos/filp->f_pos/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e972c37459 upstream.
Since the dawn of time the ICST code has only supported divide
by one or hang in an eternal loop. Luckily we were always dividing
by one because the reference frequency for the systems using
the ICSTs is 24MHz and the [min,max] values for the PLL input
if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
will always terminate immediately without assigning any divisor
for the reference frequency.
But for the code to make sense, let's insert the missing i++
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4dff5c7b70 upstream.
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: there's no check for tu->connected to fix up]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed8b1d6d2c upstream.
A slave timer element also unlinks at snd_timer_stop() but it takes
only slave_active_lock. When a slave is assigned to a master,
however, this may become a race against the master's interrupt
handling, eventually resulting in a list corruption. The actual bug
could be seen with a syzkaller fuzzer test case in BugLink below.
As a fix, we need to take timeri->timer->lock when timer isn't NULL,
i.e. assigned to a master, while the assignment to a master itself is
protected by slave_active_lock.
BugLink: http://lkml.kernel.org/r/CACT4Y+Y_Bm+7epAb=8Wi=AaWd+DYS7qawX52qxdCfOfY49vozQ@mail.gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7a84bd4664 upstream.
Commit ed5a377d87 ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.
We fix it by changing hmacids to host order when users get them with
getsockopt.
Fixes: Commit ed5a377d87 ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5070fb14a0 upstream.
When trying to set the ICST 307 clock to 25174000 Hz I ran into
this arithmetic error: the icst_hz_to_vco() correctly figure out
DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of
25174000 Hz out of the VCO. (I replicated the icst_hz() function
in a spreadsheet to verify this.)
However, when I called icst_hz() on these VCO settings it would
instead return 4122709 Hz. This causes an error in the common
clock driver for ICST as the common clock framework will call
.round_rate() on the clock which will utilize icst_hz_to_vco()
followed by icst_hz() suggesting the erroneous frequency, and
then the clock gets set to this.
The error did not manifest in the old clock framework since
this high frequency was only used by the CLCD, which calls
clk_set_rate() without first calling clk_round_rate() and since
the old clock framework would not call clk_round_rate() before
setting the frequency, the correct values propagated into
the VCO.
After some experimenting I figured out that it was due to a simple
arithmetic overflow: the divisor for 24Mhz reference frequency
as reference becomes 24000000*2*(99+8)=0x132212400 and the "1"
in bit 32 overflows and is lost.
But introducing an explicit 64-by-32 bit do_div() and casting
the divisor into (u64) we get the right frequency back, and the
right frequency gets set.
Tested on the ARM Versatile.
Cc: linux-clk@vger.kernel.org
Cc: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ddce57a6f0 upstream.
Currently the selected timer backend is referred at any moment from
the running PCM callbacks. When the backend is switched, it's
possible to lead to inconsistency from the running backend. This was
pointed by syzkaller fuzzer, and the commit [7ee96216c3: ALSA:
dummy: Disable switching timer backend via sysfs] disabled the dynamic
switching for avoiding the crash.
This patch improves the handling of timer backend switching. It keeps
the reference to the selected backend during the whole operation of an
opened stream so that it won't be changed by other streams.
Together with this change, the hrtimer parameter is reenabled as
writable now.
NOTE: this patch also turned out to fix the still remaining race.
Namely, ops was still replaced dynamically at dummy_pcm_open:
static int dummy_pcm_open(struct snd_pcm_substream *substream)
{
....
dummy->timer_ops = &dummy_systimer_ops;
if (hrtimer)
dummy->timer_ops = &dummy_hrtimer_ops;
Since dummy->timer_ops is common among all streams, and when the
replacement happens during accesses of other streams, it may lead to a
crash. This was actually triggered by syzkaller fuzzer and KASAN.
This patch rewrites the code not to use the ops shared by all streams
any longer, too.
BugLink: http://lkml.kernel.org/r/CACT4Y+aZ+xisrpuM6cOXbL21DuM0yVxPYXf4cD4Md9uw0C3dBQ@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00cd29b799 upstream.
The starting node for a klist iteration is often passed in from
somewhere way above the klist infrastructure, meaning there's no
guarantee the node is still on the list. We've seen this in SCSI where
we use bus_find_device() to iterate through a list of devices. In the
face of heavy hotplug activity, the last device returned by
bus_find_device() can be removed before the next call. This leads to
Dec 3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec 3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec 3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec 3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec 3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec 3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec 3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec 3 13:22:02 localhost kernel: Call Trace:
Dec 3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec 3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec 3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec 3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec 3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
Dec 3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
Dec 3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
[...]
And an eventual crash. It can actually occur in any hotplug system
which has a device finder and a starting device.
We can fix this globally by making sure the starting node for
klist_iter_init_node() is actually a member of the list before using it
(and by starting from the beginning if it isn't).
Reported-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6454c2b83f upstream.
Any access to non-constant bits of the private context must be
done under the socket lock, in particular, this includes ctx->req.
This patch moves such accesses under the lock, and fetches the
tfm from the parent socket which is guaranteed to be constant,
rather than from ctx->req.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2:
- Drop changes to skcipher_recvmsg_async
- s/skcipher/ablkcipher/ in many places
- s/skc->skcipher/skc->base/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 564e81a57f upstream.
Jan Stancek has reported that system occasionally hanging after "oom01"
testcase from LTP triggers OOM. Guessing from a result that there is a
kworker thread doing memory allocation and the values between "Node 0
Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not
up-to-date for some reason.
According to commit 373ccbe592 ("mm, vmstat: allow WQ concurrency to
discover memory reclaim doesn't make any progress"), it meant to force
the kworker thread to take a short sleep, but it by error used
schedule_timeout(1). We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.
Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.
Fixes: 373ccbe592 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d2d06d4fe0 upstream.
If MODE SELECT returns with sense '05/91/36' (command lock violation)
it should always be retried without counting the number of retries.
During an HBA upgrade or similar circumstances one might see a flood
of MODE SELECT command from various HBAs, which will easily trigger
the sense code and exceed the retry count.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 094fd3be87 upstream.
In ALSA timer core, the active timer instance is managed in
active_list linked list. Each element is added / removed dynamically
at timer start, stop and in timer interrupt. The problem is that
snd_timer_interrupt() has a thinko and leaves the element in
active_list when it's the last opened element. This eventually leads
to list corruption or use-after-free error.
This hasn't been revealed because we used to delete the list forcibly
in snd_timer_stop() in the past. However, the recent fix avoids the
double-stop behavior (in commit [f784beb75c: ALSA: timer: Fix link
corruption due to double start or stop]), and this leak hits reality.
This patch fixes the link management in snd_timer_interrupt(). Now it
simply unlinks no matter which stream is.
BugLink: http://lkml.kernel.org/r/CACT4Y+Yy2aukHP-EDp8-ziNqNNmb-NTf=jDWXMP7jB8HDa2vng@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8beb02343 upstream.
The tda1004x was updating the properties cache before locking.
If the device is not locked, the data at the registers are just
random values with no real meaning.
This caused the driver to fail with libdvbv5, as such library
calls GET_PROPERTY from time to time, in order to return the
DVB stats.
Tested with a saa7134 card 78:
ASUSTeK P7131 Dual, vendor PCI ID: 1043:4862
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c82171167 upstream.
xhci driver frees data for all devices, both usb2 and and usb3 the
first time usb_remove_hcd() is called, including td_list and and xhci_ring
structures.
When usb_remove_hcd() is called a second time for the second xhci bus it
will try to dequeue all pending urbs, and touches td_list which is already
freed for that endpoint.
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a683509071 upstream.
This reverts commit e210c422b6 ("xhci: don't finish a TD if we get a
short transfer event mid TD")
Turns out that most host controllers do not follow the xHCI specs and never
send the second event for the last TRB in the TD if there was a short event
mid-TD.
Returning the URB directly after the first short-transfer event is far
better than never returning the URB. (class drivers usually timeout
after 30sec). For the hosts that do send the second event we will go
back to treating it as misplaced event and print an error message for it.
The origial patch was sent to stable kernels and needs to be reverted from
there as well
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f0973e973 upstream.
The port subscription code uses double mutex locks for source and
destination ports, and this may become racy once when wrongly set up.
It leads to lockdep warning splat, typically triggered by fuzzer like
syzkaller, although the actual deadlock hasn't been seen, so far.
This patch simplifies the handling by reducing to two single locks, so
that no lockdep warning will be trigger any longer.
By splitting to two actions, a still-in-progress element shall be
added in one list while handling another. For ignoring this element,
a new check is added in deliver_to_subscribers().
Along with it, the code to add/remove the subscribers list element was
cleaned up and refactored.
BugLink: http://lkml.kernel.org/r/CACT4Y+aKQXV7xkBW9hpQbzaDO7LrUvohxWh-UwMxXjDy-yBD=A@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 81f577542a upstream.
The rawmidi read and write functions manage runtime stream status
such as runtime->appl_ptr and runtime->avail. These point where to
copy the new data and how many bytes have been copied (or to be
read). The problem is that rawmidi read/write call copy_from_user()
or copy_to_user(), and the runtime spinlock is temporarily unlocked
and relocked while copying user-space. Since the current code
advances and updates the runtime status after the spin unlock/relock,
the copy and the update may be asynchronous, and eventually
runtime->avail might go to a negative value when many concurrent
accesses are done. This may lead to memory corruption in the end.
For fixing this race, in this patch, the status update code is
performed in the same lock before the temporary unlock. Also, the
spinlock is now taken more widely in snd_rawmidi_kernel_read1() for
protecting more properly during the whole operation.
BugLink: http://lkml.kernel.org/r/CACT4Y+b-dCmNf1GpgPKfDO0ih+uZCL2JV4__j-r1kdhPLSgQCQ@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06ab30034e upstream.
A kernel WARNING in snd_rawmidi_transmit_ack() is triggered by
syzkaller fuzzer:
WARNING: CPU: 1 PID: 20739 at sound/core/rawmidi.c:1136
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<ffffffff84f80bd5>] snd_rawmidi_transmit_ack+0x275/0x400 sound/core/rawmidi.c:1136
[<ffffffff84fdb3c1>] snd_virmidi_output_trigger+0x4b1/0x5a0 sound/core/seq/seq_virmidi.c:163
[< inline >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
[<ffffffff84f87ed9>] snd_rawmidi_kernel_write1+0x549/0x780 sound/core/rawmidi.c:1223
[<ffffffff84f89fd3>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1273
[<ffffffff817b0323>] __vfs_write+0x113/0x480 fs/read_write.c:528
[<ffffffff817b1db7>] vfs_write+0x167/0x4a0 fs/read_write.c:577
[< inline >] SYSC_write fs/read_write.c:624
[<ffffffff817b50a1>] SyS_write+0x111/0x220 fs/read_write.c:616
[<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
Also a similar warning is found but in another path:
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82be2c0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff81355139>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
[<ffffffff81355369>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
[<ffffffff8527e69a>] rawmidi_transmit_ack+0x24a/0x3b0 sound/core/rawmidi.c:1133
[<ffffffff8527e851>] snd_rawmidi_transmit_ack+0x51/0x80 sound/core/rawmidi.c:1163
[<ffffffff852d9046>] snd_virmidi_output_trigger+0x2b6/0x570 sound/core/seq/seq_virmidi.c:185
[< inline >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
[<ffffffff85285a0b>] snd_rawmidi_kernel_write1+0x4bb/0x760 sound/core/rawmidi.c:1252
[<ffffffff85287b73>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1302
[<ffffffff817ba5f3>] __vfs_write+0x113/0x480 fs/read_write.c:528
[<ffffffff817bc087>] vfs_write+0x167/0x4a0 fs/read_write.c:577
[< inline >] SYSC_write fs/read_write.c:624
[<ffffffff817bf371>] SyS_write+0x111/0x220 fs/read_write.c:616
[<ffffffff86660276>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
In the former case, the reason is that virmidi has an open code
calling snd_rawmidi_transmit_ack() with the value calculated outside
the spinlock. We may use snd_rawmidi_transmit() in a loop just for
consuming the input data, but even there, there is a race between
snd_rawmidi_transmit_peek() and snd_rawmidi_tranmit_ack().
Similarly in the latter case, it calls snd_rawmidi_transmit_peek() and
snd_rawmidi_tranmit_ack() separately without protection, so they are
racy as well.
The patch tries to address these issues by the following ways:
- Introduce the unlocked versions of snd_rawmidi_transmit_peek() and
snd_rawmidi_transmit_ack() to be called inside the explicit lock.
- Rewrite snd_rawmidi_transmit() to be race-free (the former case).
- Make the split calls (the latter case) protected in the rawmidi spin
lock.
BugLink: http://lkml.kernel.org/r/CACT4Y+YPq1+cYLkadwjWa5XjzF1_Vki1eHnVn-Lm0hzhSpu5PA@mail.gmail.com
BugLink: http://lkml.kernel.org/r/CACT4Y+acG4iyphdOZx47Nyq_VHGbpJQK-6xNpiqUjaZYqsXOGw@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8eee1d3ed5 upstream.
The bulk of ATA host state machine is implemented by
ata_sff_hsm_move(). The function is called from either the interrupt
handler or, if polling, a work item. Unlike from the interrupt path,
the polling path calls the function without holding the host lock and
ata_sff_hsm_move() selectively grabs the lock.
This is completely broken. If an IRQ triggers while polling is in
progress, the two can easily race and end up accessing the hardware
and updating state machine state at the same time. This can put the
state machine in an illegal state and lead to a crash like the
following.
kernel BUG at drivers/ata/libata-sff.c:1302!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 1 PID: 10679 Comm: syz-executor Not tainted 4.5.0-rc1+ #300
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88002bd00000 ti: ffff88002e048000 task.ti: ffff88002e048000
RIP: 0010:[<ffffffff83a83409>] [<ffffffff83a83409>] ata_sff_hsm_move+0x619/0x1c60
...
Call Trace:
<IRQ>
[<ffffffff83a84c31>] __ata_sff_port_intr+0x1e1/0x3a0 drivers/ata/libata-sff.c:1584
[<ffffffff83a85611>] ata_bmdma_port_intr+0x71/0x400 drivers/ata/libata-sff.c:2877
[< inline >] __ata_sff_interrupt drivers/ata/libata-sff.c:1629
[<ffffffff83a85bf3>] ata_bmdma_interrupt+0x253/0x580 drivers/ata/libata-sff.c:2902
[<ffffffff81479f98>] handle_irq_event_percpu+0x108/0x7e0 kernel/irq/handle.c:157
[<ffffffff8147a717>] handle_irq_event+0xa7/0x140 kernel/irq/handle.c:205
[<ffffffff81484573>] handle_edge_irq+0x1e3/0x8d0 kernel/irq/chip.c:623
[< inline >] generic_handle_irq_desc include/linux/irqdesc.h:146
[<ffffffff811a92bc>] handle_irq+0x10c/0x2a0 arch/x86/kernel/irq_64.c:78
[<ffffffff811a7e4d>] do_IRQ+0x7d/0x1a0 arch/x86/kernel/irq.c:240
[<ffffffff86653d4c>] common_interrupt+0x8c/0x8c arch/x86/entry/entry_64.S:520
<EOI>
[< inline >] rcu_lock_acquire include/linux/rcupdate.h:490
[< inline >] rcu_read_lock include/linux/rcupdate.h:874
[<ffffffff8164b4a1>] filemap_map_pages+0x131/0xba0 mm/filemap.c:2145
[< inline >] do_fault_around mm/memory.c:2943
[< inline >] do_read_fault mm/memory.c:2962
[< inline >] do_fault mm/memory.c:3133
[< inline >] handle_pte_fault mm/memory.c:3308
[< inline >] __handle_mm_fault mm/memory.c:3418
[<ffffffff816efb16>] handle_mm_fault+0x2516/0x49a0 mm/memory.c:3447
[<ffffffff8127dc16>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
[<ffffffff8127e358>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
[<ffffffff8126f514>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
[<ffffffff86655578>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
Fix it by ensuring that the polling path is holding the host lock
before entering ata_sff_hsm_move() so that all hardware accesses and
state updates are performed under the host lock.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/g/CACT4Y+b_JsOxJu2EZyEf+mOXORc_zid5V1-pLZSroJVxyWdSpw@mail.gmail.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f784beb75c upstream.
Although ALSA timer code got hardening for races, it still causes
use-after-free error. This is however rather a corrupted linked list,
not actually the concurrent accesses. Namely, when timer start is
triggered twice, list_add_tail() is called twice, too. This ends
up with the link corruption and triggers KASAN error.
The simplest fix would be replacing list_add_tail() with
list_move_tail(), but fundamentally it's the problem that we don't
check the double start/stop correctly. So, the right fix here is to
add the proper checks to snd_timer_start() and snd_timer_stop() (and
their variants).
BugLink: http://lkml.kernel.org/r/CACT4Y+ZyPRoMQjmawbvmCEDrkBD2BQuH7R09=eOkf5ESK8kJAw@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2cdc7b636d upstream.
ALSA sequencer may open/close and control ALSA timer instance
dynamically either via sequencer events or direct ioctls. These are
done mostly asynchronously, and it may call still some timer action
like snd_timer_start() while another is calling snd_timer_close().
Since the instance gets removed by snd_timer_close(), it may lead to
a use-after-free.
This patch tries to address such a race by protecting each
snd_timer_*() call via the existing spinlock and also by avoiding the
access to timer during close call.
BugLink: http://lkml.kernel.org/r/CACT4Y+Z6RzW5MBr-HUdV-8zwg71WQfKTdPpYGvOeS7v4cyurNQ@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b248371628 upstream.
There are potential deadlocks in PCM OSS emulation code while
accessing read/write and mmap concurrently. This comes from the
infamous mmap_sem usage in copy_from/to_user(). Namely,
snd_pcm_oss_write() ->
&runtime->oss.params_lock ->
copy_to_user() ->
&mm->mmap_sem
mmap() ->
&mm->mmap_sem ->
snd_pcm_oss_mmap() ->
&runtime->oss.params_lock
Since we can't avoid taking params_lock from mmap code path, use
trylock variant and aborts with -EAGAIN as a workaround of this AB/BA
deadlock.
BugLink: http://lkml.kernel.org/r/CACT4Y+bVrBKDG0G2_AcUgUQa+X91VKTeS4v+wN7BSHwHtqn3kQ@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1d353ad3d upstream.
"count" is controlled by the user and it can be negative. Let's prevent
that by making it unsigned. You have to have CAP_SYS_RAWIO to call this
function so the bug is not as serious as it could be.
Fixes: 5369c02d95 ('intel_scu_ipc: Utility driver for intel scu ipc')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 742563777e upstream.
There are a couple of nasty truncation bugs lurking in the pageattr
code that can be triggered when mapping EFI regions, e.g. when we pass
a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
left by PAGE_SHIFT will truncate the resultant address to 32-bits.
Viorel-Cătălin managed to trigger this bug on his Dell machine that
provides a ~5GB EFI region which requires 1236992 pages to be mapped.
When calling populate_pud() the end of the region gets calculated
incorrectly in the following buggy expression,
end = start + (cpa->numpages << PAGE_SHIFT);
And only 188416 pages are mapped. Next, populate_pud() gets invoked
for a second time because of the loop in __change_page_attr_set_clr(),
only this time no pages get mapped because shifting the remaining
number of pages (1048576) by PAGE_SHIFT is zero. At which point the
loop in __change_page_attr_set_clr() spins forever because we fail to
map progress.
Hitting this bug depends very much on the virtual address we pick to
map the large region at and how many pages we map on the initial run
through the loop. This explains why this issue was only recently hit
with the introduction of commit
a5caa209ba ("x86/efi: Fix boot crash by mapping EFI memmap
entries bottom-up at runtime, instead of top-down")
It's interesting to note that safe uses of cpa->numpages do exist in
the pageattr code. If instead of shifting ->numpages we multiply by
PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
so the result is unsigned long.
To avoid surprises when users try to convert very large cpa->numpages
values to addresses, change the data type from 'int' to 'unsigned
long', thereby making it suitable for shifting by PAGE_SHIFT without
any type casting.
The alternative would be to make liberal use of casting, but that is
far more likely to cause problems in the future when someone adds more
code and fails to cast properly; this bug was difficult enough to
track down in the first place.
Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7ee96216c3 upstream.
ALSA dummy driver can switch the timer backend between system timer
and hrtimer via its hrtimer module option. This can be also switched
dynamically via sysfs, but it may lead to a memory corruption when
switching is done while a PCM stream is running; the stream instance
for the newly switched timer method tries to access the memory that
was allocated by another timer method although the sizes differ.
As the simplest fix, this patch just disables the switch via sysfs by
dropping the writable bit.
BugLink: http://lkml.kernel.org/r/CACT4Y+ZGEeEBntHW5WHn2GoeE0G_kRrCmUh6=dWyy-wfzvuJLg@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00420a65fa upstream.
The has_key logic is wrong for shash algorithms as they always
have a setkey function. So we should instead be testing against
shash_no_setkey.
Fixes: a5596d6332 ("crypto: hash - Add crypto_ahash_has_setkey")
Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c17c861a3 upstream.
ioctl(TIOCGETD) retrieves the line discipline id directly from the
ldisc because the line discipline id (c_line) in termios is untrustworthy;
userspace may have set termios via ioctl(TCSETS*) without actually
changing the line discipline via ioctl(TIOCSETD).
However, directly accessing the current ldisc via tty->ldisc is
unsafe; the ldisc ptr dereferenced may be stale if the line discipline
is changing via ioctl(TIOCSETD) or hangup.
Wait for the line discipline reference (just like read() or write())
to retrieve the "current" line discipline id.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13b4389143 upstream.
Runtime suspend during driver probe and removal can cause problems.
The driver's runtime_suspend or runtime_resume callbacks may invoked
before the driver has finished binding to the device or after the
driver has unbound from the device.
This problem shows up with the sd and sr drivers, and can cause disk
or CD/DVD drives to become unusable as a result. The fix is simple.
The drivers store a pointer to the scsi_disk or scsi_cd structure as
their private device data when probing is finished, so we simply have
to be sure to clear the private data during removal and test it during
runtime suspend/resume.
This fixes <https://bugs.debian.org/801925>.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Paul Menzel <paul.menzel@giantmonkey.de>
Reported-by: Erich Schubert <erich@debian.org>
Reported-by: Alexandre Rossi <alexandre.rossi@gmail.com>
Tested-by: Paul Menzel <paul.menzel@giantmonkey.de>
Tested-by: Erich Schubert <erich@debian.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[bwh: Backported to 3.2: drop changes to sr as it doesn't support runtime PM]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6736fde967 upstream.
The code within wait_event_interruptible() is called with
!TASK_RUNNING, so mustn't call any functions that can sleep,
like mutex_lock().
Since we re-check the list_empty() in a loop after the wait,
it's safe to simply use list_empty() without locking.
This bug has existed forever, but was only discovered now
because all userspace implementations, including the default
'rfkill' tool, use poll() or select() to get a readable fd
before attempting to read.
Fixes: c64fb01627 ("rfkill: create useful userspace interface")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2989be09a8 upstream.
KASan detected a use-after-free error in virtio-pci remove code. In
virtio_pci_remove(), vp_dev is still used after being freed in
unregister_virtio_device() (in virtio_pci_release_dev() more
precisely).
To fix, keep a reference until cleanup is done.
Fixes: 63bd62a08c ("virtio_pci: defer kfree until release callback")
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ae2182b1e upstream.
A Root Port's AER structure (rpc) contains a queue of events. aer_irq()
enqueues AER status information and schedules aer_isr() to dequeue and
process it. When we remove a device, aer_remove() waits for the queue to
be empty, then frees the rpc struct.
But aer_isr() references the rpc struct after dequeueing and possibly
emptying the queue, which can cause a use-after-free error as in the
following scenario with two threads, aer_isr() on the left and a
concurrent aer_remove() on the right:
Thread A Thread B
-------- --------
aer_irq():
rpc->prod_idx++
aer_remove():
wait_event(rpc->prod_idx == rpc->cons_idx)
# now blocked until queue becomes empty
aer_isr(): # ...
rpc->cons_idx++ # unblocked because queue is now empty
... kfree(rpc)
mutex_unlock(&rpc->rpc_mutex)
To prevent this problem, use flush_work() to wait until the last scheduled
instance of aer_isr() has completed before freeing the rpc struct in
aer_remove().
I reproduced this use-after-free by flashing a device FPGA and
re-enumerating the bus to find the new device. With SLUB debug, this
crashes with 0x6b bytes (POISON_FREE, the use-after-free magic number) in
GPR25:
pcieport 0000:00:00.0: AER: Multiple Corrected error received: id=0000
Unable to handle kernel paging request for data at address 0x27ef9e3e
Workqueue: events aer_isr
GPR24: dd6aa000 6b6b6b6b 605f8378 605f8360 d99b12c0 604fc674 606b1704 d99b12c0
NIP [602f5328] pci_walk_bus+0xd4/0x104
[bhelgaas: changelog, stable tag]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e03cdf22a2 upstream.
Harald Linden reports that the ftdi_sio driver works properly for the
Yaesu SCU-18 cable if the device ids are added to the driver. So let's
add them.
Reported-by: Harald Linden <harald.linden@7183.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da10816e3d upstream.
ALSA OSS sequencer spews a kernel error message ("ALSA: seq_oss: too
many applications") when user-space tries to open more than the
limit. This means that it can easily fill the log buffer.
Since it's merely a normal error, it's safe to suppress it via
pr_debug() instead.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: this was still using snd_printk()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5991513366 upstream.
ALSA sequencer OSS emulation code has a sanity check for currently
opened devices, but there is a thinko there, eventually it spews
warnings and skips the operation wrongly like:
WARNING: CPU: 1 PID: 7573 at sound/core/seq/oss/seq_oss_synth.c:311
Fix this off-by-one error.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb3232138e upstream.
The visor driver crashes in clie_5_attach() when a specially crafted USB
device without bulk-out endpoint is detected. This fix adds a check that
the device has proper configuration expected by the driver.
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Fixes: cfb8da8f69 ("USB: visor: fix initialisation of UX50/TH55 devices")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cac9b50b0d upstream.
Fix null-pointer dereference at probe should a (malicious) Treo device
lack the expected endpoints.
Specifically, the Treo port-setup hack was dereferencing the bulk-in and
interrupt-in urbs without first making sure they had been allocated by
core.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27f7ed2b11 upstream.
This patch extends commit b93d647174 ("sctp: implement the sender side
for SACK-IMMEDIATELY extension") as it didn't white list
SCTP_SACK_IMMEDIATELY on sctp_msghdr_parse(), causing it to be
understood as an invalid flag and returning -EINVAL to the application.
Note that the actual handling of the flag is already there in
sctp_datamsg_from_user().
https://tools.ietf.org/html/rfc7053#section-7
Fixes: b93d647174 ("sctp: implement the sender side for SACK-IMMEDIATELY extension")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: drop the second hunk as we don't have SCTP_SNDINFO
cmsg support]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9a368aff9c upstream.
Several times already this has been reported as kasan reports caused by
syzkaller and trinity and people always looked at RCU races, but it is
much more simple. :)
In case we bind a pptp socket multiple times, we simply add it to
the callid_sock list but don't remove the old binding. Thus the old
socket stays in the bucket with unused call_id indexes and doesn't get
cleaned up. This causes various forms of kasan reports which were hard
to pinpoint.
Simply don't allow multiple binds and correct error handling in
pptp_bind. Also keep sk_state bits in place in pptp_connect.
Fixes: 00959ade36 ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
Cc: Dmitry Kozlov <xeb@mail.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa0dc04df2 upstream.
Dmitry reported a struct pid leak detected by a syzkaller program.
Bug happens in unix_stream_recvmsg() when we break the loop when a
signal is pending, without properly releasing scm.
Fixes: b3ca9b02b0 ("net: fix multithreaded signal handling in unix recv routines")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: cookie is *sicob->scm, not scm]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e912e685f3 upstream.
This phone needs to be handled by a specialised firmware tool
and is reported to crash irrevocably if cdc-acm takes it.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ffdb1e369a upstream.
For Intel 7260 modem, it is needed for host side to send zero
packet if the BULK OUT size is equal to USB endpoint max packet
length. Otherwise, modem side may still wait for more data and
cannot give response to host side.
Signed-off-by: Konrad Leszczynski <konrad.leszczynski@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 203cbf77de upstream.
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.
Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.
Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
- Use #ifdef instead of IS_ENABLED() as that doesn't work for config
symbols that don't exist on the current architecture
- Use KTIME_LOW_RES directly instead of hrtimer_resolution
- Use ktime_sub() instead of modifying ktime::tv64 directly
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa52bd506f upstream.
The usbvision driver crashes when a specially crafted usb device with invalid
number of interfaces or endpoints is detected. This fix adds checks that the
device has proper configuration expected by the driver.
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backport to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit afd270d1a4 upstream.
There is no usb_put_dev() on failure paths in usbvision_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 090c65b694 upstream.
1. usbvision->alt_max_pkt_size is not deallocated anywhere.
2. if allocation of usbvision->alt_max_pkt_size fails,
there is no proper deallocation of already acquired resources.
The patch adds kfree(usbvision->alt_max_pkt_size) to
usbvision_release() as soon as other deallocations happen there.
It calls usbvision_release() if allocation of
usbvision->alt_max_pkt_size fails as soon as usbvision_release()
is safe to work with incompletely initialized usbvision structure.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 635682a144 upstream.
A case can occur when sctp_accept() is called by the user during
a heartbeat timeout event after the 4-way handshake. Since
sctp_assoc_migrate() changes both assoc->base.sk and assoc->ep, the
bh_sock_lock in sctp_generate_heartbeat_event() will be taken with
the listening socket but released with the new association socket.
The result is a deadlock on any future attempts to take the listening
socket lock.
Note that this race can occur with other SCTP timeouts that take
the bh_lock_sock() in the event sctp_accept() is called.
BUG: soft lockup - CPU#9 stuck for 67s! [swapper:0]
...
RIP: 0010:[<ffffffff8152d48e>] [<ffffffff8152d48e>] _spin_lock+0x1e/0x30
RSP: 0018:ffff880028323b20 EFLAGS: 00000206
RAX: 0000000000000002 RBX: ffff880028323b20 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880028323be0 RDI: ffff8804632c4b48
RBP: ffffffff8100bb93 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880610662280 R11: 0000000000000100 R12: ffff880028323aa0
R13: ffff8804383c3880 R14: ffff880028323a90 R15: ffffffff81534225
FS: 0000000000000000(0000) GS:ffff880028320000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006df528 CR3: 0000000001a85000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880616b70000, task ffff880616b6cab0)
Stack:
ffff880028323c40 ffffffffa01c2582 ffff880614cfb020 0000000000000000
<d> 0100000000000000 00000014383a6c44 ffff8804383c3880 ffff880614e93c00
<d> ffff880614e93c00 0000000000000000 ffff8804632c4b00 ffff8804383c38b8
Call Trace:
<IRQ>
[<ffffffffa01c2582>] ? sctp_rcv+0x492/0xa10 [sctp]
[<ffffffff8148c559>] ? nf_iterate+0x69/0xb0
[<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148c716>] ? nf_hook_slow+0x76/0x120
[<ffffffff814974a0>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8149757d>] ? ip_local_deliver_finish+0xdd/0x2d0
[<ffffffff81497808>] ? ip_local_deliver+0x98/0xa0
[<ffffffff81496ccd>] ? ip_rcv_finish+0x12d/0x440
[<ffffffff81497255>] ? ip_rcv+0x275/0x350
[<ffffffff8145cfeb>] ? __netif_receive_skb+0x4ab/0x750
...
With lockdep debugging:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
CslRx/12087 is trying to release lock (slock-AF_INET) at:
[<ffffffffa01bcae0>] sctp_generate_timeout_event+0x40/0xe0 [sctp]
but there are no more locks to release!
other info that might help us debug this:
2 locks held by CslRx/12087:
#0: (&asoc->timers[i]){+.-...}, at: [<ffffffff8108ce1f>] run_timer_softirq+0x16f/0x3e0
#1: (slock-AF_INET){+.-...}, at: [<ffffffffa01bcac3>] sctp_generate_timeout_event+0x23/0xe0 [sctp]
Ensure the socket taken is also the same one that is released by
saving a copy of the socket before entering the timeout event
critical section.
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Net namespaces are not used
- Keep using sctp_bh_{,un}lock_sock()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4f0414e54e upstream.
We need to load the TX SG list in sendmsg(2) after waiting for
incoming data, not before.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1822793a52 upstream.
We need to lock the child socket in skcipher_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad46d7e332 upstream.
We need to lock the child socket in hash_check_key as otherwise
two simultaneous calls can cause the parent socket to be freed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6a48c565f upstream.
This patch forbids the calling of bind(2) when there are child
sockets created by accept(2) in existence, even if they are created
on the nokey path.
This is needed as those child sockets have references to the tfm
object which bind(2) will destroy.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d7b65aee1e upstream.
This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f1d84af183 upstream.
This patch removes the custom release parent function as the
generic af_alg_release_parent now works for nokey sockets too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e8d8ecf43 upstream.
This patch adds an exception to the key check so that cipher_null
users may continue to use algif_skcipher without setting a key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: use crypto_ablkcipher_has_setkey()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1383cd86a upstream.
This patch adds a way for skcipher users to determine whether a key
is required by a transform.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: add to ablkcipher API instead]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6de62f15b5 upstream.
Hash implementations that require a key may crash if you use
them without setting a key. This patch adds the necessary checks
so that if you do attempt to use them without a key that we return
-ENOKEY instead of proceeding.
This patch also adds a compatibility path to support old applications
that do acept(2) before setkey.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2:
- Add struct kiocb * parameter to {recv,send}msg ops
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a5596d6332 upstream.
This patch adds a way for ahash users to determine whether a key
is required by a crypto_ahash transform.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0fa2d0371 upstream.
This patch adds a compatibility path to support old applications
that do acept(2) before setkey.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: add struct kiocb * parameter to {recv,send}msg ops]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37766586c9 upstream.
This patch adds a compatibility path to support old applications
that do acept(2) before setkey.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c840ac6af3 upstream.
Each af_alg parent socket obtained by socket(2) corresponds to a
tfm object once bind(2) has succeeded. An accept(2) call on that
parent socket creates a context which then uses the tfm object.
Therefore as long as any child sockets created by accept(2) exist
the parent socket must not be modified or freed.
This patch guarantees this by using locks and a reference count
on the parent socket. Any attempt to modify the parent socket will
fail with EBUSY.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dd50458957 upstream.
Some cipher implementations will crash if you try to use them
without calling setkey first. This patch adds a check so that
the accept(2) call will fail with -ENOKEY if setkey hasn't been
done on the socket yet.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
[bwh: Backported to 3.2: s/crypto_(alloc_|free_)?skcipher/crypto_\1ablkcipher/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1b1e15ef6 upstream.
NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing
the hang is the global bit map inode lock. Node 1 is master, has the
lock granted in PR mode; Node 2 is in the converting list (PR -> EX).
There are no holders of the lock on the master node so it should
downconvert to NL and grant EX to node 2 but that does not happen.
BLOCKED + QUEUED in lock res are set and it is on osb blocked list.
Threads are waiting in __ocfs2_cluster_lock on BLOCKED. One thread
wants EX, rest want PR. So it is as though the downconvert thread needs
to be kicked to complete the conv.
The hang is caused by an EX req coming into __ocfs2_cluster_lock on the
heels of a PR req after it sets BUSY (drops l_lock, releasing EX
thread), forcing the incoming EX to wait on BUSY without doing anything.
PR has called ocfs2_dlm_lock, which sets the node 1 lock from NL -> PR,
queues ast.
At this time, upconvert (PR ->EX) arrives from node 2, finds conflict
with node 1 lock in PR, so the lock res is put on dlm thread's dirty
listt.
After ret from ocf2_dlm_lock, PR thread now waits behind EX on BUSY till
awoken by ast.
Now it is dlm_thread that serially runs dlm_shuffle_lists, ast, bast, in
that order. dlm_shuffle_lists ques a bast on behalf of node 2 (which
will be run by dlm_thread right after the ast). ast does its part, sets
UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
dlm_thread runs bast. It sets BLOCKED and kicks dc thread. dc thread
runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING set, skips doing
anything and reques.
Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
of PR, it wakes up first, finds BLOCKED set and skips doing anything but
clearing UPCONVERT_FINISHING (which was actually "meant" for the PR
thread), and this time waits on BLOCKED. Next, the PR thread comes out
of wait but since UPCONVERT_FINISHING is not set, it skips updating the
l_ro_holders and goes straight to wait on BLOCKED. So there, we have a
hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock res in osb
blocked list. Only when dc thread is awoken, it will run
ocfs2_unblock_lock and things will unhang.
One way to fix this is to wake the dc thread on the flag after clearing
UPCONVERT_FINISHING
Orabug: 20933419
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e40816734 upstream.
Hop limit value wasn't copied from attributes when ah was created.
This may influence packets for unconnected services to get dropped in
routers when endpoints are not in the same subnet.
Fixes: fa417f7b52 ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0bcdbdff3 upstream.
When a TLV ioctl with numid zero is handled, the driver may spew a
kernel warning with a stack trace at each call. The check was
intended obviously only for a kernel driver, but not for a user
interaction. Let's fix it.
This was spotted by syzkaller fuzzer.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9586495dc3 upstream.
This reverts one hunk of
commit ef44a1ec6e ("ALSA: sound/core: use memdup_user()"), which
replaced a number of kmalloc followed by memcpy with memdup calls.
In this case, we are copying from a struct snd_seq_port_info32 to a
struct snd_seq_port_info, but the latter is 4 bytes longer than the
32-bit version, so we need to separate kmalloc and copy calls.
Fixes: ef44a1ec6e ('ALSA: sound/core: use memdup_user()')
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43c54b8c7c upstream.
This reverts one hunk of
commit ef44a1ec6e ("ALSA: sound/core: use memdup_user()"), which
replaced a number of kmalloc followed by memcpy with memdup calls.
In this case, we are copying from a struct snd_pcm_hw_params32 to
a struct snd_pcm_hw_params, but the latter is 4 bytes longer than
the 32-bit version, so we need to separate kmalloc and copy calls.
This actually leads to an out-of-bounds memory access later on
in sound/soc/soc-pcm.c:soc_pcm_hw_params() (detected using KASan).
Fixes: ef44a1ec6e ('ALSA: sound/core: use memdup_user()')
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ba1fe7a06 upstream.
hrtimer_cancel() waits for the completion from the callback, thus it
must not be called inside the callback itself. This was already a
problem in the past with ALSA hrtimer driver, and the early commit
[fcfdebe707: ALSA: hrtimer - Fix lock-up] tried to address it.
However, the previous fix is still insufficient: it may still cause a
lockup when the ALSA timer instance reprograms itself in its callback.
Then it invokes the start function even in snd_timer_interrupt() that
is called in hrtimer callback itself, results in a CPU stall. This is
no hypothetical problem but actually triggered by syzkaller fuzzer.
This patch tries to fix the issue again. Now we call
hrtimer_try_to_cancel() at both start and stop functions so that it
won't fall into a deadlock, yet giving some chance to cancel the queue
if the functions have been called outside the callback. The proper
hrtimer_cancel() is called in anyway at closing, so this should be
enough.
Reported-and-tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a383292c86 upstream.
When we fail an accept(2) call we will end up freeing the socket
twice, once due to the direct sk_free call and once again through
newsock.
This patch fixes this by removing the sk_free call.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fe22cd9b7c upstream.
Currently, pr_debug and pr_devel will not elide function call arguments
appearing in calls to the no_printk for these macros. This is because
all side effects must be honored before proceeding to the 0-value
assignment in no_printk.
The behavior is contrary to documentation found in the CodingStyle and
the header file where these functions are declared.
This patch corrects that behavior by shunting out the call to no_printk
completely. The format string is still checked by gcc for correctness,
but no code seems to be emitted in common cases.
[akpm@linux-foundation.org: remove braces, per Joe]
Fixes: 5264f2f75d ("include/linux/printk.h: use and neaten no_printk")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6611d8d761 upstream.
A spare array holding mem cgroup threshold events is kept around to make
sure we can always safely deregister an event and have an array to store
the new set of events in.
In the scenario where we're going from 1 to 0 registered events, the
pointer to the primary array containing 1 event is copied to the spare
slot, and then the spare slot is freed because no events are left.
However, it is freed before calling synchronize_rcu(), which means
readers may still be accessing threshold->primary after it is freed.
Fixed by only freeing after synchronize_rcu().
Signed-off-by: Martijn Coenen <maco@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b5a663aa42 upstream.
A slave timer instance might be still accessible in a racy way while
operating the master instance as it lacks of locking. Since the
master operation is mostly protected with timer->lock, we should cope
with it while changing the slave instance, too. Also, some linked
lists (active_list and ack_list) of slave instances aren't unlinked
immediately at stopping or closing, and this may lead to unexpected
accesses.
This patch tries to address these issues. It adds spin lock of
timer->lock (either from master or slave, which is equivalent) in a
few places. For avoiding a deadlock, we ensure that the global
slave_active_lock is always locked at first before each timer lock.
Also, ack and active_list of slave instances are properly unlinked at
snd_timer_stop() and snd_timer_close().
Last but not least, remove the superfluous call of _snd_timer_stop()
at removing slave links. This is a noop, and calling it may confuse
readers wrt locking. Further cleanup will follow in a later patch.
Actually we've got reports of use-after-free by syzkaller fuzzer, and
this hopefully fixes these issues.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bef5502de0 upstream.
We have found that migration source will trigger a BUG that the refcount
of mle is already zero before put when the target is down during
migration. The situation is as follows:
dlm_migrate_lockres
dlm_add_migration_mle
dlm_mark_lockres_migrating
dlm_get_mle_inuse
<<<<<< Now the refcount of the mle is 2.
dlm_send_one_lockres and wait for the target to become the
new master.
<<<<<< o2hb detect the target down and clean the migration
mle. Now the refcount is 1.
dlm_migrate_lockres woken, and put the mle twice when found the target
goes down which trigger the BUG with the following message:
"ERROR: bad mle: ".
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72214a24a7 upstream.
In Python3+ print is a function so the old syntax is not correct
anymore:
$ ./scripts/bloat-o-meter vmlinux.o vmlinux.o.old
File "./scripts/bloat-o-meter", line 61
print "add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \
^
SyntaxError: invalid syntax
Fix by calling print as a function.
Tested on python 2.7.11, 3.5.1
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea535e418c upstream.
In include/asm-generic/sections.h:
/*
* Usage guidelines:
* _text, _data: architecture specific, don't use them in
* arch-independent code
* [_stext, _etext]: contains .text.* sections, may also contain
* .rodata.*
* and/or .init.* sections
_text is not guaranteed across architectures. Architectures such as ARM
may reuse parts which are not actually text and erroneously trigger a bug.
Switch to using _stext which is guaranteed to contain text sections.
Came out of https://lkml.kernel.org/g/<567B1176.4000106@redhat.com>
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 601f1db653 upstream.
The build of m32104ut_defconfig for m32r arch was failing for long long
time with the error:
ERROR: "memory_start" [fs/udf/udf.ko] undefined!
ERROR: "memory_end" [fs/udf/udf.ko] undefined!
ERROR: "memory_end" [drivers/scsi/sg.ko] undefined!
ERROR: "memory_start" [drivers/scsi/sg.ko] undefined!
ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined!
ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined!
As done in other architectures export the symbols to fix the error.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01b9b0b286 upstream.
In some cases tmp_bug can be not filled in cifs_filldir and stay uninitialized,
therefore its printk with "%s" modifier can leak content of kernelspace memory.
If old content of this buffer does not contain '\0' access bejond end of
allocated object can crash the host.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 820962dc70 upstream.
cifs_call_async() queues the MID to the pending list and calls
smb_send_rqst(). If smb_send_rqst() performs a partial send, it sets
the tcpStatus to CifsNeedReconnect and returns an error code to
cifs_call_async(). In this case, cifs_call_async() removes the MID
from the list and returns to the caller.
However, cifs_call_async() releases the server mutex _before_ removing
the MID. This means that a cifs_reconnect() can race with this function
and manage to remove the MID from the list and delete the entry before
cifs_call_async() calls cifs_delete_mid(). This leads to various
crashes due to the use after free in cifs_delete_mid().
Task1 Task2
cifs_call_async():
- rc = -EAGAIN
- mutex_unlock(srv_mutex)
cifs_reconnect():
- mutex_lock(srv_mutex)
- mutex_unlock(srv_mutex)
- list_delete(mid)
- mid->callback()
cifs_writev_callback():
- mutex_lock(srv_mutex)
- delete(mid)
- mutex_unlock(srv_mutex)
- cifs_delete_mid(mid) <---- use after free
Fix this by removing the MID in cifs_call_async() before releasing the
srv_mutex. Also hold the srv_mutex in cifs_reconnect() until the MIDs
are moved out of the pending list.
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@localhost.localdomain>
[bwh: Backported to 3.2:
- In cifs_call_async() there are two error paths jumping to 'out_err';
fix both of them
- s/cifs_delete_mid/delete_mid/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ec7147a99e upstream.
Under some conditions, CIFS can repeatedly call the cifs_dbg() logging
wrapper. If done rapidly enough, the console framebuffer can softlockup
or "rcu_sched self-detected stall". Apply the built-in log ratelimiters
to prevent such hangs.
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- cifs_dbg() and cifs_vfs_err() do not exist, but make similar changes
to cifsfyi(), cifswarn() and cifserror()]
- Include <linux/ratelimit.h> explicitly]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 525fd5a94e upstream.
The value returned by sys_personality has type "long int".
It is saved to a variable of type "int", which is not a problem
yet because the type of task_struct->pesonality is "unsigned int".
The problem is the sign extension from "int" to "long int"
that happens on return from sys_sparc64_personality.
For example, a userspace call personality((unsigned) -EINVAL) will
result to any subsequent personality call, including absolutely
harmless read-only personality(0xffffffff) call, failing with
errno set to EINVAL.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af368027a4 upstream.
ALSA timer ioctls have an open race and this may lead to a
use-after-free of timer instance object. A simplistic fix is to make
each ioctl exclusive. We have already tread_sem for controlling the
tread, and extend this as a global mutex to be applied to each ioctl.
The downside is, of course, the worse concurrency. But these ioctls
aren't to be parallel accessible, in anyway, so it should be fine to
serialize there.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee8413b010 upstream.
ALSA timer instance object has a couple of linked lists and they are
unlinked unconditionally at snd_timer_stop(). Meanwhile
snd_timer_interrupt() unlinks it, but it calls list_del() which leaves
the element list itself unchanged. This ends up with unlinking twice,
and it was caught by syzkaller fuzzer.
The fix is to use list_del_init() variant properly there, too.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e60fc5aa60 upstream.
On a 64bit kernel build the compiler aligns the _sifields union in the
struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
compensates for this alignment and thus fixes the wait testcase of the
strace package.
The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that
_sigchld.si_stime variable is missed to be copied and thus after a
copy_siginfo() will have uninitialized values.
Signed-off-by: Helge Deller <deller@gmx.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3567eb6af6 upstream.
ALSA sequencer code has an open race between the timer setup ioctl and
the close of the client. This was triggered by syzkaller fuzzer, and
a use-after-free was caught there as a result.
This patch papers over it by adding a proper queue->timer_mutex lock
around the timer-related calls in the relevant code path.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 030e2c78d3 upstream.
snd_seq_ioctl_remove_events() calls snd_seq_fifo_clear()
unconditionally even if there is no FIFO assigned, and this leads to
an Oops due to NULL dereference. The fix is just to add a proper NULL
check.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e4006f0b8 upstream.
When first SYNACK is sent, we already hold rcu_read_lock(), but this
is not true if a SYNACK is retransmitted, as a timer (soft) interrupt
does not hold rcu_read_lock()
Fixes: 45f6fad84c ("ipv6: add complete rcu protection around np->opt")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0754fb298f upstream.
I was seeing some really weird behaviour where piping UML's output
somewhere would cause output to get duplicated:
$ ./vmlinux | head -n 40
Checking that ptrace can change system call numbers...Core dump limits :
soft - 0
hard - NONE
OK
Checking syscall emulation patch for ptrace...Core dump limits :
soft - 0
hard - NONE
OK
Checking advanced syscall emulation patch for ptrace...Core dump limits :
soft - 0
hard - NONE
OK
Core dump limits :
soft - 0
hard - NONE
This is because these tests do a fork() which duplicates the non-empty
stdout buffer, then glibc flushes the duplicated buffer as each child
exits.
A simple workaround is to flush before forking.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f2dfda2f2 upstream.
An inverted return value check in hostfs_mknod() caused the function
to return success after handling it as an error (and cleaning up).
It resulted in the following segfault when trying to bind() a named
unix socket:
Pid: 198, comm: a.out Not tainted 4.4.0-rc4
RIP: 0033:[<0000000061077df6>]
RSP: 00000000daae5d60 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000006092a460 RCX: 00000000dfc54208
RDX: 0000000061073ef1 RSI: 0000000000000070 RDI: 00000000e027d600
RBP: 00000000daae5de0 R08: 00000000da980ac0 R09: 0000000000000000
R10: 0000000000000003 R11: 00007fb1ae08f72a R12: 0000000000000000
R13: 000000006092a460 R14: 00000000daaa97c0 R15: 00000000daaa9a88
Kernel panic - not syncing: Kernel mode fault at addr 0x40, ip 0x61077df6
CPU: 0 PID: 198 Comm: a.out Not tainted 4.4.0-rc4 #1
Stack:
e027d620 dfc54208 0000006f da981398
61bee000 0000c1ed daae5de0 0000006e
e027d620 dfcd4208 00000005 6092a460
Call Trace:
[<60dedc67>] SyS_bind+0xf7/0x110
[<600587be>] handle_syscall+0x7e/0x80
[<60066ad7>] userspace+0x3e7/0x4e0
[<6006321f>] ? save_registers+0x1f/0x40
[<6006c88e>] ? arch_prctl+0x1be/0x1f0
[<60054985>] fork_handler+0x85/0x90
Let's also get rid of the "cosmic ray protection" while we're at it.
Fixes: e9193059b1 "hostfs: fix races in dentry_name() and inode_name()"
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 385277bfb5 upstream.
When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.
The function copy_callback sets pe->error if there was error copying the
chunk, and then calls complete_exception. complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).
The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps->pending_count and
persistent_commit_exception decreases it.
If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not. This results in the variable
ps->pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.
Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful. A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store. Also, remove commit_callback now that
it is merely a wrapper around pending_complete.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f3697e24d upstream.
Dmitry reported that he was able to reproduce the WARN_ON_ONCE that
fires in locks_free_lock_context when the flc_posix list isn't empty.
The problem turns out to be that we're basically rebuilding the
file_lock from scratch in fcntl_setlk when we discover that the setlk
has raced with a close. If the l_whence field is SEEK_CUR or SEEK_END,
then we may end up with fl_start and fl_end values that differ from
when the lock was initially set, if the file position or length of the
file has changed in the interim.
Fix this by just reusing the same lock request structure, and simply
override fl_type value with F_UNLCK as appropriate. That ensures that
we really are unlocking the lock that was initially set.
While we're there, make sure that we do pop a WARN_ON_ONCE if the
removal ever fails. Also return -EBADF in this event, since that's
what we would have returned if the close had happened earlier.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Fixes: c293621bbf (stale POSIX lock handling)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
[bwh: Backported to 3.2: s/i_flctx->flc_posix/inode->i_flock/ in comments]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b9140f39c upstream.
Writing 0 length data into test_power makes it access an invalid array
location and kill the system.
Fixes: f17ef9b2d ("power: Make test_power driver more dynamic.")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bb00c898ad upstream.
If a name contains at least some characters with Unicode values
exceeding single byte, the CS0 output should have 2 bytes per character.
And if other input characters have single byte Unicode values, then
the single input byte is converted to 2 output bytes, and the length
of output becomes larger than the length of input. And if the input
name is long enough, the output length may exceed the allocated buffer
length.
All this means that conversion from UTF8 or NLS to CS0 requires
checking of output length in order to stop when it exceeds the given
output buffer size.
[JK: Make code return -ENAMETOOLONG instead of silently truncating the
name]
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad402b265e upstream.
udf_CS0toUTF8 function stops the conversion when the output buffer
length reaches UDF_NAME_LEN-2, which is correct maximum name length,
but, when checking, it leaves the space for a single byte only,
while multi-bytes output characters can take more space, causing
buffer overflow.
Similar error exists in udf_CS0toNLS function, that restricts
the output length to UDF_NAME_LEN, while actual maximum allowed
length is UDF_NAME_LEN-2.
In these cases the output can override not only the current buffer
length field, causing corruption of the name buffer itself, but also
following allocation structures, causing kernel crash.
Adjust the output length checks in both functions to prevent buffer
overruns in case of multi-bytes UTF8 or NLS characters.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6a1f513776 upstream.
On a cancelled suspend the vcpu_info location does not change (it's
still in the per-cpu area registered by xen_vcpu_setup()). So do not
call xen_hvm_init_shared_info() which would make the kernel think its
back in the shared info. With the wrong vcpu_info, events cannot be
received and the domain will hang after a cancelled suspend.
Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dd0d0d4de5 upstream.
Without i8042.nomux=1 the Elantech touch pad is not working at all on
a Fujitsu Lifebook U745. This patch does not seem necessary for all
U745 (maybe because of different BIOS versions?). However, it was
verified that the patch does not break those (see opensuse bug 883192:
https://bugzilla.opensuse.org/show_bug.cgi?id=883192).
Signed-off-by: Aurélien Francillon <aurelien@francillon.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ade14a7df7 upstream.
If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b68d0ae7e5 upstream.
This driver fails to copy the module parameter for software encryption
to the locations used by the main code.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b24f19f16b upstream.
The module parameter for software encryption was never transferred to
the location used by the driver.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7503efbd82 upstream.
Two of the module parameter descriptions show incorrect default values.
In addition the value for software encryption is not transferred to
the locations used by the driver.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d4d60b4caa upstream.
Two of the module parameters are listed with incorrect default values.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b9f23727a upstream.
The posix_clock_poll function is supposed to return a bit mask of
POLLxxx values. However, in case the hardware has disappeared (due to
hot plugging for example) this code returns -ENODEV in a futile
attempt to throw an error at the file descriptor level. The kernel's
file_operations interface does not accept such error codes from the
poll method. Instead, this function aught to return POLLERR.
The value -ENODEV does, in fact, contain the POLLERR bit (and almost
all the other POLLxxx bits as well), but only by chance. This patch
fixes code to return a proper bit mask.
Credit goes to Markus Elfring for pointing out the suspicious
signed/unsigned mismatch.
Reported-by: Markus Elfring <elfring@users.sourceforge.net>
igned-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Link: http://lkml.kernel.org/r/1450819198-17420-1-git-send-email-richardcochran@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0918d9f47 upstream.
udf_next_aext() just follows extent pointers while extents are marked as
indirect. This can loop forever for corrupted filesystem. Limit number
the of indirect extents we are willing to follow in a row.
[JK: Updated changelog, limit, style]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Jan Kara <jack@suse.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dbec6719ac upstream.
The commit d7ba858a7f (ALSA: fm801: implement TEA575x tuner autodetection)
brings autodetection to the driver. However the autodetection algorithm misses
the TUNER_ONLY bit if it is supplied by the user.
Thus, user gets weird messages and no card registered.
snd_fm801 0000:0d:01.0: detected TEA575x radio type SF64-PCR
snd_fm801 0000:0d:01.0: AC'97 interface is busy (1)
snd_fm801 0000:0d:01.0: AC'97 interface is busy (1)
...
snd_fm801 0000:0d:01.0: AC'97 0 does not respond - RESET
snd_fm801 0000:0d:01.0: AC'97 interface is busy (1)
snd_fm801 0000:0d:01.0: AC'97 interface is busy (1)
snd_fm801 0000:0d:01.0: AC'97 0 access is not valid [0x0], removing mixer.
snd_fm801: probe of 0000:0d:01.0 failed with error -5
Do a copy of TUNER_ONLY bit to be applied after autodetection is done.
Fixes: d7ba858a7f (ALSA: fm801: implement TEA575x tuner autodetection)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 81d7a3294d upstream.
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.
So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")
This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49e9cf3f0c upstream.
According to memory-barriers.txt:
> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...
Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:
https://lkml.org/lkml/2015/10/14/970
To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.
This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.
Fixes: b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fcd5c4dd82 upstream.
EDAC workqueue destruction is really fragile. We cancel delayed work
but if it is still running and requeues itself, we still go ahead and
destroy the workqueue and the queued work explodes when workqueue core
attempts to run it.
Make the destruction more robust by switching op_state to offline so
that requeuing stops. Cancel any pending work *synchronously* too.
EDAC i7core: Driver loaded.
general protection fault: 0000 [#1] SMP
CPU 12
Modules linked in:
Supported: Yes
Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
< ... regs ...>
Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
Stack:
...
Call Trace:
call_timer_fn
run_timer_softirq
__do_softirq
call_softirq
do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
intel_idle
cpuidle_idle_call
cpu_idle
Code: ...
RIP __queue_work
RSP <...>
Signed-off-by: Borislav Petkov <bp@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4eeac22c15 upstream.
In corner case for wl12xx_spi_raw_write() when
len == SPI_AGGR_BUFFER_SIZE
we don't setup correctly spi transfer_list.
Next we will have garbage and strange errors
reported by SPI framework (eg. wrong speed_hz,
failed to transfer one message from queue)
when iterate transfer_list.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Luciano Coelho <luca@coelho.fi>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 17bc55864f upstream.
Free skb for received frames with a wrong checksum. This can happen
pretty rapidly, exhausting all memory.
This fixes a memleak (detected with kmemleak). Originally found while
using monitor mode, but it also appears during managed mode (once the
link is up).
Signed-off-by: Peter Wu <peter@lekensteyn.nl>
ACKed-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1eaf35e4dd upstream.
The module should fail to load.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: xhci_hcd_init() registers the PCI driver, so
check before doing that rather than at the end of the function]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dbb17a21c1 upstream.
Need to call this on resume if displays changes during
suspend in order to properly be notified of changes.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3a318426e0 upstream.
We check for overflow here, but we don't check for underflow so it
causes a static checker warning.
Fixes: fb9987d0f7 ('ath9k_htc: Support for AR9271 chipset.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aba2f06c07 upstream.
Poor #AC was so unimportant until a few days ago that we were
not even tracing its name correctly. But now it's all over
the place.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9dbe6cf941 upstream.
If we do not do this, it is not properly saved and restored across
migration. Windows notices due to its self-protection mechanisms,
and is very upset about it (blue screen of death).
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- We didn't yet have the switch() in kvm_init_msr_list as MPX is not
supported at all
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d282e2b383 upstream.
The initio driver has for many years had two copies of the
same module device table. One of them is also used for registering
the other driver, the other one is entirely useless after the
large scale cleanup that Alan Cox did back in 2007.
The compiler warns about this whenever the driver is built-in:
drivers/scsi/initio.c:131:29: warning: 'i91u_pci_devices' defined but not used [-Wunused-variable]
This removes the extraneous table and the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 72d39fea90 ("[SCSI] initio: Convert into a real Linux driver and update to modern style")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ff56fadd9 upstream.
rc-main mistakenly uses #ifdef MODULE to determine whether it should
load the rc keymap modules. This symbol is only defined if rc-main
is being built as a module itself, and bears no relation to whether
the rc keymaps are modules.
Fix this to use CONFIG_MODULES instead.
Fixes: 631493ecac ("[media] rc-core: merge rc-map.c into rc-main.c")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9d57de610 upstream.
When in FE_TUNE_MODE_ONESHOT the frontend must report
the actual capabilities so user can take appropriate
action.
With frontends that can't do auto inversion this is done
by dvb-core automatically so CAN_INVERSION_AUTO is valid.
However, when in FE_TUNE_MODE_ONESHOT this is not true.
So only set FE_CAN_INVERSION_AUTO in modes other than
FE_TUNE_MODE_ONESHOT
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 18339f59c3 upstream.
Fixed HID descriptor for DragonRise Joystick. Replaced default descriptor
which doubles Z axis and causes mixing values of X and Z axes.
Signed-off-by: Maciej Zuk <gzmlke@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f7e7868b47 upstream.
Recently, i bought a blu-ray writer and noticed that while cdrecord
worked perfectly, random writing didn't work on rewritable bd-re media.
For example, dd if=/dev/zero of=/dev/sr0 bs=32768 count=2 gave the usual
"read-only file system" message.
After checking if the problem lies with my burner or firmware, i grep-ed
the kernel source for EROFS. One of the results was in the cdrom driver.
I tried to follow the function chain and ended in the cdrom_is_dvd_rw
function where writing is permitted only for DVD-RAM and DVD+RW media.
I added a new case label for 0x43 which is the profile name of BD-RE
and now it works correctly for BD-RE too.
Maybe there is a better way of implementing this, like a new function
checking for blu-ray support and called from cdrom_open_write like
it happens for mrw and dvdram media, but adding the case label worked.
Thank you for your time.
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 84d7f2ebd7 upstream.
Intel DNV SoC has the same legacy SMBus host controller than Intel
Sunrisepoint PCH. It also has same iTCO watchdog on the bus.
Add DNV PCI ID to the list of supported devices.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
[bwh: Backported to 3.2: no FEATURE_IRQ or FEATURE_TCO here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ce3161106a upstream.
A long name broke the alignment, shift the columns a bit to fix it and
make the table look nice again. While we're here, switch to the
standard comment style to make checkpatch happy, and use tabs instead
of spaces for column alignment.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
[bwh: Backported to 3.2: "Interrupt processing" isn't mentioned]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4d92f0099a upstream.
This change was to preserve the ascending order of device IDs.
There was an exception with the first two Lewisburg device IDs to
keep all device IDs of the same kind grouped by code name.
Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1fefb8fdc6 upstream.
The JMicron JMB362 controller supports AHCI only, but some revisions
use the IDE class code. These need to be matched by device ID.
These additions have apparently been included by QNAP in their NAS
devices using these controllers.
References: http://bugs.debian.org/634180
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
commit efda332cb6 upstream.
This patch adds the RAID-mode SATA Device IDs for the Intel Wellsburg PCH
Signed-off-by: James Ralston <james.d.ralston@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 751e5f5c75 upstream.
kernel test robot has reported the following crash:
BUG: unable to handle kernel NULL pointer dereference at 00000100
IP: [<c1074df6>] __queue_work+0x26/0x390
*pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53
Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe #1
Workqueue: events vmstat_shepherd
task: cb684600 ti: cb7ba000 task.ti: cb7ba000
EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0
EIP is at __queue_work+0x26/0x390
EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0
Stack:
Call Trace:
__queue_delayed_work+0xa1/0x160
queue_delayed_work_on+0x36/0x60
vmstat_shepherd+0xad/0xf0
process_one_work+0x1aa/0x4c0
worker_thread+0x41/0x440
kthread+0xb0/0xd0
ret_from_kernel_thread+0x21/0x40
The reason is that start_shepherd_timer schedules the shepherd work item
which uses vmstat_wq (vmstat_shepherd) before setup_vmstat allocates
that workqueue so if the further initialization takes more than HZ we
might end up scheduling on a NULL vmstat_wq. This is really unlikely
but not impossible.
Fixes: 373ccbe592 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: This precise race condition doesn't exist, but there
is a similar potential race with CPU hotplug. So move the alloc_workqueue()
above register_cpu_notifier().]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 197c949e77 upstream.
Backport of this upstream commit into stable kernels :
89c22d8c3b ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.
In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
msg->msg_iov);
returns -EFAULT.
This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.
For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.
This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 127500d724. That fixed
the problem of buffer over-reads introduced by backporting commit
89c22d8c3b ("net: Fix skb csum races when peeking"), but resulted in
incorrect checksumming for short reads. It will be replaced with a
complete fix.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e5e57e7a03 upstream.
While setting the KVM PIT counters in 'kvm_pit_load_count', if
'hpet_legacy_start' is set, the function disables the timer on
channel[0], instead of the respective index 'channel'. This is
because channels 1-3 are not linked to the HPET. Fix the caller
to only activate the special HPET processing for channel 0.
Reported-by: P J P <pjp@fedoraproject.org>
Fixes: 0185604c2d
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0185604c2d upstream.
Currently if userspace restores the pit counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, then KVM will perform a mod of 0 and crash. This will ensure
that 0 values are converted to 65536 as per the spec.
This is CVE-2015-7513.
Signed-off-by: Andy Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07a5d38453 upstream.
dst_release should not access dst->flags after decrementing
__refcnt to 0. The dst_entry may be in dst_busy_list and
dst_gc_task may dst_destroy it before dst_release gets a chance
to access dst->flags.
Fixes: d69bbf88c8 ("net: fix a race in dst_release()")
Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit abc7e40c81 upstream.
If a interrupt chip utilizes chip->buslock then free_irq() can
deadlock in the following way:
CPU0 CPU1
interrupt(X) (Shared or spurious)
free_irq(X) interrupt_thread(X)
chip_bus_lock(X)
irq_finalize_oneshot(X)
chip_bus_lock(X)
synchronize_irq(X)
synchronize_irq() waits for the interrupt thread to complete,
i.e. forever.
Solution is simple: Drop chip_bus_lock() before calling
synchronize_irq() as we do with the irq_desc lock. There is nothing to
be protected after the point where irq_desc lock has been released.
This adds chip_bus_lock/unlock() to the remove_irq() code path, but
that's actually correct in the case where remove_irq() is called on
such an interrupt. The current users of remove_irq() are not affected
as none of those interrupts is on a chip which requires buslock.
Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 17b85d29e8 upstream.
This reverts commit 00ee592717 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e967ef022e upstream.
When 32-bit MIPS userspace invokes a syscall indirectly via syscall(number,
arg1, ..., arg7), the kernel looks up the actual syscall based on the given
number, shifts the other arguments to the left, and jumps to the syscall.
If the syscall is interrupted by a signal and indicates it needs to be
restarted by the kernel (by returning ERESTARTNOINTR for example), the
syscall must be called directly, since the number is no longer the first
argument, and the other arguments are now staged for a direct call.
Before shifting the arguments, store the syscall number in pt_regs->regs[2].
This gets copied temporarily into pt_regs->regs[0] after the syscall returns.
If the syscall needs to be restarted, handle_signal()/do_signal() copies the
number back to pt_regs->reg[2], which ends up in $v0 once control returns to
userspace.
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8929/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f0f2887f4 upstream.
test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range. pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.
Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.
This also prevents a crash from offlining memory devices with missing
sections. Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c9ee4cbf2 upstream.
When resizing, it firstly extends the last gd. Once it should backup
super in the gd, it calculates new backup super and update the
corresponding value.
But it currently doesn't consider the situation that the backup super is
already done. And in this case, it still sets the bit in gd bitmap and
then decrease from bg_free_bits_count, which leads to a corrupted gd and
trigger the BUG in ocfs2_block_group_set_bits:
BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits);
So check whether the backup super is done and then do the updates.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e459dfeeb6 upstream.
ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded,
ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed,
ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer.
Fix this by inverting ip6addrlbl_hold() check.
Fixes: 2a8cc6c890 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71a71fb537 upstream.
On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02
("MIPS: Fix restart of indirect syscalls").
On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
ble 0x100(%sr2, %r0)
ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.
This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
ble 0x100(%sr2, %r0)
copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.
This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4a1b4f504 upstream.
This fixes CVE-2015-7550.
There's a race between keyctl_read() and keyctl_revoke(). If the revoke
happens between keyctl_read() checking the validity of a key and the key's
semaphore being taken, then the key type read method will see a revoked key.
This causes a problem for the user-defined key type because it assumes in
its read method that there will always be a payload in a non-revoked key
and doesn't check for a NULL pointer.
Fix this by making keyctl_read() check the validity of a key after taking
semaphore instead of before.
I think the bug was introduced with the original keyrings code.
This was discovered by a multithreaded test program generated by syzkaller
(http://github.com/google/syzkaller). Here's a cleaned up version:
#include <sys/types.h>
#include <keyutils.h>
#include <pthread.h>
void *thr0(void *arg)
{
key_serial_t key = (unsigned long)arg;
keyctl_revoke(key);
return 0;
}
void *thr1(void *arg)
{
key_serial_t key = (unsigned long)arg;
char buffer[16];
keyctl_read(key, buffer, 16);
return 0;
}
int main()
{
key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
pthread_t th[5];
pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
pthread_join(th[0], 0);
pthread_join(th[1], 0);
pthread_join(th[2], 0);
pthread_join(th[3], 0);
return 0;
}
Build as:
cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread
Run as:
while keyctl-race; do :; done
as it may need several iterations to crash the kernel. The crash can be
summarised as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff81279b08>] user_read+0x56/0xa3
...
Call Trace:
[<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7
[<ffffffff81277815>] SyS_keyctl+0x83/0xe0
[<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e50293ef97 upstream.
Commit 8520f38099 ("USB: change hub initialization sleeps to
delayed_work") changed the hub_activate() routine to make part of it
run in a workqueue. However, the commit failed to take a reference to
the usb_hub structure or to lock the hub interface while doing so. As
a result, if a hub is plugged in and quickly unplugged before the work
routine can run, the routine will try to access memory that has been
deallocated. Or, if the hub is unplugged while the routine is
running, the memory may be deallocated while it is in active use.
This patch fixes the problem by taking a reference to the usb_hub at
the start of hub_activate() and releasing it at the end (when the work
is finished), and by locking the hub interface while the work routine
is running. It also adds a check at the start of the routine to see
if the hub has already been disconnected, in which nothing should be
done.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Alexandru Cornea <alexandru.cornea@intel.com>
Tested-by: Alexandru Cornea <alexandru.cornea@intel.com>
Fixes: 8520f38099 ("USB: change hub initialization sleeps to delayed_work")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: add prototype for hub_release() before first use]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit abdc9a3b4b upstream.
The code expects the loop to end with "retries" set to zero but, because
it is a post-op, it will end set to -1. I have fixed this by moving the
decrement inside the loop.
Fixes: 014aa2a3c3 ('USB: ipaq: minor ipaq_open() cleanup.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 408fb0e5aa upstream.
commit f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way")
teaches us that dealing with MSI-X can be troublesome.
Further checks in the MSI-X architecture shows that if the
PCI_COMMAND_MEMORY bit is turned of in the PCI_COMMAND we
may not be able to access the BAR (since they are memory regions).
Since the MSI-X tables are located in there.. that can lead
to us causing PCIe errors. Inhibit us performing any
operation on the MSI-X unless the MEMORY bit is set.
Note that Xen hypervisor with:
"x86/MSI-X: access MSI-X table only after having enabled MSI-X"
will return:
xen_pciback: 0000:0a:00.1: error -6 enabling MSI-X for guest 3!
When the generic MSI code tries to setup the PIRQ without
MEMORY bit set. Which means with later versions of Xen
(4.6) this patch is not neccessary.
This is part of XSA-157
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cfb905b96 upstream.
Otherwise just continue on, returning the same values as
previously (return of 0, and op->result has the PIRQ value).
This does not change the behavior of XEN_PCI_OP_disable_msi[|x].
The pci_disable_msi or pci_disable_msix have the checks for
msi_enabled or msix_enabled so they will error out immediately.
However the guest can still call these operations and cause
us to disable the 'ack_intr'. That means the backend IRQ handler
for the legacy interrupt will not respond to interrupts anymore.
This will lead to (if the device is causing an interrupt storm)
for the Linux generic code to disable the interrupt line.
Naturally this will only happen if the device in question
is plugged in on the motherboard on shared level interrupt GSI.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a396f3a210 upstream.
Otherwise an guest can subvert the generic MSI code to trigger
an BUG_ON condition during MSI interrupt freeing:
for (i = 0; i < entry->nvec_used; i++)
BUG_ON(irq_has_action(entry->irq + i));
Xen PCI backed installs an IRQ handler (request_irq) for
the dev->irq whenever the guest writes PCI_COMMAND_MEMORY
(or PCI_COMMAND_IO) to the PCI_COMMAND register. This is
done in case the device has legacy interrupts the GSI line
is shared by the backend devices.
To subvert the backend the guest needs to make the backend
to change the dev->irq from the GSI to the MSI interrupt line,
make the backend allocate an interrupt handler, and then command
the backend to free the MSI interrupt and hit the BUG_ON.
Since the backend only calls 'request_irq' when the guest
writes to the PCI_COMMAND register the guest needs to call
XEN_PCI_OP_enable_msi before any other operation. This will
cause the generic MSI code to setup an MSI entry and
populate dev->irq with the new PIRQ value.
Then the guest can write to PCI_COMMAND PCI_COMMAND_MEMORY
and cause the backend to setup an IRQ handler for dev->irq
(which instead of the GSI value has the MSI pirq). See
'xen_pcibk_control_isr'.
Then the guest disables the MSI: XEN_PCI_OP_disable_msi
which ends up triggering the BUG_ON condition in 'free_msi_irqs'
as there is an IRQ handler for the entry->irq (dev->irq).
Note that this cannot be done using MSI-X as the generic
code does not over-write dev->irq with the MSI-X PIRQ values.
The patch inhibits setting up the IRQ handler if MSI or
MSI-X (for symmetry reasons) code had been called successfully.
P.S.
Xen PCIBack when it sets up the device for the guest consumption
ends up writting 0 to the PCI_COMMAND (see xen_pcibk_reset_device).
XSA-120 addendum patch removed that - however when upstreaming said
addendum we found that it caused issues with qemu upstream. That
has now been fixed in qemu upstream.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e0ce1455c upstream.
The guest sequence of:
a) XEN_PCI_OP_enable_msix
b) XEN_PCI_OP_enable_msix
results in hitting an NULL pointer due to using freed pointers.
The device passed in the guest MUST have MSI-X capability.
The a) constructs and SysFS representation of MSI and MSI groups.
The b) adds a second set of them but adding in to SysFS fails (duplicate entry).
'populate_msi_sysfs' frees the newly allocated msi_irq_groups (note that
in a) pdev->msi_irq_groups is still set) and also free's ALL of the
MSI-X entries of the device (the ones allocated in step a) and b)).
The unwind code: 'free_msi_irqs' deletes all the entries and tries to
delete the pdev->msi_irq_groups (which hasn't been set to NULL).
However the pointers in the SysFS are already freed and we hit an
NULL pointer further on when 'strlen' is attempted on a freed pointer.
The patch adds a simple check in the XEN_PCI_OP_enable_msix to guard
against that. The check for msi_enabled is not stricly neccessary.
This is part of XSA-157
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 56441f3c8e upstream.
The guest sequence of:
a) XEN_PCI_OP_enable_msi
b) XEN_PCI_OP_enable_msi
c) XEN_PCI_OP_disable_msi
results in hitting an BUG_ON condition in the msi.c code.
The MSI code uses an dev->msi_list to which it adds MSI entries.
Under the above conditions an BUG_ON() can be hit. The device
passed in the guest MUST have MSI capability.
The a) adds the entry to the dev->msi_list and sets msi_enabled.
The b) adds a second entry but adding in to SysFS fails (duplicate entry)
and deletes all of the entries from msi_list and returns (with msi_enabled
is still set). c) pci_disable_msi passes the msi_enabled checks and hits:
BUG_ON(list_empty(dev_to_msi_list(&dev->dev)));
and blows up.
The patch adds a simple check in the XEN_PCI_OP_enable_msi to guard
against that. The check for msix_enabled is not stricly neccessary.
This is part of XSA-157.
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8135cf8b09 upstream.
Double fetch vulnerabilities that happen when a variable is
fetched twice from shared memory but a security check is only
performed the first time.
The xen_pcibk_do_op function performs a switch statements on the op->cmd
value which is stored in shared memory. Interestingly this can result
in a double fetch vulnerability depending on the performed compiler
optimization.
This patch fixes it by saving the xen_pci_op command before
processing it. We also use 'barrier' to make sure that the
compiler does not perform any optimization.
This is part of XSA155.
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f13d75ccb upstream.
A compiler may load a switch statement value multiple times, which could
be bad when the value is in memory shared with the frontend.
When converting a non-native request to a native one, ensure that
src->operation is only loaded once by using READ_ONCE().
This is part of XSA155.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2:
- s/READ_ONCE/ACCESS_ONCE/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 68a33bfd84 upstream.
Instead of open-coding memcpy()s and directly accessing Tx and Rx
requests, use the new RING_COPY_REQUEST() that ensures the local copy
is correct.
This is more than is strictly necessary for guest Rx requests since
only the id and gref fields are used and it is harmless if the
frontend modifies these.
This is part of XSA155.
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2:
- s/queue/vif/g
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0f589967a7 upstream.
The last from guest transmitted request gives no indication about the
minimum amount of credit that the guest might need to send a packet
since the last packet might have been a small one.
Instead allow for the worst case 128 KiB packet.
This is part of XSA155.
CC: stable@vger.kernel.org
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2: s/queue/vif/g]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 454d5d882c upstream.
Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a request
generally requires taking a local copy.
Provide a RING_COPY_REQUEST() macro to use instead of
RING_GET_REQUEST() and an open-coded memcpy(). This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.
Use a volatile source to prevent the compiler from reordering or
omitting the copy.
This is part of XSA155.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 272fa59ccb upstream.
The print_insn() function returns strings like "lghi %r1,0". To escape the
'%' character in sprintf() a second '%' is used. For example "lghi %%r1,0"
is converted into "lghi %r1,0".
After print_insn() the output string is passed to printk(). Because format
specifiers like "%r" or "%f" are ignored by printk() this works by chance
most of the time. But for instructions with control registers like
"lctl %c6,%c6,780" this fails because printk() interprets "%c" as
character format specifier.
Fix this problem and escape the '%' characters twice.
For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf()
into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780".
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: drop the OPERAND_VR case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a50bd43935 upstream.
Russell King found that he had weird side effects when compiling the kernel
with hard linked ccache. The reason was that recordmcount modified the
kernel in place via mmap, and when a file gets modified twice by
recordmcount, it will complain about it. To fix this issue, Russell wrote a
patch that checked if the file was hard linked more than once and would
unlink it if it was.
Linus Torvalds was not happy with the fact that recordmcount does this in
place modification. Instead of doing the unlink only if the file has two or
more hard links, it does the unlink all the time. In otherwords, it always
does a copy if it changed something. That is, it does the write out if a
change was made.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7bbadd2d10 upstream.
Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.
Fixes: 79462ad02e ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: keep open-coding U8_MAX]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 157f38f993 upstream.
Fix parent-device reference leak due to SPI-core taking an unnecessary
reference to the parent when allocating the master structure, a
reference that was never released.
Note that driver core takes its own reference to the parent when the
master device is registered.
Fixes: 49dce689ad ("spi doesn't need class_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4c5e354a97 upstream.
When shutting down the device, the struct ser_cardstate must not be
kfree()d immediately after the call to platform_device_unregister()
since the embedded struct platform_device is still in use.
Move the kfree() call to the release method instead.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Fixes: 2869b23e4b ("drivers/isdn/gigaset: new M101 driver (v2)")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 40d24c4d8a upstream.
There are two issue here.
1) cnt starts as maxloop + 1 so all these loops iterate one more time
than intended.
2) At the end of the loop we test for "if (maxloop && !cnt)" but for
the first two loops, we end with cnt equal to -1. Changing this to
a pre-op means we end with cnt set to 0.
Fixes: cae86d4a4e ('mISDN: Add driver for Infineon ISDN chipset family')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e2309937f upstream.
For the little-endian SH771x kernels the driver has to byte-swap the RX/TX
buffers, however yet unset physcial address from the TX descriptor is used
to call sh_eth_soft_swap(). Use 'skb->data' instead...
Fixes: 31fcb99d99 ("net: sh_eth: remove __flush_purge_region")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 42e3121d90 upstream.
AudioQuest DragonFly DAC reports a volume control range of 0..50
(0x0000..0x0032) which in USB Audio means a range of 0 .. 0.2dB, which
is obviously incorrect and would cause software using the dB information
in e.g. volume sliders to have a massive volume difference in 100..102%
range.
Commit 2d1cb7f658 ("ALSA: usb-audio: add dB range mapping for some
devices") added a dB range mapping for it with range 0..50 dB.
However, the actual volume mapping seems to be neither linear volume nor
linear dB scale, but instead quite close to the cubic mapping e.g.
alsamixer uses, with a range of approx. -53...0 dB.
Replace the previous quirk with a custom dB mapping based on some basic
output measurements, using a 10-item range TLV (which will still fit in
alsa-lib MAX_TLV_RANGE_SIZE).
Tested on AudioQuest DragonFly HW v1.2. The quirk is only applied if the
range is 0..50, so if this gets fixed/changed in later HW revisions it
will no longer be applied.
v2: incorporated Takashi Iwai's suggestion for the quirk application
method
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: open-code usb_audio_info()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bf1d1c9b61 upstream.
Add a DECLARE_TLV_DB_RANGE() macro so that dB range information
can be specified without having to count the items manually for
TLV_DB_RANGE_HEAD().
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b5b9eb5467 upstream.
Add helper macros with a little bit of preprocessor magic to
automatically compute the length of a TLV item. This lets us avoid
having to compute this by hand, and will allow to use items that do
not use a fixed length.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e1033561d upstream.
KASAN found that our additional element processing scripts drop off
the end of the VPD page into unallocated space. The reason is that
not every element has additional information but our traversal
routines think they do, leading to them expecting far more additional
information than is present. Fix this by adding a gate to the
traversal routine so that it only processes elements that are expected
to have additional information (list is in SES-2 section 6.1.13.1:
Additional Element Status diagnostic page overview)
Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3417c1b5cb upstream.
Simple enclosure implementations (mostly USB) are allowed to return only
page 8 to every diagnostic query. That really confuses our
implementation because we assume the return is the page we asked for and
end up doing incorrect offsets based on bogus information leading to
accesses outside of allocated ranges. Fix that by checking the page
code of the return and giving an error if it isn't the one we asked for.
This should fix reported bugs with USB storage by simply refusing to
attach to enclosures that behave like this. It's also good defensive
practise now that we're starting to see more USB enclosures.
Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b7bb110008 upstream.
Some users of rfkill, like NFC and cfg80211, use a dynamic name when
allocating rfkill, in those cases dev_name(). Therefore, the pointer
passed to rfkill_alloc() might not be valid forever, I specifically
found the case that the rfkill name was quite obviously an invalid
pointer (or at least garbage) when the wiphy had been renamed.
Fix this by making a copy of the rfkill name in rfkill_alloc().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70d906bc17 upstream.
Some ciphers actually support encrypting zero length plaintexts. For
example, many AEAD modes support this. The resulting ciphertext for
those winds up being only the authentication tag, which is a result of
the key, the iv, the additional data, and the fact that the plaintext
had zero length. The blkcipher constructors won't copy the IV to the
right place, however, when using a zero length input, resulting in
some significant problems when ciphers call their initialization
routines, only to find that the ->iv parameter is uninitialized. One
such example of this would be using chacha20poly1305 with a zero length
input, which then calls chacha20, which calls the key setup routine,
which eventually OOPSes due to the uninitialized ->iv member.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit acfc1cc13f upstream.
If diu_ops is not implemented on platform, kernel will access a NULL
pointer. We need to check this pointer in DIU initialization.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Acked-by: Timur Tabi <timur@tabi.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 69ce6487dc upstream.
While cooking the sctp np->opt rcu fixes, I forgot to move
one rcu_read_unlock() after the added rcu_dereference() in
sctp_v6_get_dst()
This gave lockdep warnings reported by Dave Jones.
Fixes: c836a8ba93 ("ipv6: sctp: add rcu protection around np->opt")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8a0d19c5ed upstream.
when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING
state, if B neither claim his rwnd is 0 nor send SACK for this data, A
will keep retransmitting this data until t5 timeout, Max.Retrans times
can't work anymore, which is bad.
if B's rwnd is not 0, it should send abort after Max.Retrans times, only
when B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A
will start t5 timer, which is also commit f8d9605243 ("sctp: Enforce
retransmission limit during shutdown") means, but it lacks the condition
peer rwnd == 0.
so fix it by adding a bit (zero_window_announced) in peer to record if
the last rwnd is 0. If it was, zero_window_announced will be set. and use
this bit to decide if start t5 timer when local.state is SHUTDOWN_PENDING.
Fixes: commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: change sack_needed to bitfield as done earlier upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ab42d78e3 upstream.
Currently slhc_init() treats out-of-range values of rslots and tslots
as equivalent to 0, except that if tslots is too large it will
dereference a null pointer (CVE-2015-7799).
Add a range-check at the top of the function and make it return an
ERR_PTR() on error instead of NULL. Change the callers accordingly.
Compile-tested only.
Reported-by: 郭永刚 <guoyonggang@360.cn>
References: http://article.gmane.org/gmane.comp.security.oss.general/17908
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust indentation]
commit 60bc851ae5 upstream.
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.
This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.
This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.
A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.
As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().
recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Reported-by: Ambrose Feinstein <ambrose@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3822b5c2fc ]
With b3ca9b02b0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem with the stream receive code (as opposed to its datagram
counterpart) as that never went to sleep waiting for new messages with the
mutex held and thus, wouldn't cause secondary readers to block on the
mutex waiting for the sleeping primary reader. As the interruptible
locking makes the code more complicated in exchange for no benefit,
change it back to using mutex_lock.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 248be83dcb ]
In a low memory situation the following kernel oops occurs:
Unable to handle kernel NULL pointer dereference at virtual address 00000050
pgd = 8490c000
[00000050] *pgd=4651e831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 Not tainted (3.4-at16 #9)
PC is at skb_put+0x10/0x98
LR is at sh_eth_poll+0x2c8/0xa10
pc : [<8035f780>] lr : [<8028bf50>] psr: 60000113
sp : 84eb1a90 ip : 84eb1ac8 fp : 84eb1ac4
r10: 0000003f r9 : 000005ea r8 : 00000000
r7 : 00000000 r6 : 940453b0 r5 : 00030000 r4 : 9381b180
r3 : 00000000 r2 : 00000000 r1 : 000005ea r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 4248c059 DAC: 00000015
Process klogd (pid: 2046, stack limit = 0x84eb02e8)
[...]
This is because netdev_alloc_skb() fails and 'mdp->rx_skbuff[entry]' is left
NULL but sh_eth_rx() later uses it without checking. Add such check...
Reported-by: Yasushi SHOJI <yashi@atmark-techno.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 79462ad02e ]
郭永刚 reported that one could simply crash the kernel as root by
using a simple program:
int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;
socket_fd = socket(10,3,0x40000000);
connect(socket_fd , &addr,16);
AF_INET, AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field thus is sized accordingly,
thus larger protocol identifiers simply cut off the higher bits and
store a zero in the protocol fields.
This could lead to e.g. NULL function pointer because as a result of
the cut off inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
kernel: Call Trace:
kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110
kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10
kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89
I found no particular commit which introduced this problem.
CVE: CVE-2015-8543
Cc: Cong Wang <cwang@twopensource.com>
Reported-by: 郭永刚 <guoyonggang@360.cn>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: open-code U8_MAX]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 9470e24f35 ]
SCTP is lacking proper np->opt cloning at accept() time.
TCP and DCCP use ipv6_dup_options() helper, do the same
in SCTP.
We might later factorize this code in a common helper to avoid
future mistakes.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 01ce63c901 ]
Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
related to disabling sock timestamp.
When SCTP accepts an association or peel one off, it copies sock flags
but forgot to call net_enable_timestamp() if a packet timestamping flag
was copied, leading to extra calls to net_disable_timestamp() whenever
such clones were closed.
The fix is to call net_enable_timestamp() whenever we copy a sock with
that flag on, like tcp does.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: SK_FLAGS_TIMESTAMP is newly defined]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f2a3771ae8 ]
atl1c driver is doing order-4 allocation with GFP_ATOMIC
priority. That often breaks networking after resume. Switch to
GFP_KERNEL. Still not ideal, but should be significantly better.
atl1c_setup_ring_resources() is called from .open() function, and
already uses GFP_KERNEL, so this change is safe.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 602dd62dfb ]
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 45f6fad84c ]
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop changes to l2tp
- Fix an additional use of np->opt in tcp_v6_send_synack()
- Fold in commit 43264e0bd9 ("ipv6: remove unnecessary codes in tcp_ipv6.c")
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0979e465c5 upstream.
opt always equals np->opts, so it is meaningless to define opt, and
check if opt does not equal np->opts and then try to free opt.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 264640fc2c ]
If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.
To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.
Note: similar patch has been already submitted by Yoshifuji Hideaki in
http://patchwork.ozlabs.org/patch/220979/
but got lost and forgotten for some reason.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6900317f5e ]
David and HacKurx reported a following/similar size overflow triggered
in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:
(Already fixed in later grsecurity versions by Brad and PaX Team.)
[ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
[ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
[ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
[ 1002.296153] ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
[ 1002.296162] ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
[ 1002.296169] ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
[ 1002.296176] Call Trace:
[ 1002.296190] [<ffffffff818129ba>] dump_stack+0x45/0x57
[ 1002.296200] [<ffffffff8121f838>] report_size_overflow+0x38/0x60
[ 1002.296209] [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
[ 1002.296220] [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
[ 1002.296228] [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
[ 1002.296236] [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
[ 1002.296243] [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
[ 1002.296248] [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
[ 1002.296257] [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
[ 1002.296263] [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
[ 1002.296271] [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85
Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.
In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.
When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():
int cmlen = CMSG_LEN(i * sizeof(int)); <--- cmlen w/o tail-padding
err = put_user(SOL_SOCKET, &cm->cmsg_level);
if (!err)
err = put_user(SCM_RIGHTS, &cm->cmsg_type);
if (!err)
err = put_user(cmlen, &cm->cmsg_len);
if (!err) {
cmlen = CMSG_SPACE(i * sizeof(int)); <--- cmlen w/ 4 byte extra tail-padding
msg->msg_control += cmlen;
msg->msg_controllen -= cmlen; <--- iff no tail-padding space here ...
} ... wrap-around
F.e. it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.
In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.
In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could f.e. become an issue if such application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.
Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).
Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!
Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).
Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad2
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").
RFC3542, section 20.2. says:
The fields shown as "XX" are possible padding, between the cmsghdr
structure and the data, and between the data and the next cmsghdr
structure, if required by the implementation. While sending an
application may or may not include padding at the end of last
ancillary data in msg_controllen and implementations must accept both
as valid. On receiving a portable application must provide space for
padding at the end of the last ancillary data as implementations may
copy out the padding at the end of the control message buffer and
include it in the received msg_controllen. When recvmsg() is called
if msg_controllen is too small for all the ancillary data items
including any trailing padding after the last item an implementation
may set MSG_CTRUNC.
Since we didn't place MSG_CTRUNC for already quite a long time, just do
the same as in 1ac70e7ad2 to avoid an overflow.
Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).
No Fixes tag this time as pre 2002 (that is, pre history tree).
Reported-by: David Sterba <dave@jikos.cz>
Reported-by: HacKurx <hackurx@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 142a2e7ece ]
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :
WARN_ON(tp->copied_seq != tp->rcv_nxt &&
!(flags & (MSG_PEEK | MSG_TRUNC)));
His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.
Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d33fa1059 upstream.
According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't. Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.
This bug was found by strace test suite.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0d777df5d8 upstream.
Currently at the beginning of hugetlb_fault(), we call huge_pte_offset()
and check whether the obtained *ptep is a migration/hwpoison entry or
not. And if not, then we get to call huge_pte_alloc(). This is racy
because the *ptep could turn into migration/hwpoison entry after the
huge_pte_offset() check. This race results in BUG_ON in
huge_pte_alloc().
We don't have to call huge_pte_alloc() when the huge_pte_offset()
returns non-NULL, so let's fix this bug with moving the code into else
block.
Note that the *ptep could turn into a migration/hwpoison entry after
this block, but that's not a problem because we have another
!pte_present check later (we never go into hugetlb_no_page() in that
case.)
Fixes: 290408d4a2 ("hugetlb: hugepage migration core")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 373ccbe592 upstream.
Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.
The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context. If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much. WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation. This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.
In order to fix this issue we need to do two things. First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep. In order to prevent from issues handled by 0e093d9976
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.
The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e46e31a369 upstream.
When using the Promise TX2+ SATA controller on PA-RISC, the system often
crashes with kernel panic, for example just writing data with the dd
utility will make it crash.
Kernel panic - not syncing: drivers/parisc/sba_iommu.c: I/O MMU @ 000000000000a000 is out of mapping resources
CPU: 0 PID: 18442 Comm: mkspadfs Not tainted 4.4.0-rc2 #2
Backtrace:
[<000000004021497c>] show_stack+0x14/0x20
[<0000000040410bf0>] dump_stack+0x88/0x100
[<000000004023978c>] panic+0x124/0x360
[<0000000040452c18>] sba_alloc_range+0x698/0x6a0
[<0000000040453150>] sba_map_sg+0x260/0x5b8
[<000000000c18dbb4>] ata_qc_issue+0x264/0x4a8 [libata]
[<000000000c19535c>] ata_scsi_translate+0xe4/0x220 [libata]
[<000000000c19a93c>] ata_scsi_queuecmd+0xbc/0x320 [libata]
[<0000000040499bbc>] scsi_dispatch_cmd+0xfc/0x130
[<000000004049da34>] scsi_request_fn+0x6e4/0x970
[<00000000403e95a8>] __blk_run_queue+0x40/0x60
[<00000000403e9d8c>] blk_run_queue+0x3c/0x68
[<000000004049a534>] scsi_run_queue+0x2a4/0x360
[<000000004049be68>] scsi_end_request+0x1a8/0x238
[<000000004049de84>] scsi_io_completion+0xfc/0x688
[<0000000040493c74>] scsi_finish_command+0x17c/0x1d0
The cause of the crash is not exhaustion of the IOMMU space, there is
plenty of free pages. The function sba_alloc_range is called with size
0x11000, thus the pages_needed variable is 0x11. The function
sba_search_bitmap is called with bits_wanted 0x11 and boundary size is
0x10 (because dma_get_seg_boundary(dev) returns 0xffff).
The function sba_search_bitmap attempts to allocate 17 pages that must not
cross 16-page boundary - it can't satisfy this requirement
(iommu_is_span_boundary always returns true) and fails even if there are
many free entries in the IOMMU space.
How did it happen that we try to allocate 17 pages that don't cross
16-page boundary? The cause is in the function iommu_coalesce_chunks. This
function tries to coalesce adjacent entries in the scatterlist. The
function does several checks if it may coalesce one entry with the next,
one of those checks is this:
if (startsg->length + dma_len > max_seg_size)
break;
When it finishes coalescing adjacent entries, it allocates the mapping:
sg_dma_len(contig_sg) = dma_len;
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
sg_dma_address(contig_sg) =
PIDE_FLAG
| (iommu_alloc_range(ioc, dev, dma_len) << IOVP_SHIFT)
| dma_offset;
It is possible that (startsg->length + dma_len > max_seg_size) is false
(we are just near the 0x10000 max_seg_size boundary), so the funcion
decides to coalesce this entry with the next entry. When the coalescing
succeeds, the function performs
dma_len = ALIGN(dma_len + dma_offset, IOVP_SIZE);
And now, because of non-zero dma_offset, dma_len is greater than 0x10000.
iommu_alloc_range (a pointer to sba_alloc_range) is called and it attempts
to allocate 17 pages for a device that must not cross 16-page boundary.
To fix the bug, we must make sure that dma_len after addition of
dma_offset and alignment doesn't cross the segment boundary. I.e. change
if (startsg->length + dma_len > max_seg_size)
break;
to
if (ALIGN(dma_len + dma_offset + startsg->length, IOVP_SIZE) > max_seg_size)
break;
This patch makes this change (it precalculates max_seg_boundary at the
beginning of the function iommu_coalesce_chunks). I also added a check
that the mapping length doesn't exceed dma_get_seg_boundary(dev) (it is
not needed for Promise TX2+ SATA, but it may be needed for other devices
that have dma_get_seg_boundary lower than dma_get_max_seg_size).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f5bd30818 upstream.
There are few defects in vga_get() related to signal hadning:
- we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
case;
- if we found pending signal we must remove ourself from wait queue
and change task state back to running;
- -ERESTARTSYS is more appropriate, I guess.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed8b45a367 upstream.
If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.
Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
device-mapper: bufio: leaked buffer 3, hold count 1, list 0
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27f972d3e0 upstream.
We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.
static int smi_start_processing(void *send_info,
ipmi_smi_t intf)
{
/* Try to claim any interrupts. */
if (new_smi->irq_setup)
new_smi->irq_setup(new_smi);
--> IRQ arrives here and irq handler tries to modify uninitialized timer
which triggers BUG_ON(!timer->function) in __mod_timer().
Call Trace:
<IRQ>
[<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
[<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
[<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
[<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
[<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
[<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
[<ffffffff810f245e>] handle_edge_irq+0xde/0x180
[<ffffffff8100fc59>] handle_irq+0x49/0xa0
[<ffffffff8154643c>] do_IRQ+0x6c/0xf0
[<ffffffff8100ba53>] ret_from_intr+0x0/0x11
/* Set up the timer that drives the interface. */
setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);
The following patch fixes the problem.
To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ad7862844 upstream.
For block devices the pagecache is associated with the inode
on bdevfs, not with the aliasing ones on the mountable filesystems.
The latter have its own ->i_data empty and ->i_mapping pointing
to the (unique per major/minor) bdevfs inode. That guarantees
cache coherence between all block device inodes with the same
device number.
Eviction of an alias inode has no business trying to evict the
pages belonging to bdevfs one; moreover, ->i_mapping is only
safe to access when the thing is opened. At the time of
->evict_inode() the victim is definitely *not* opened. We are
about to kill the address space embedded into struct inode
(inode->i_data) and that's what we need to empty of any pages.
9p instance tries to empty inode->i_mapping instead, which is
both unsafe and bogus - if we have several device nodes with
the same device number in different places, closing one of them
should not try to empty the (shared) page cache.
Fortunately, other instances in the tree are OK; they are
evicting from &inode->i_data instead, as 9p one should.
Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Tested-by: "Suzuki K. Poulose" <Suzuki.Poulose@arm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a74a821624 upstream.
rme96 driver needs to reset DAC depending on the sample rate, and this
results in resetting to the max volume suddenly. It's because of the
missing call of snd_rme96_apply_dac_volume().
However, calling this function right after the DAC reset still may not
work, and we need some delay before this call. Since the DAC reset
and the procedure after that are performed in the spinlock, we delay
the DAC volume restore at the end after the spinlock.
Reported-and-tested-by: Sylvain LABOISNE <maeda1@free.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 096b110a3d upstream.
if a full speed hub connects to a high speed hub which
supports MTT, the MTT field of its slot context will be set
to 1 when xHCI driver setups an xHCI virtual device in
xhci_setup_addressable_virt_dev(); once usb core fetch its
hub descriptor, and need to update the xHC's internal data
structures for the device, the HUB field of its slot context
will be set to 1 too, meanwhile MTT is also set before,
this will cause configure endpoint command fail, so in the
case, we should clear MTT to 0 for full speed hub according
to section 6.2.2
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8295c69925 upstream.
root_domain::rto_mask allocated through alloc_cpumask_var()
contains garbage data, this may cause problems. For instance,
When doing pull_rt_task(), it may do useless iterations if
rto_mask retains some extra garbage bits. Worse still, this
violates the isolated domain rule for clustered scheduling
using cpuset, because the tasks(with all the cpus allowed)
belongs to one root domain can be pulled away into another
root domain.
The patch cleans the garbage by using zalloc_cpumask_var()
instead of alloc_cpumask_var() for root_domain::rto_mask
allocation, thereby addressing the issues.
Do the same thing for root_domain's other cpumask memembers:
dlo_mask, span, and online.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1449057179-29321-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- There's no dlo_mask to initialise
- Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0af2e538c upstream.
A client calling drmSetMaster() using a file descriptor that was opened
when another client was master would inherit the latter client's master
object and all its authenticated clients.
This is unwanted behaviour, and when this happens, instead allocate a
brand new master object for the client calling drmSetMaster().
Fixes a BUG() throw in vmw_master_set().
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2:
- s/master_mutex/struct_mutex/
- drm_new_set_master() must drop struct_mutex while calling
drm_driver::master_create
- Adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9a37110d20 upstream.
An interface may need to assert a lock invariant and not flood the
system logs; add a lockdep helper macro equivalent to
lockdep_assert_held() which only WARNs once.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4eba7bb1d7 upstream.
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a53 ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a53 "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30ce6e1cc5 upstream.
The block allocated at the start of btree_split_sibling() is never
released if later insert_at() fails.
Fix this by releasing the previously allocated bufio block using
unlock_block().
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5377adb092 upstream.
usb_parse_ss_endpoint_companion() now decodes the burst multiplier
correctly in order to check that it's <= 3, but still uses the wrong
expression if warning that it's > 3.
Fixes: ff30cbc8da ("usb: Use the USB_SS_MULT() macro to get the ...")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f9fa1887dc upstream.
qset_fill_page_list() do not check for dma mapping errors.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee9159ddce upstream.
The N_X25 line discipline may access the previous line discipline's closed
and already-freed private data on open [1].
The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).
[1]
[ 634.336761] ==================================================================
[ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
[ 634.339558] Read of size 4 by task syzkaller_execu/8981
[ 634.340359] =============================================================================
[ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
...
[ 634.405018] Call Trace:
[ 634.405277] dump_stack (lib/dump_stack.c:52)
[ 634.405775] print_trailer (mm/slub.c:655)
[ 634.406361] object_err (mm/slub.c:662)
[ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
[ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
[ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
[ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
[ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
[ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
[ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
[ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
[ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d98f1cd0a3 upstream.
When I connect an Intel SSD to SATA SIL controller (PCI ID 1095:3114), any
TRIM command results in I/O errors being reported in the log. There is
other similar error reported with TRIM and the SIL controller:
https://bugs.centos.org/view.php?id=5880
Apparently the controller doesn't support TRIM commands. This patch
disables TRIM support on the SATA SIL controller.
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata7.00: BMDMA2 stat 0x50001
ata7.00: failed command: DATA SET MANAGEMENT
ata7.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 0 dma 512 out
res 51/04:01:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
ata7.00: status: { DRDY ERR }
ata7.00: error: { ABRT }
ata7.00: device reported invalid CHS sector 0
sd 8:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 8:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current] [descriptor]
sd 8:0:0:0: [sdb] tag#0 Add. Sense: Unaligned write command
sd 8:0:0:0: [sdb] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 21 95 88 00 20 00 00 00 00
blk_update_request: I/O error, dev sdb, sector 2200968
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 023113d24e upstream.
Current code doesn't update port value of Port Multiplier(PM) when
sending FIS of softreset to device, command will fail if FBS is
enabled.
There are two ways to fix the issue: the first is to disable FBS
before sending softreset command to PM device and the second is
to update port value of PM when sending command.
For the first way, i can't find any related rule in AHCI Spec. The
second way can avoid disabling FBS and has better performance.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 025af189fb upstream.
In ttm_write_lock(), the uninterruptible path should call
__ttm_write_lock() not __ttm_read_lock(). This fixes a vmwgfx hang
on F23 start up.
syeh: Extracted this from one of Thomas' internal patches.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c812012f9c upstream.
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.
Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c7188b234 upstream.
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket. The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket. This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().
Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.
I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.
Complete earlier incomplete fix to CVE-2015-6937:
74e98eb085 ("RDS: verify the underlying transport exists before creating a connection")
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bc23f0c8d7 upstream.
Ted and Namjae have reported that truncated pages don't get timely
reclaimed after being truncated in data=journal mode. The following test
triggers the issue easily:
for (i = 0; i < 1000; i++) {
pwrite(fd, buf, 1024*1024, 0);
fsync(fd);
fsync(fd);
ftruncate(fd, 0);
}
The reason is that journal_unmap_buffer() finds that truncated buffers
are not journalled (jh->b_transaction == NULL), they are part of
checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
been already written out (!buffer_dirty(bh)). We clean such buffers but
we leave them in the checkpoint list. Since checkpoint transaction holds
a reference to the journal head, these buffers cannot be released until
the checkpoint transaction is cleaned up. And at that point we don't
call release_buffer_page() anymore so pages detached from mapping are
lingering in the system waiting for reclaim to find them and free them.
Fix the problem by removing buffers from transaction checkpoint lists
when journal_unmap_buffer() finds out they don't have to be there
anymore.
Reported-and-tested-by: Namjae Jeon <namjae.jeon@samsung.com>
Fixes: de1b794130
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4dad1ae24 upstream.
In ext4, the bottom two bits of {a,c,m}time_extra are used to extend
the {a,c,m}time fields, deferring the year 2038 problem to the year
2446.
When decoding these extended fields, for times whose bottom 32 bits
would represent a negative number, sign extension causes the 64-bit
extended timestamp to be negative as well, which is not what's
intended. This patch corrects that issue, so that the only negative
{a,c,m}times are those between 1901 and 1970 (as per 32-bit signed
timestamps).
Some older kernels might have written pre-1970 dates with 1,1 in the
extra bits. This patch treats those incorrectly-encoded dates as
pre-1970, instead of post-2311, until kernel 4.20 is released.
Hopefully by then e2fsck will have fixed up the bad data.
Also add a comment explaining the encoding of ext4's extra {a,c,m}time
bits.
Signed-off-by: David Turner <novalis@novalis.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Mark Harris <mh8928@yahoo.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b81f472a20 upstream.
Do not update the read stamp after swapping out the reader page from the
write buffer. If the reader page is swapped out of the buffer before an
event is written to it, then the read_stamp may get an out of date
timestamp, as the page timestamp is updated on the first commit to that
page.
rb_get_reader_page() only returns a page if it has an event on it, otherwise
it will return NULL. At that point, check if the page being returned has
events and has not been read yet. Then at that point update the read_stamp
to match the time stamp of the reader page.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c25a860d1 upstream.
Commit fcb26ec5b1 ("broadcom: move all PHY_ID's to header")
updated broadcom_tbl to use PHY_IDs, but incorrectly replaced 0x0143bca0
with PHY_ID_BCM5482 (making a duplicate entry, and completely omitting
the original). Fix that.
Fixes: fcb26ec5b1 ("broadcom: move all PHY_ID's to header")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c2489e07c0 upstream.
The following test program from Dmitry can cause softlockups or RCU
stalls as it copies 1GB from tmpfs into eventfd and we don't have any
scheduling point at that path in sendfile(2) implementation:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
Add cond_resched() into __splice_from_pipe() to fix the problem.
CC: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c725bfce79 upstream.
Commit 296291cdd1 (mm: make sendfile(2) killable) fixed an issue where
sendfile(2) was doing a lot of tiny writes into a filesystem and thus
was unkillable for a long time. However sendfile(2) can be (mis)used to
issue lots of writes into arbitrary file descriptor such as evenfd or
similar special file descriptors which never hit the standard filesystem
write path and thus are still unkillable. E.g. the following example
from Dmitry burns CPU for ~16s on my test system without possibility to
be killed:
int r1 = eventfd(0, 0);
int r2 = memfd_create("", 0);
unsigned long n = 1<<30;
fallocate(r2, 0, 0, n);
sendfile(r1, r2, 0, n);
There are actually quite a few tests for pending signals in sendfile
code however we data to write is always available none of them seems to
trigger. So fix the problem by adding a test for pending signal into
splice_from_pipe_next() also before the loop waiting for pipe buffers to
be available. This should fix all the lockup issues with sendfile of the
do-ton-of-tiny-writes nature.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ebf7f10d6 upstream.
The thing got broken back in 2002 - sysvfs does *not* have inline
symlinks; even short ones have bodies stored in the first block
of file. sysv_symlink() handles that correctly; unfortunately,
attempting to look an existing symlink up will end up confusing
them for inline symlinks, and interpret the block number containing
the body as the body itself.
Nobody has noticed until now, which says something about the level
of testing sysvfs gets ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- Adjust context
- Also delete unused sysv_fast_symlink_inode_operations]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7d267278a9 upstream.
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep despite none of the message presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes again available, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed despite "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.
Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f94 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f33a7f72e5 upstream.
Some modems, such as the Telit UE910, are using an Infineon Flash Loader
utility. It has two interfaces, 2/2/0 (Abstract Modem) and 10/0/0 (CDC
Data). The latter can be used as a serial interface to upgrade the
firmware of the modem. However, that isn't possible when the cdc-acm
driver takes control of the device.
The following is an explanation of the behaviour by Daniele Palmas during
discussion on linux-usb.
"This is what happens when the device is turned on (without modifying
the drivers):
[155492.352031] usb 1-3: new high-speed USB device number 27 using ehci-pci
[155492.485429] usb 1-3: config 1 interface 0 altsetting 0 endpoint 0x81 has an invalid bInterval 255, changing to 11
[155492.485436] usb 1-3: New USB device found, idVendor=058b, idProduct=0041
[155492.485439] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[155492.485952] cdc_acm 1-3:1.0: ttyACM0: USB ACM device
This is the flashing device that is caught by the cdc-acm driver. Once
the ttyACM appears, the application starts sending a magic string
(simple write on the file descriptor) to keep the device in flashing
mode. If this magic string is not properly received in a certain time
interval, the modem goes on in normal operative mode:
[155493.748094] usb 1-3: USB disconnect, device number 27
[155494.916025] usb 1-3: new high-speed USB device number 28 using ehci-pci
[155495.059978] usb 1-3: New USB device found, idVendor=1bc7, idProduct=0021
[155495.059983] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[155495.059986] usb 1-3: Product: 6 CDC-ACM + 1 CDC-ECM
[155495.059989] usb 1-3: Manufacturer: Telit
[155495.059992] usb 1-3: SerialNumber: 359658044004697
[155495.138958] cdc_acm 1-3:1.0: ttyACM0: USB ACM device
[155495.140832] cdc_acm 1-3:1.2: ttyACM1: USB ACM device
[155495.142827] cdc_acm 1-3:1.4: ttyACM2: USB ACM device
[155495.144462] cdc_acm 1-3:1.6: ttyACM3: USB ACM device
[155495.145967] cdc_acm 1-3:1.8: ttyACM4: USB ACM device
[155495.147588] cdc_acm 1-3:1.10: ttyACM5: USB ACM device
[155495.154322] cdc_ether 1-3:1.12 wwan0: register 'cdc_ether' at usb-0000:00:1a.7-3, Mobile Broadband Network Device, 00:00:11:12:13:14
Using the cdc-acm driver, the string, though being sent in the same way
than using the usb-serial-simple driver (I can confirm that the data is
passing properly since I used an hw usb sniffer), does not make the
device to stay in flashing mode."
Signed-off-by: Jonas Jonsson <jonas@ludd.ltu.se>
Tested-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7c90e610b6 upstream.
CP2110 ID (0x10c4, 0xea80) doesn't belong here because it's a HID
and completely different from CP210x devices.
Signed-off-by: Konstantin Shkolnyy <konstantin.shkolnyy@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cecd9ab80 upstream.
According to SJA1000 data sheet error-warning (EI) interrupt is not
cleared by setting the controller in to reset-mode.
Then if we have the following case:
- system is suspended (echo mem > /sys/power/state) and SJA1000 is left
in operating state
- A bus error condition occurs which activates EI interrupt, system is
still suspended which means EI interrupt will be not be handled nor
cleared.
If the above two events occur, on resume there is no way to return the
SJA1000 to operating state, except to cycle power to it.
By simply reading the IR register on start we will clear any previous
conditions that could be present.
Signed-off-by: Mirza Krak <mirza.krak@hostmobility.com>
Reported-by: Christian Magnusson <Christian.Magnusson@semcon.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: s/SJA1000_IR/REG_IR/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4c6980462f upstream.
Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.
Fixes: 8229efdaef ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 02e2a5bfeb upstream.
If md->signature == MAC_DRIVER_MAGIC and md->block_size == 1023, a single
512 byte sector would be read (secsize / 512). However the partition
structure would be located past the end of the buffer (secsize % 512).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 19cd80a214 upstream.
It is not permitted to set task state before lock. usblp_wwait sets
the state to TASK_INTERRUPTIBLE and calls mutex_lock_interruptible.
Upon return from that function, the state will be TASK_RUNNING again.
This is clearly a bug and a warning is generated with LOCKDEP too:
WARNING: CPU: 1 PID: 5109 at kernel/sched/core.c:7404 __might_sleep+0x7d/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffa0c588d0>] usblp_wwait+0xa0/0x310 [usblp]
Modules linked in: ...
CPU: 1 PID: 5109 Comm: captmon Tainted: G W 4.2.5-0.gef2823b-default #1
Hardware name: LENOVO 23252SG/23252SG, BIOS G2ET33WW (1.13 ) 07/24/2012
ffffffff81a4edce ffff880236ec7ba8 ffffffff81716651 0000000000000000
ffff880236ec7bf8 ffff880236ec7be8 ffffffff8106e146 0000000000000282
ffffffff81a50119 000000000000028b 0000000000000000 ffff8802dab7c508
Call Trace:
...
[<ffffffff8106e1c6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff8109a8bd>] __might_sleep+0x7d/0x90
[<ffffffff8171b20f>] mutex_lock_interruptible_nested+0x2f/0x4b0
[<ffffffffa0c588fc>] usblp_wwait+0xcc/0x310 [usblp]
[<ffffffffa0c58bb2>] usblp_write+0x72/0x350 [usblp]
[<ffffffff8121ed98>] __vfs_write+0x28/0xf0
...
Commit 7f477358e2 (usblp: Implement the
ENOSPC convention) moved the set prior locking. So move it back after
the lock.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: 7f477358e2 ("usblp: Implement the ENOSPC convention")
Acked-By: Pete Zaitcev <zaitcev@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 638148e20c upstream.
Thomas reports
"
4gsystems sells two total different LTE-surfsticks under the same name.
..
The newer version of XS Stick W100 is from "omega"
..
Under windows the driver switches to the same ID, and uses MI03\6 for
network and MI01\6 for modem.
..
echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
T: Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#= 4 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1c9e ProdID=9b01 Rev=02.32
S: Manufacturer=USB Modem
S: Product=USB Modem
S: SerialNumber=
C: #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I: If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
Now all important things are there:
wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at)
There is also ttyUSB0, but it is not usable, at least not for at.
The device works well with qmi and ModemManager-NetworkManager.
"
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a596439619 upstream.
Existing Intel xHCI controllers require a delay of 1 mS,
after setting the CMD_RESET bit in command register, before
accessing any HC registers. This allows the HC to complete
the reset operation and be ready for HC register access.
Without this delay, the subsequent HC register access,
may result in a system hang, very rarely.
Verified CherryView / Braswell platforms go through over
5000 warm reboot cycles (which was not possible without
this patch), without any xHCI reset hang.
Signed-off-by: Rajmohan Mani <rajmohan.mani@intel.com>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e639b8d8a7 upstream.
Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.
Fixes: 8a4eb5734e ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c2e703a552 upstream.
When using call_rcu(), the called function may be delayed quite
significantly, and without a matching rcu_barrier() there's no
way to be sure it has finished.
Therefore, global state that could be gone/freed/reused should
never be touched in the callback.
Fix this in mesh by moving the atomic_dec() into the caller;
that's not really a problem since we already unlinked the path
and it will be destroyed anyway.
This fixes a crash Jouni observed when running certain tests in
a certain order, in which the mesh interface was torn down, the
memory reused for a function pointer (work struct) and running
that then crashed since the pointer had been decremented by 1,
resulting in an invalid instruction byte stream.
Fixes: eb2b9311fd ("mac80211: mesh path table implementation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cf89752645 upstream.
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
this function
If the jump to label "error" is taken, "ret" will indeed be
uninitialized, and random stack data may be printed by the debug code.
Fixes: 102f4d900c ("FS-Cache: Handle a write to the page immediately beyond the EOF marker")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00ee592717 upstream.
If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.
Fixes: 6cb6a27c45 ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9f96bc53c upstream.
From datasheet:
R17408 (4400h) HPF_C_1
R17409 (4401h) HPF_C_0
17048 -> 17408 (0x4400)
17049 -> 17409 (0x4401)
Signed-off-by: Sachin Pandhare <sachinpandhare@gmail.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: wm8962_reg is a sparse array, not an array of
{ offset, value } pairs]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 705e63d2b2 upstream.
There is a bit of a mess in the order of arguments to the ulpi write
callback. There is
int ulpi_write(struct ulpi *ulpi, u8 addr, u8 val)
in drivers/usb/common/ulpi.c;
struct usb_phy_io_ops {
...
int (*write)(struct usb_phy *x, u32 val, u32 reg);
}
in include/linux/usb/phy.h.
The callback registered by the musb driver has to comply to the latter,
but up to now had "offset" first which effectively made the function
broken for correct users. So flip the order and while at it also
switch to the parameter names of struct usb_phy_io_ops's write.
Fixes: ffb865b1e4 ("usb: musb: add ulpi access operations")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1bcb49e663 upstream.
The Honeywell HGI80 is a wireless interface to the evohome connected
thermostat. It uses a TI 3410 USB-serial port.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context; update array sizes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 35a2fbc941 ("USB: serial: ti_usb_3410_5052: new device id for
Abbot strip port cable") failed to update the size of the
ti_id_table_3410 array. This doesn't need to be fixed upstream
following commit d7ece6515e ("USB: ti_usb_3410_5052: remove
vendor/product module parameters") but should be fixed in stable
branches older than 3.12.
Backports of commit c9d09dc7ad ("USB: serial: ti_usb_3410_5052: add
Abbott strip port ID to combined table as well.") similarly failed to
update the size of the ti_id_table_combined array.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9d09dc7ad upstream.
Without this change, the USB cable for Freestyle Option and compatible
glucometers will not be detected by the driver.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e07af133c3 upstream.
Also known as Verizon U620L.
The device is modeswitched from 1410:9020 to 1410:9022 by selecting the
4th USB configuration:
$ sudo usb_modeswitch –v 0x1410 –p 0x9020 –u 4
This configuration provides a ECM interface as well as TTYs ('Enterprise
Mode' according to the U620 Linux integration guide).
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a91e627e3f upstream.
One of the many faults of the QinHeng CH345 USB MIDI interface chip is
that it does not handle received SysEx messages correctly -- every second
event packet has a wrong code index number, which is the one from the last
seen message, instead of 4. For example, the two messages "FE F0 01 02 03
04 05 06 07 08 09 0A 0B 0C 0D 0E F7" result in the following event
packets:
correct: CH345:
0F FE 00 00 0F FE 00 00
04 F0 01 02 04 F0 01 02
04 03 04 05 0F 03 04 05
04 06 07 08 04 06 07 08
04 09 0A 0B 0F 09 0A 0B
04 0C 0D 0E 04 0C 0D 0E
05 F7 00 00 05 F7 00 00
A class-compliant driver must interpret an event packet with CIN 15 as
having a single data byte, so the other two bytes would be ignored. The
message received by the host would then be missing two bytes out of six;
in this example, "F0 01 02 03 06 07 08 09 0C 0D 0E F7".
These corrupted SysEx event packages contain only data bytes, while the
CH345 uses event packets with a correct CIN value only for messages with
a status byte, so it is possible to distinguish between these two cases by
checking for the presence of this status byte.
(Other bugs in the CH345's input handling, such as the corruption resulting
from running status, cannot be worked around.)
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1ca8b20130 upstream.
The CH345 USB MIDI chip has two output ports. However, they are
multiplexed through one pin, and the number of ports cannot be reduced
even for hardware that implements only one connector, so for those
devices, data sent to either port ends up on the same hardware output.
This becomes a problem when both ports are used at the same time, as
longer MIDI commands (such as SysEx messages) are likely to be
interrupted by messages from the other port, and thus to get lost.
It would not be possible for the driver to detect how many ports the
device actually has, except that in practice, _all_ devices built with
the CH345 have only one port. So we can just ignore the device's
descriptors, and hardcode one output port.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed5a377d87 upstream.
now sctp auth cannot work well when setting a hmacid manually, which
is caused by that we didn't use the network order for hmacid, so fix
it by adding the transformation in sctp_auth_ep_set_hmacs.
even we set hmacid with the network order in userspace, it still
can't work, because of this condition in sctp_auth_ep_set_hmacs():
if (id > SCTP_AUTH_HMAC_ID_MAX)
return -EOPNOTSUPP;
so this wasn't working before and thus it won't break compatibility.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3ca8138f01 upstream.
I got a report about unkillable task eating CPU. Further
investigation shows, that the problem is in the fuse_fill_write_pages()
function. If iov's first segment has zero length, we get an infinite
loop, because we never reach iov_iter_advance() call.
Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.
A similar problem is described in 124d3b7041 ("fix writev regression:
pan hanging unkillable and un-straceable"). If zero-length segmend
is followed by segment with invalid address,
iov_iter_fault_in_readable() checks only first segment (zero-length),
iov_iter_copy_from_user_atomic() skips it, fails at second and
returns zero -> goto again without skipping zero-length segment.
Patch calls iov_iter_advance() before goto again: we'll skip zero-length
segment at second iteraction and iov_iter_fault_in_readable() will detect
invalid address.
Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Maxim Patlasov <mpatlasov@parallels.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ff28d9f46 upstream.
Using sendfile with below small program to get MD5 sums of some files,
it appear that big files (over 64kbytes with 4k pages system) get a
wrong MD5 sum while small files get the correct sum.
This program uses sendfile() to send a file to an AF_ALG socket
for hashing.
/* md5sum2.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <linux/if_alg.h>
int main(int argc, char **argv)
{
int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct stat st;
struct sockaddr_alg sa = {
.salg_family = AF_ALG,
.salg_type = "hash",
.salg_name = "md5",
};
int n;
bind(sk, (struct sockaddr*)&sa, sizeof(sa));
for (n = 1; n < argc; n++) {
int size;
int offset = 0;
char buf[4096];
int fd;
int sko;
int i;
fd = open(argv[n], O_RDONLY);
sko = accept(sk, NULL, 0);
fstat(fd, &st);
size = st.st_size;
sendfile(sko, fd, &offset, size);
size = read(sko, buf, sizeof(buf));
for (i = 0; i < size; i++)
printf("%2.2x", buf[i]);
printf(" %s\n", argv[n]);
close(fd);
close(sko);
}
exit(0);
}
Test below is done using official linux patch files. First result is
with a software based md5sum. Second result is with the program above.
root@vgoip:~# ls -l patch-3.6.*
-rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz
-rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz
root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz
root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz
After investivation, it appears that sendfile() sends the files by blocks
of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each
block, the SPLICE_F_MORE flag is missing, therefore the hashing operation
is reset as if it was the end of the file.
This patch adds SPLICE_F_MORE to the flags when more data is pending.
With the patch applied, we get the correct sums:
root@vgoip:~# md5sum patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz
root@vgoip:~# ./md5sum2 patch-3.6.*
b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz
c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8fa677d270 ]
Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
with eventual NULL pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8ce675ff39 ]
Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.
Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fbb1816942 upstream.
It was possible for an attacking user to trick root (or another user) into
writing his coredumps into an attacker-readable, pre-existing file using
rename() or link(), causing the disclosure of secret data from the victim
process' virtual memory. Depending on the configuration, it was also
possible to trick root into overwriting system files with coredumps. Fix
that issue by never writing coredumps into existing files.
Requirements for the attack:
- The attack only applies if the victim's process has a nonzero
RLIMIT_CORE and is dumpable.
- The attacker can trick the victim into coredumping into an
attacker-writable directory D, either because the core_pattern is
relative and the victim's cwd is attacker-writable or because an
absolute core_pattern pointing to a world-writable directory is used.
- The attacker has one of these:
A: on a system with protected_hardlinks=0:
execute access to a folder containing a victim-owned,
attacker-readable file on the same partition as D, and the
victim-owned file will be deleted before the main part of the attack
takes place. (In practice, there are lots of files that fulfill
this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
This does not apply to most Linux systems because most distros set
protected_hardlinks=1.
B: on a system with protected_hardlinks=1:
execute access to a folder containing a victim-owned,
attacker-readable and attacker-writable file on the same partition
as D, and the victim-owned file will be deleted before the main part
of the attack takes place.
(This seems to be uncommon.)
C: on any system, independent of protected_hardlinks:
write access to a non-sticky folder containing a victim-owned,
attacker-readable file on the same partition as D
(This seems to be uncommon.)
The basic idea is that the attacker moves the victim-owned file to where
he expects the victim process to dump its core. The victim process dumps
its core into the existing file, and the attacker reads the coredump from
it.
If the attacker can't move the file because he does not have write access
to the containing directory, he can instead link the file to a directory
he controls, then wait for the original link to the file to be deleted
(because the kernel checks that the link count of the corefile is 1).
A less reliable variant that requires D to be non-sticky works with link()
and does not require deletion of the original link: link() the file into
D, but then unlink() it directly before the kernel performs the link count
check.
On systems with protected_hardlinks=0, this variant allows an attacker to
not only gain information from coredumps, but also clobber existing,
victim-writable files with coredumps. (This could theoretically lead to a
privilege escalation.)
Signed-off-by: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9520628e8c upstream.
When the suid_dumpable sysctl is set to "2", and there is no core dump
pipe defined in the core_pattern sysctl, a local user can cause core files
to be written to root-writable directories, potentially with
user-controlled content.
This means an admin can unknowningly reintroduce a variation of
CVE-2006-2451, allowing local users to gain root privileges.
$ cat /proc/sys/fs/suid_dumpable
2
$ cat /proc/sys/kernel/core_pattern
core
$ ulimit -c unlimited
$ cd /
$ ls -l core
ls: cannot access core: No such file or directory
$ touch core
touch: cannot touch `core': Permission denied
$ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
$ pid=$!
$ sleep 1
$ kill -SEGV $pid
$ ls -l core
-rw------- 1 root kees 458752 Jun 21 11:35 core
$ sudo strings core | grep evil
OHAI=evil-string-here
While cron has been fixed to abort reading a file when there is any
parse error, there are still other sensitive directories that will read
any file present and skip unparsable lines.
Instead of introducing a suid_dumpable=3 mode and breaking all users of
mode 2, this only disables the unsafe portion of mode 2 (writing to disk
via relative path). Most users of mode 2 (e.g. Chrome OS) already use
a core dump pipe handler, so this change will not break them. For the
situations where a pipe handler is not defined but mode 2 is still
active, crash dumps will only be written to fully qualified paths. If a
relative path is defined (e.g. the default "core" pattern), dump
attempts will trigger a printk yelling about the lack of a fully
qualified path.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: James Morris <james.l.morris@oracle.com>
commit b582ef5c53 upstream.
Do not clobber the buffer space passed from `search_binary_handler' and
originally preloaded by `prepare_binprm' with the executable's file
header by overwriting it with its interpreter's file header. Instead
keep the buffer space intact and directly use the data structure locally
allocated for the interpreter's file header, fixing a bug introduced in
2.1.14 with loadable module support (linux-mips.org commit beb11695
[Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
Adjust the amount of data read from the interpreter's file accordingly.
This was not an issue before loadable module support, because back then
`load_elf_binary' was executed only once for a given ELF executable,
whether the function succeeded or failed.
With loadable module support supported and enabled, upon a failure of
`load_elf_binary' -- which may for example be caused by architecture
code rejecting an executable due to a missing hardware feature requested
in the file header -- a module load is attempted and then the function
reexecuted by `search_binary_handler'. With the executable's file
header replaced with its interpreter's file header the executable can
then be erroneously accepted in this subsequent attempt.
Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 102f4d900c upstream.
Handle a write being requested to the page immediately beyond the EOF
marker on a cache object. Currently this gets an assertion failure in
CacheFiles because the EOF marker is used there to encode information about
a partial page at the EOF - which could lead to an unknown blank spot in
the file if we extend the file over it.
The problem is actually in fscache where we check the index of the page
being written against store_limit. store_limit is set to the number of
pages that we're allowed to store by fscache_set_store_limit() - which
means it's one more than the index of the last page we're allowed to store.
The problem is that we permit writing to a page with an index _equal_ to
the store limit - when we should reject that case.
Whilst we're at it, change the triggered assertion in CacheFiles to just
return -ENOBUFS instead.
The assertion failure looks something like this:
CacheFiles: Assertion failed
1000 < 7b1 is false
------------[ cut here ]------------
kernel BUG at fs/cachefiles/rdwr.c:962!
...
RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: we don't have __kernel_write() so keep using the
open-coded equivalent]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 86108c2e34 upstream.
If netfs exist, fscache should not increase the reference of parent's
usage and n_children, otherwise, never be decreased.
v2: thanks David's suggest,
move increasing reference of parent if success
use kmem_cache_free() freeing primary_index directly
v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cbdb967af3 upstream.
This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).
VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2, with thanks to Paolo:
- update_db_bp_intercept() was called update_db_intercept()
- The remaining call is in svm_guest_debug() rather than through svm_x86_ops]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d69bbf88c8 upstream.
Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.
Otherwise an other cpu might already have freed the dst.
Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f1cd1f0b7d upstream.
When listing a inode's xattrs we have a time window where we race against
a concurrent operation for adding a new hard link for our inode that makes
us not return any xattr to user space. In order for this to happen, the
first xattr of our inode needs to be at slot 0 of a leaf and the previous
leaf must still have room for an inode ref (or extref) item, and this can
happen because an inode's listxattrs callback does not lock the inode's
i_mutex (nor does the VFS does it for us), but adding a hard link to an
inode makes the VFS lock the inode's i_mutex before calling the inode's
link callback.
If we have the following leafs:
Leaf X (has N items) Leaf Y
[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 XATTR_ITEM 12345), ... ]
slot N - 2 slot N - 1 slot 0
The race illustrated by the following sequence diagram is possible:
CPU 1 CPU 2
btrfs_listxattr()
searches for key (257 XATTR_ITEM 0)
gets path with path->nodes[0] == leaf X
and path->slots[0] == N
because path->slots[0] is >=
btrfs_header_nritems(leaf X), it calls
btrfs_next_leaf()
btrfs_next_leaf()
releases the path
adds key (257 INODE_REF 666)
to the end of leaf X (slot N),
and leaf X now has N + 1 items
searches for the key (257 INODE_REF 256),
with path->keep_locks == 1, because that
is the last key it saw in leaf X before
releasing the path
ends up at leaf X again and it verifies
that the key (257 INODE_REF 256) is no
longer the last key in leaf X, so it
returns with path->nodes[0] == leaf X
and path->slots[0] == N, pointing to
the new item with key (257 INODE_REF 666)
btrfs_listxattr's loop iteration sees that
the type of the key pointed by the path is
different from the type BTRFS_XATTR_ITEM_KEY
and so it breaks the loop and stops looking
for more xattr items
--> the application doesn't get any xattr
listed for our inode
So fix this by breaking the loop only if the key's type is greater than
BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[bwh: Backported to 3.2: old code used the trivial accessor btrfs_key_type()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 863e02d0e1 upstream.
Writing a number to /sys/bus/scsi/devices/<sdev>/queue_ramp_up_period
returns the value of that number instead of the number of bytes written.
This behavior can confuse programs expecting POSIX write() semantics.
Fix this by returning the number of bytes written instead.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d512cb77b upstream.
If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.
This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).
The following sequence diagram shows how the race happens when running a
no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
neighbour leafs:
Leaf X (has N items) Leaf Y
[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ]
slot N - 2 slot N - 1 slot 0
(Note the implicit hole for inode 257 regarding the [0, 8K[ range)
CPU 1 CPU 2
run_dealloc_nocow()
btrfs_lookup_file_extent()
--> searches for a key with value
(257 EXTENT_DATA 4096) in the
fs/subvol tree
--> returns us a path with
path->nodes[0] == leaf X and
path->slots[0] == N
because path->slots[0] is >=
btrfs_header_nritems(leaf X), it
calls btrfs_next_leaf()
btrfs_next_leaf()
--> releases the path
hard link added to our inode,
with key (257 INODE_REF 500)
added to the end of leaf X,
so leaf X now has N + 1 keys
--> searches for the key
(257 INODE_REF 256), because
it was the last key in leaf X
before it released the path,
with path->keep_locks set to 1
--> ends up at leaf X again and
it verifies that the key
(257 INODE_REF 256) is no longer
the last key in the leaf, so it
returns with path->nodes[0] ==
leaf X and path->slots[0] == N,
pointing to the new item with
key (257 INODE_REF 500)
the loop iteration of run_dealloc_nocow()
does not break out the loop and continues
because the key referenced in the path
at path->nodes[0] and path->slots[0] is
for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
and its offset (500) is less then our delalloc
range's end (8192)
the item pointed by the path, an inode reference item,
is (incorrectly) interpreted as a file extent item and
we get an invalid extent type, leading to the BUG_ON(1):
if (extent_type == BTRFS_FILE_EXTENT_REG ||
extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
(...)
} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
(...)
} else {
BUG_ON(1)
}
The same can happen if a xattr is added concurrently and ends up having
a key with an offset smaller then the delalloc's range end.
So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aeafbf8486 upstream.
While running a stress test I got the following warning triggered:
[191627.672810] ------------[ cut here ]------------
[191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
(...)
[191627.701485] Call Trace:
[191627.702037] [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[191627.702992] [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[191627.704091] [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[191627.705380] [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.706637] [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
[191627.707789] [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
[191627.709155] [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
[191627.712444] [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
[191627.714162] [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
[191627.715887] [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
[191627.717287] [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
[191627.728865] [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
[191627.730045] [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
[191627.731256] [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[191627.732661] [<ffffffff81061119>] process_one_work+0x24c/0x4ae
[191627.733822] [<ffffffff810615b0>] worker_thread+0x206/0x2c2
[191627.734857] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.736052] [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
[191627.737349] [<ffffffff810669a6>] kthread+0xef/0xf7
[191627.738267] [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
[191627.739330] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.741976] [<ffffffff81465592>] ret_from_fork+0x42/0x70
[191627.743080] [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
[191627.744206] ---[ end trace bbfddacb7aaada8d ]---
$ cat -n fs/btrfs/file.c
691 int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
(...)
758 btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
759 if (key.objectid > ino ||
760 key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
761 break;
762
763 fi = btrfs_item_ptr(leaf, path->slots[0],
764 struct btrfs_file_extent_item);
765 extent_type = btrfs_file_extent_type(leaf, fi);
766
767 if (extent_type == BTRFS_FILE_EXTENT_REG ||
768 extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
(...)
774 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
(...)
778 } else {
779 WARN_ON(1);
780 extent_end = search_start;
781 }
(...)
This happened because the item we were processing did not match a file
extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
case we cast the item to a struct btrfs_file_extent_item pointer and
then find a type field value that does not match any of the expected
values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
due to a tiny time window where a race can happen as exemplified below.
For example, consider the following scenario where we're using the
NO_HOLES feature and we have the following two neighbour leafs:
Leaf X (has N items) Leaf Y
[ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ] [ (257 EXTENT_DATA 8192), ... ]
slot N - 2 slot N - 1 slot 0
Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
than explicit because NO_HOLES is enabled). Now if our inode has an
ordered extent for the range [4K, 8K[ that is finishing, the following
can happen:
CPU 1 CPU 2
btrfs_finish_ordered_io()
insert_reserved_file_extent()
__btrfs_drop_extents()
Searches for the key
(257 EXTENT_DATA 4096) through
btrfs_lookup_file_extent()
Key not found and we get a path where
path->nodes[0] == leaf X and
path->slots[0] == N
Because path->slots[0] is >=
btrfs_header_nritems(leaf X), we call
btrfs_next_leaf()
btrfs_next_leaf() releases the path
inserts key
(257 INODE_REF 4096)
at the end of leaf X,
leaf X now has N + 1 keys,
and the new key is at
slot N
btrfs_next_leaf() searches for
key (257 INODE_REF 256), with
path->keep_locks set to 1,
because it was the last key it
saw in leaf X
finds it in leaf X again and
notices it's no longer the last
key of the leaf, so it returns 0
with path->nodes[0] == leaf X and
path->slots[0] == N (which is now
< btrfs_header_nritems(leaf X)),
pointing to the new key
(257 INODE_REF 4096)
__btrfs_drop_extents() casts the
item at path->nodes[0], slot
path->slots[0], to a struct
btrfs_file_extent_item - it does
not skip keys for the target
inode with a type less than
BTRFS_EXTENT_DATA_KEY
(BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
sees a bogus value for the type
field triggering the WARN_ON in
the trace shown above, and sets
extent_end = search_start (4096)
does the if-then-else logic to
fixup 0 length extent items created
by a past bug from hole punching:
if (extent_end == key.offset &&
extent_end >= search_start)
goto delete_extent_item;
that evaluates to true and it ends
up deleting the key pointed to by
path->slots[0], (257 INODE_REF 4096),
from leaf X
The same could happen for example for a xattr that ends up having a key
with an offset value that matches search_start (very unlikely but not
impossible).
So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
skipped, never casted to struct btrfs_file_extent_item and never deleted
by accident. Also protect against the unexpected case of getting a key
for a lower inode number by skipping that key and issuing a warning.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[bwh: Backported to 3.2: drop use of ASSERT()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 04633df0c4 upstream.
When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.
For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.
A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:
# rdmsr -a 0x1a0
400850089
850089
850089
850089
I know, right?!
There's not even an off switch in there.
So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.
Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4099819356 upstream.
When dropping a lock while iterating a list we must restart the search
as other threads could have manipulated the list under us. Without this
we can get stuck in an endless loop. This bug was introduced by
commit bc3f02a795
Author: Dan Williams <djbw@fb.com>
Date: Tue Aug 28 22:12:10 2012 -0700
[SCSI] scsi_remove_target: fix softlockup regression on hot remove
Which was itself trying to fix a reported soft lockup issue
http://thread.gmane.org/gmane.linux.kernel/1348679
However, we believe even with this revert of the original patch, the soft
lockup problem has been fixed by
commit f2495e228f
Author: James Bottomley <JBottomley@Parallels.com>
Date: Tue Jan 21 07:01:41 2014 -0800
[SCSI] dual scan thread bug fix
Thanks go to Dan Williams <dan.j.williams@intel.com> for tracking all this
prior history down.
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: bc3f02a795
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 100ceb66d5 upstream.
Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
controllers: Often or even most of the time, the controller is
initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
0 IT contexts, quirks 0x10". With 0 isochronous transmit DMA contexts
(IT contexts), applications like audio output are impossible.
However, OHCI-1394 demands that at least 4 IT contexts are implemented
by the link layer controller, and indeed JMicron JMB38x do implement
four of them. Only their IsoXmitIntMask register is unreliable at early
access.
With my own JMB381 single function controller I found:
- I can reproduce the problem with a lower probability than Craig's.
- If I put a loop around the section which clears and reads
IsoXmitIntMask, then either the first or the second attempt will
return the correct initial mask of 0x0000000f. I never encountered
a case of needing more than a second attempt.
- Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
before the first write, the subsequent read will return the correct
result.
- If I merely ignore a wrong read result and force the known real
result, later isochronous transmit DMA usage works just fine.
So let's just fix this chip bug up by the latter method. Tested with
JMB381 on kernel 3.13 and 4.3.
Since OHCI-1394 generally requires 4 IT contexts at a minium, this
workaround is simply applied whenever the initial read of IsoXmitIntMask
returns 0, regardless whether it's a JMicron chip or not. I never heard
of this issue together with any other chip though.
I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
and JMB388 combo controllers exactly the same as on the JMB381 single-
function controller, but so far I haven't had a chance to let an owner
of a combo chip run a patched kernel.
Strangely enough, IsoRecvIntMask is always reported correctly, even
though it is probed right before IsoXmitIntMask.
Reported-by: Clifford Dunn
Reported-by: Craig Moore <craig.moore@qenos.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
[bwh: Backported to 3.2: log with fw_notify() instead of ohci_notice()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c932b98c1e upstream.
HP ProBook 6550b needs the same pin fixup applied to other HP B-series
laptops with docks for making its headphone and dock headphone jacks
working properly. We just need to add the codec SSID to the list.
Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebac62fe3d upstream.
Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.
Fixes: 73d605d1ab ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4afa5f9617 upstream.
The hash_accept call fails to work on sockets that have not received
any data. For some algorithm implementations it may cause crashes.
This patch fixes this by ensuring that we only export and import on
sockets that have received data.
Reported-by: Harsh Jain <harshjain.prof@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3c63795e9 upstream.
Commit 073db4a51e ("mtd: fix: avoid race condition when accessing
mtd->usecount") fixed a race condition but due to poor ordering of the
mutex acquisition, introduced a potential deadlock.
The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
will delete one or more MTDs, along with any corresponding mtdblock
devices. This could potentially race with an acquisition of the block
device as follows.
-> blktrans_open()
-> mutex_lock(&dev->lock);
-> mutex_lock(&mtd_table_mutex);
-> del_mtd_device()
-> mutex_lock(&mtd_table_mutex);
-> blktrans_notify_remove() -> del_mtd_blktrans_dev()
-> mutex_lock(&dev->lock);
This is a classic (potential) ABBA deadlock, which can be fixed by
making the A->B ordering consistent everywhere. There was no real
purpose to the ordering in the original patch, AFAIR, so this shouldn't
be a problem. This ordering was actually already present in
del_mtd_blktrans_dev(), for one, where the function tried to ensure that
its caller already held mtd_table_mutex before it acquired &dev->lock:
if (mutex_trylock(&mtd_table_mutex)) {
mutex_unlock(&mtd_table_mutex);
BUG();
}
So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
we always acquire mtd_table_mutex first.
Snippets of the lockdep output follow:
# modprobe -r m25p80
[ 53.419251]
[ 53.420838] ======================================================
[ 53.427300] [ INFO: possible circular locking dependency detected ]
[ 53.433865] 4.3.0-rc6 #96 Not tainted
[ 53.437686] -------------------------------------------------------
[ 53.444220] modprobe/372 is trying to acquire lock:
[ 53.449320] (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
[ 53.457271]
[ 53.457271] but task is already holding lock:
[ 53.463372] (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
[ 53.471321]
[ 53.471321] which lock already depends on the new lock.
[ 53.471321]
[ 53.479856]
[ 53.479856] the existing dependency chain (in reverse order) is:
[ 53.487660]
-> #1 (mtd_table_mutex){+.+.+.}:
[ 53.492331] [<c043fc5c>] blktrans_open+0x34/0x1a4
[ 53.497879] [<c01afce0>] __blkdev_get+0xc4/0x3b0
[ 53.503364] [<c01b0bb8>] blkdev_get+0x108/0x320
[ 53.508743] [<c01713c0>] do_dentry_open+0x218/0x314
[ 53.514496] [<c0180454>] path_openat+0x4c0/0xf9c
[ 53.519959] [<c0182044>] do_filp_open+0x5c/0xc0
[ 53.525336] [<c0172758>] do_sys_open+0xfc/0x1cc
[ 53.530716] [<c000f740>] ret_fast_syscall+0x0/0x1c
[ 53.536375]
-> #0 (&new->lock){+.+...}:
[ 53.540587] [<c063f124>] mutex_lock_nested+0x38/0x3cc
[ 53.546504] [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
[ 53.552606] [<c043f164>] blktrans_notify_remove+0x7c/0x84
[ 53.558891] [<c04399f0>] del_mtd_device+0x74/0x100
[ 53.564544] [<c043c670>] del_mtd_partitions+0x80/0xc8
[ 53.570451] [<c0439aa0>] mtd_device_unregister+0x24/0x48
[ 53.576637] [<c046ce6c>] spi_drv_remove+0x1c/0x34
[ 53.582207] [<c03de0f0>] __device_release_driver+0x88/0x114
[ 53.588663] [<c03de19c>] device_release_driver+0x20/0x2c
[ 53.594843] [<c03dd9e8>] bus_remove_device+0xd8/0x108
[ 53.600748] [<c03dacc0>] device_del+0x10c/0x210
[ 53.606127] [<c03dadd0>] device_unregister+0xc/0x20
[ 53.611849] [<c046d878>] __unregister+0x10/0x20
[ 53.617211] [<c03da868>] device_for_each_child+0x50/0x7c
[ 53.623387] [<c046eae8>] spi_unregister_master+0x58/0x8c
[ 53.629578] [<c03e12f0>] release_nodes+0x15c/0x1c8
[ 53.635223] [<c03de0f8>] __device_release_driver+0x90/0x114
[ 53.641689] [<c03de900>] driver_detach+0xb4/0xb8
[ 53.647147] [<c03ddc78>] bus_remove_driver+0x4c/0xa0
[ 53.652970] [<c00cab50>] SyS_delete_module+0x11c/0x1e4
[ 53.658976] [<c000f740>] ret_fast_syscall+0x0/0x1c
[ 53.664621]
[ 53.664621] other info that might help us debug this:
[ 53.664621]
[ 53.672979] Possible unsafe locking scenario:
[ 53.672979]
[ 53.679169] CPU0 CPU1
[ 53.683900] ---- ----
[ 53.688633] lock(mtd_table_mutex);
[ 53.692383] lock(&new->lock);
[ 53.698306] lock(mtd_table_mutex);
[ 53.704658] lock(&new->lock);
[ 53.707946]
[ 53.707946] *** DEADLOCK ***
Fixes: 073db4a51e ("mtd: fix: avoid race condition when accessing mtd->usecount")
Reported-by: Felipe Balbi <balbi@ti.com>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 562b103a21 upstream.
The sizeof() is invoked on an incorrect variable, likely due to some
copy-paste error, and this might result in memory corruption. Fix this.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2:
- Keep using the old NLA_PUT macro
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cadd16ea33 upstream.
We've had many reports that some Creative sound cards with CA0132
don't work well. Some reported that it starts working after reloading
the module, while some reported it starts working when a 32bit kernel
is used. All these facts seem implying that the chip fails to
communicate when the buffer is located in 64bit address.
This patch addresses these issues by just adding AZX_DCAPS_NO_64BIT
flag to the corresponding PCI entries. I casually had a chance to
test an SB Recon3D board, and indeed this seems helping.
Although this hasn't been tested on all Creative devices, it's safer
to assume that this restriction applies to the rest of them, too. So
the flag is applied to all Creative entries.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: drop the change to AZX_DCAPS_PRESET_CTHDA]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49e4b84333 upstream.
Currently when the system is trying to uninstall the ACPI interrupt
handler, it uses acpi_gbl_FADT.sci_interrupt as the IRQ number.
However, the IRQ number that the ACPI interrupt handled is installed
for comes from acpi_gsi_to_irq() and that is the number that should
be used for the handler removal.
Fix this problem by using the mapped IRQ returned from acpi_gsi_to_irq()
as appropriate.
Acked-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4327ba52af upstream.
If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option. But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.
Task A Task B
ext4_handle_error()
-> jbd2_journal_abort()
-> __journal_abort_soft()
-> __jbd2_journal_abort_hard()
| -> journal->j_flags |= JBD2_ABORT;
|
| __ext4_abort()
| -> jbd2_journal_abort()
| | -> __journal_abort_soft()
| | -> if (journal->j_flags & JBD2_ABORT)
| | return;
| -> panic()
|
-> jbd2_journal_update_sb_errno()
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0305cd5f7f upstream.
When truncating a file to a smaller size which consists of an inline
extent that is compressed, we did not discard (or made unusable) the
data between the new file size and the old file size, wasting metadata
space and allowing for the truncated data to be leaked and the data
corruption/loss mentioned below.
We were also not correctly decrementing the number of bytes used by the
inode, we were setting it to zero, giving a wrong report for callers of
the stat(2) syscall. The fsck tool also reported an error about a mismatch
between the nbytes of the file versus the real space used by the file.
Now because we weren't discarding the truncated region of the file, it
was possible for a caller of the clone ioctl to actually read the data
that was truncated, allowing for a security breach without requiring root
access to the system, using only standard filesystem operations. The
scenario is the following:
1) User A creates a file which consists of an inline and compressed
extent with a size of 2000 bytes - the file is not accessible to
any other users (no read, write or execution permission for anyone
else);
2) The user truncates the file to a size of 1000 bytes;
3) User A makes the file world readable;
4) User B creates a file consisting of an inline extent of 2000 bytes;
5) User B issues a clone operation from user A's file into its own
file (using a length argument of 0, clone the whole range);
6) User B now gets to see the 1000 bytes that user A truncated from
its file before it made its file world readbale. User B also lost
the bytes in the range [1000, 2000[ bytes from its own file, but
that might be ok if his/her intention was reading stale data from
user A that was never supposed to be public.
Note that this contrasts with the case where we truncate a file from 2000
bytes to 1000 bytes and then truncate it back from 1000 to 2000 bytes. In
this case reading any byte from the range [1000, 2000[ will return a value
of 0x00, instead of the original data.
This problem exists since the clone ioctl was added and happens both with
and without my recent data loss and file corruption fixes for the clone
ioctl (patch "Btrfs: fix file corruption and data loss after cloning
inline extents").
So fix this by truncating the compressed inline extents as we do for the
non-compressed case, which involves decompressing, if the data isn't already
in the page cache, compressing the truncated version of the extent, writing
the compressed content into the inline extent and then truncate it.
The following test case for fstests reproduces the problem. In order for
the test to pass both this fix and my previous fix for the clone ioctl
that forbids cloning a smaller inline extent into a larger one,
which is titled "Btrfs: fix file corruption and data loss after cloning
inline extents", are needed. Without that other fix the test fails in a
different way that does not leak the truncated data, instead part of
destination file gets replaced with zeroes (because the destination file
has a larger inline extent than the source).
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
rm -f $seqres.full
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount "-o compress"
# Create our test files. File foo is going to be the source of a clone operation
# and consists of a single inline extent with an uncompressed size of 512 bytes,
# while file bar consists of a single inline extent with an uncompressed size of
# 256 bytes. For our test's purpose, it's important that file bar has an inline
# extent with a size smaller than foo's inline extent.
$XFS_IO_PROG -f -c "pwrite -S 0xa1 0 128" \
-c "pwrite -S 0x2a 128 384" \
$SCRATCH_MNT/foo | _filter_xfs_io
$XFS_IO_PROG -f -c "pwrite -S 0xbb 0 256" $SCRATCH_MNT/bar | _filter_xfs_io
# Now durably persist all metadata and data. We do this to make sure that we get
# on disk an inline extent with a size of 512 bytes for file foo.
sync
# Now truncate our file foo to a smaller size. Because it consists of a
# compressed and inline extent, btrfs did not shrink the inline extent to the
# new size (if the extent was not compressed, btrfs would shrink it to 128
# bytes), it only updates the inode's i_size to 128 bytes.
$XFS_IO_PROG -c "truncate 128" $SCRATCH_MNT/foo
# Now clone foo's inline extent into bar.
# This clone operation should fail with errno EOPNOTSUPP because the source
# file consists only of an inline extent and the file's size is smaller than
# the inline extent of the destination (128 bytes < 256 bytes). However the
# clone ioctl was not prepared to deal with a file that has a size smaller
# than the size of its inline extent (something that happens only for compressed
# inline extents), resulting in copying the full inline extent from the source
# file into the destination file.
#
# Note that btrfs' clone operation for inline extents consists of removing the
# inline extent from the destination inode and copy the inline extent from the
# source inode into the destination inode, meaning that if the destination
# inode's inline extent is larger (N bytes) than the source inode's inline
# extent (M bytes), some bytes (N - M bytes) will be lost from the destination
# file. Btrfs could copy the source inline extent's data into the destination's
# inline extent so that we would not lose any data, but that's currently not
# done due to the complexity that would be needed to deal with such cases
# (specially when one or both extents are compressed), returning EOPNOTSUPP, as
# it's normally not a very common case to clone very small files (only case
# where we get inline extents) and copying inline extents does not save any
# space (unlike for normal, non-inlined extents).
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
# Now because the above clone operation used to succeed, and due to foo's inline
# extent not being shinked by the truncate operation, our file bar got the whole
# inline extent copied from foo, making us lose the last 128 bytes from bar
# which got replaced by the bytes in range [128, 256[ from foo before foo was
# truncated - in other words, data loss from bar and being able to read old and
# stale data from foo that should not be possible to read anymore through normal
# filesystem operations. Contrast with the case where we truncate a file from a
# size N to a smaller size M, truncate it back to size N and then read the range
# [M, N[, we should always get the value 0x00 for all the bytes in that range.
# We expected the clone operation to fail with errno EOPNOTSUPP and therefore
# not modify our file's bar data/metadata. So its content should be 256 bytes
# long with all bytes having the value 0xbb.
#
# Without the btrfs bug fix, the clone operation succeeded and resulted in
# leaking truncated data from foo, the bytes that belonged to its range
# [128, 256[, and losing data from bar in that same range. So reading the
# file gave us the following content:
#
# 0000000 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1 a1
# *
# 0000200 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a
# *
# 0000400
echo "File bar's content after the clone operation:"
od -t x1 $SCRATCH_MNT/bar
# Also because the foo's inline extent was not shrunk by the truncate
# operation, btrfs' fsck, which is run by the fstests framework everytime a
# test completes, failed reporting the following error:
#
# root 5 inode 257 errors 400, nbytes wrong
status=0
exit
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[bwh: Backported to 3.2:
- Adjust parameters to btrfs_truncate_page() and btrfs_truncate_item()
- Pass transaction pointer into truncate_inline_extent()
- Add prototype of btrfs_truncate_page()
- s/test_bit(BTRFS_ROOT_REF_COWS, &root->state)/root->ref_cows/
- Keep using BUG_ON() for other error cases, as there is no
btrfs_abort_transaction()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 514ac8ad87 upstream.
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fixe uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.
Reported-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2:
- Don't use btrfs_map_token API
- There are fewer callers of btrfs_file_extent_inline_len() to change
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 54c09889bf upstream.
The z2 machine calls pxa27x_set_pwrmode() in order to power off
the machine, but this function gets discarded early at boot because
it is marked __init, as pointed out by kbuild:
WARNING: vmlinux.o(.text+0x145c4): Section mismatch in reference from the function z2_power_off() to the function .init.text:pxa27x_set_pwrmode()
The function z2_power_off() references
the function __init pxa27x_set_pwrmode().
This is often because z2_power_off lacks a __init
annotation or the annotation of pxa27x_set_pwrmode is wrong.
This removes the __init section modifier to fix rebooting and the
build error.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: ba4a90a6d8 ("ARM: pxa/z2: fix building error of pxa27x_cpu_suspend() no longer available")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d14053b3c7 upstream.
The VT-d specification says that "Software must enable ATS on endpoint
devices behind a Root Port only if the Root Port is reported as
supporting ATS transactions."
We walk up the tree to find a Root Port, but for integrated devices we
don't find one — we get to the host bridge. In that case we *should*
allow ATS. Currently we don't, which means that we are incorrectly
failing to use ATS for the integrated graphics. Fix that.
We should never break out of this loop "naturally" with bus==NULL,
since we'll always find bridge==NULL in that case (and now return 1).
So remove the check for (!bridge) after the loop, since it can never
happen. If it did, it would be worthy of a BUG_ON(!bridge). But since
it'll oops anyway in that case, that'll do just as well.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- There's no (!bridge) check to remove]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8039d87d9e upstream.
Currently the clone ioctl allows to clone an inline extent from one file
to another that already has other (non-inlined) extents. This is a problem
because btrfs is not designed to deal with files having inline and regular
extents, if a file has an inline extent then it must be the only extent
in the file and must start at file offset 0. Having a file with an inline
extent followed by regular extents results in EIO errors when doing reads
or writes against the first 4K of the file.
Also, the clone ioctl allows one to lose data if the source file consists
of a single inline extent, with a size of N bytes, and the destination
file consists of a single inline extent with a size of M bytes, where we
have M > N. In this case the clone operation removes the inline extent
from the destination file and then copies the inline extent from the
source file into the destination file - we lose the M - N bytes from the
destination file, a read operation will get the value 0x00 for any bytes
in the the range [N, M] (the destination inode's i_size remained as M,
that's why we can read past N bytes).
So fix this by not allowing such destructive operations to happen and
return errno EOPNOTSUPP to user space.
Currently the fstest btrfs/035 tests the data loss case but it totally
ignores this - i.e. expects the operation to succeed and does not check
the we got data loss.
The following test case for fstests exercises all these cases that result
in file corruption and data loss:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
_require_btrfs_fs_feature "no_holes"
_require_btrfs_mkfs_feature "no-holes"
rm -f $seqres.full
test_cloning_inline_extents()
{
local mkfs_opts=$1
local mount_opts=$2
_scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
_scratch_mount $mount_opts
# File bar, the source for all the following clone operations, consists
# of a single inline extent (50 bytes).
$XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
| _filter_xfs_io
# Test cloning into a file with an extent (non-inlined) where the
# destination offset overlaps that extent. It should not be possible to
# clone the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo
# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "File foo data after clone operation:"
# All bytes should have the value 0xaa (clone operation failed and did
# not modify our file).
od -t x1 $SCRATCH_MNT/foo
$XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io
# Test cloning the inline extent against a file which has a hole in its
# first 4K followed by a non-inlined extent. It should not be possible
# as well to clone the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2
# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "File foo2 data after clone operation:"
# All bytes should have the value 0x00 (clone operation failed and did
# not modify our file).
od -t x1 $SCRATCH_MNT/foo2
$XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io
# Test cloning the inline extent against a file which has a size of zero
# but has a prealloc extent. It should not be possible as well to clone
# the inline extent from file bar into this file.
$XFS_IO_PROG -f -c "falloc -k 0 1M" $SCRATCH_MNT/foo3 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo3
# Doing IO against any range in the first 4K of the file should work.
# Due to a past clone ioctl bug which allowed cloning the inline extent,
# these operations resulted in EIO errors.
echo "First 50 bytes of foo3 after clone operation:"
# Should not be able to read any bytes, file has 0 bytes i_size (the
# clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo3
$XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io
# Test cloning the inline extent against a file which consists of a
# single inline extent that has a size not greater than the size of
# bar's inline extent (40 < 50).
# It should be possible to do the extent cloning from bar to this file.
$XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4
# Doing IO against any range in the first 4K of the file should work.
echo "File foo4 data after clone operation:"
# Must match file bar's content.
od -t x1 $SCRATCH_MNT/foo4
$XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io
# Test cloning the inline extent against a file which consists of a
# single inline extent that has a size greater than the size of bar's
# inline extent (60 > 50).
# It should not be possible to clone the inline extent from file bar
# into this file.
$XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
| _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5
# Reading the file should not fail.
echo "File foo5 data after clone operation:"
# Must have a size of 60 bytes, with all bytes having a value of 0x03
# (the clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo5
# Test cloning the inline extent against a file which has no extents but
# has a size greater than bar's inline extent (16K > 50).
# It should not be possible to clone the inline extent from file bar
# into this file.
$XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6
# Reading the file should not fail.
echo "File foo6 data after clone operation:"
# Must have a size of 16K, with all bytes having a value of 0x00 (the
# clone operation failed and did not modify our file).
od -t x1 $SCRATCH_MNT/foo6
# Test cloning the inline extent against a file which has no extents but
# has a size not greater than bar's inline extent (30 < 50).
# It should be possible to clone the inline extent from file bar into
# this file.
$XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7
# Reading the file should not fail.
echo "File foo7 data after clone operation:"
# Must have a size of 50 bytes, with all bytes having a value of 0xbb.
od -t x1 $SCRATCH_MNT/foo7
# Test cloning the inline extent against a file which has a size not
# greater than the size of bar's inline extent (20 < 50) but has
# a prealloc extent that goes beyond the file's size. It should not be
# possible to clone the inline extent from bar into this file.
$XFS_IO_PROG -f -c "falloc -k 0 1M" \
-c "pwrite -S 0x88 0 20" \
$SCRATCH_MNT/foo8 | _filter_xfs_io
$CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8
echo "File foo8 data after clone operation:"
# Must have a size of 20 bytes, with all bytes having a value of 0x88
# (the clone operation did not modify our file).
od -t x1 $SCRATCH_MNT/foo8
_scratch_unmount
}
echo -e "\nTesting without compression and without the no-holes feature...\n"
test_cloning_inline_extents
echo -e "\nTesting with compression and without the no-holes feature...\n"
test_cloning_inline_extents "" "-o compress"
echo -e "\nTesting without compression and with the no-holes feature...\n"
test_cloning_inline_extents "-O no-holes" ""
echo -e "\nTesting with compression and with the no-holes feature...\n"
test_cloning_inline_extents "-O no-holes" "-o compress"
status=0
exit
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[bwh: Backported to 3.2:
- Adjust parameters to btrfs_drop_extents()
- Drop use of ASSERT()
- Keep using BUG_ON() for other error cases, as there is no
btrfs_abort_transaction()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c7d22a3c3c upstream.
btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
boundaries if needed.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
[bwh: Dependency of the following fix]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 161642e24f upstream.
Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv
But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".
We can read non existent memory and crash.
Fixes: c0de08d042 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f35d04a02 upstream.
The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
DEVICE_COUNT_RESOURCE (16). This bug was found using a static checker.
It may be that the "if (!(mask & (1 << i)))" check means we never
actually go past the end of the array in real life.
Fixes: ec04b07584 ('iomap: implement pcim_iounmap_regions()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e5bae86797 upstream.
If we fail to allocate a partition structure in the middle of the partition
creation process, the already allocated partitions are never removed, which
means they are still present in the partition list and their resources are
never freed.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f9c6e1bc1 upstream.
There were several bugs here.
1) The done label was in the wrong place so we didn't copy any
information out when there was no command given.
2) We were using PAGE_SIZE as the size of the buffer instead of
"PAGE_SIZE - pos".
3) snprintf() returns the number of characters that would have been
printed if there were enough space. If there was not enough space
(and we had fixed the memory corruption bug #2) then it would result
in an information leak when we do simple_read_from_buffer(). I've
changed it to use scnprintf() instead.
I also removed the initialization at the start of the function, because
I thought it made the code a little more clear.
Fixes: 5e6e3a92b9 ('wireless: mwifiex: initial commit for Marvell mwifiex driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90adf98d95 upstream.
Since commit 1c6c69525b ("genirq: Reject bogus threaded irq requests")
threaded IRQs without a primary handler need to be requested with
IRQF_ONESHOT, otherwise the request will fail.
scripts/coccinelle/misc/irqf_oneshot.cocci detected this issue.
Fixes: b5874f33bb ("wm831x_power: Use genirq")
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79b568b9d0 upstream.
hid_connect adds various strings to the buffer but they're all
conditional. You can find circumstances where nothing would be written
to it but the kernel will still print the supposedly empty buffer with
printk. This leads to corruption on the console/in the logs.
Ensure buf is initialized to an empty string.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
[dvhart: Initialize string to "" rather than assign buf[0] = NULL;]
Cc: Jiri Kosina <jikos@kernel.org>
Cc: linux-input@vger.kernel.org
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8ec6d97871 upstream.
The ifmgd->ave_beacon_signal value cannot be taken as is for
comparisons, it must be divided by since it's represented
like that for better accuracy of the EWMA calculations. This
would lead to invalid driver RSSI events. Fix the used value.
Fixes: 615f7b9bb1 ("mac80211: add driver RSSI threshold events")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da2d03ea27 upstream.
932c435cab ("PCI: Add dev_flags bit to access VPD through function 0")
added PCI_DEV_FLAGS_VPD_REF_F0. Previously, we set the flag on every
non-zero function of quirked devices. If a function turned out to be
different from function 0, i.e., it had a different class, vendor ID, or
device ID, the flag remained set but we didn't make VPD accessible at all.
Flip this around so we only set PCI_DEV_FLAGS_VPD_REF_F0 for functions that
are identical to function 0, and allow regular VPD access for any other
functions.
[bhelgaas: changelog, stable tag]
Fixes: 932c435cab ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9d9240756e upstream.
Commit 932c435cab ("PCI: Add dev_flags bit to access VPD through function
0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
Generally this works because we're fairly well guaranteed that a PCIe
device is at slot address 0, but for the general case, including
conventional PCI, it's incorrect. We need to get the slot and then convert
it back into a devfn.
Fixes: 932c435cab ("PCI: Add dev_flags bit to access VPD through function 0")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f05819df10 upstream.
The following sequence of commands:
i=`keyctl add user a a @s`
keyctl request2 keyring foo bar @t
keyctl unlink $i @s
tries to invoke an upcall to instantiate a keyring if one doesn't already
exist by that name within the user's keyring set. However, if the upcall
fails, the code sets keyring->type_data.reject_error to -ENOKEY or some
other error code. When the key is garbage collected, the key destroy
function is called unconditionally and keyring_destroy() uses list_empty()
on keyring->type_data.link - which is in a union with reject_error.
Subsequently, the kernel tries to unlink the keyring from the keyring names
list - which oopses like this:
BUG: unable to handle kernel paging request at 00000000ffffff8a
IP: [<ffffffff8126e051>] keyring_destroy+0x3d/0x88
...
Workqueue: events key_garbage_collector
...
RIP: 0010:[<ffffffff8126e051>] keyring_destroy+0x3d/0x88
RSP: 0018:ffff88003e2f3d30 EFLAGS: 00010203
RAX: 00000000ffffff82 RBX: ffff88003bf1a900 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000003bfc6901 RDI: ffffffff81a73a40
RBP: ffff88003e2f3d38 R08: 0000000000000152 R09: 0000000000000000
R10: ffff88003e2f3c18 R11: 000000000000865b R12: ffff88003bf1a900
R13: 0000000000000000 R14: ffff88003bf1a908 R15: ffff88003e2f4000
...
CR2: 00000000ffffff8a CR3: 000000003e3ec000 CR4: 00000000000006f0
...
Call Trace:
[<ffffffff8126c756>] key_gc_unused_keys.constprop.1+0x5d/0x10f
[<ffffffff8126ca71>] key_garbage_collector+0x1fa/0x351
[<ffffffff8105ec9b>] process_one_work+0x28e/0x547
[<ffffffff8105fd17>] worker_thread+0x26e/0x361
[<ffffffff8105faa9>] ? rescuer_thread+0x2a8/0x2a8
[<ffffffff810648ad>] kthread+0xf3/0xfb
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
[<ffffffff815f2ccf>] ret_from_fork+0x3f/0x70
[<ffffffff810647ba>] ? kthread_create_on_node+0x1c2/0x1c2
Note the value in RAX. This is a 32-bit representation of -ENOKEY.
The solution is to only call ->destroy() if the key was successfully
instantiated.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
[carnil: Backported for 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 94c4554ba0 upstream.
There appears to be a race between:
(1) key_gc_unused_keys() which frees key->security and then calls
keyring_destroy() to unlink the name from the name list
(2) find_keyring_by_name() which calls key_permission(), thus accessing
key->security, on a key before checking to see whether the key usage is 0
(ie. the key is dead and might be cleaned up).
Fix this by calling ->destroy() before cleaning up the core key data -
including key->security.
Reported-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[carnil: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 54a20552e1 upstream.
It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions. This causes the
microcode to enter an infinite loop where the core never receives
another interrupt. The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).
Signed-off-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- Add definition of AC_VECTOR
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a41cbe86df upstream.
A test case is as the description says:
open(foobar, O_WRONLY);
sleep() --> reboot the server
close(foobar)
The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
line before going to restart, there is
clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
out state and when we go to close it, “call_close” doesn’t get set as
state flag is not set and CLOSE doesn’t go on the wire.
Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 436c2a5036 ]
commit 3cc81d85ee ("asix: Don't reset PHY on if_up for ASIX 88772")
causes the ethernet on Arndale to no longer function. This appears to
be because the Arndale ethernet requires a full reset before it will
function correctly, however simply reverting the above patch causes
problems with ethtool settings getting reset.
It seems the problem is that the ethernet is not properly reset during
bind, and indeed the code in ax88772_bind that resets the device is a
very small subset of the actual ax88772_reset function. This patch uses
ax88772_reset in place of the existing reset code in ax88772_bind which
removes some code duplication and fixes the ethernet on Arndale.
It is still possible that the original patch causes some issues with
suspend and resume but that seems like a separate issue and I haven't
had a chance to test that yet.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3cc81d85ee ]
I've noticed every time the interface is set to 'up,', the kernel
reports that the link speed is set to 100 Mbps/Full Duplex, even
when ethtool is used to set autonegotiation to 'off', half
duplex, 10 Mbps.
It can be tested by:
ifconfig eth0 down
ethtool -s eth0 autoneg off speed 10 duplex half
ifconfig eth0 up
Then checking 'dmesg' for the link speed.
Signed-off-by: Michel Stam <m.stam@fugro.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 077cb37fcf ]
It seems that kernel memory can leak into userspace by a
kmalloc, ethtool_get_strings, then copy_to_user sequence.
Avoid this by using kcalloc to zero fill the copied buffer.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 31b33dfb0a ]
Earlier patch 6ae459bda tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.
Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.
Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6ae459bdaa ]
VXLAN device can receive skb with checksum partial. But the checksum
offset could be in outer header which is pulled on receive. This results
in negative checksum offset for the skb. Such skb can cause the assert
failure in skb_checksum_help(). Following patch fixes the bug by setting
checksum-none while pulling outer header.
Following is the kernel panic msg from old kernel hitting the bug.
------------[ cut here ]------------
kernel BUG at net/core/dev.c:1906!
RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
Call Trace:
<IRQ>
[<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
[<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
[<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
[<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
[<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
[<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
[<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
[<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
[<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
[<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
[<ffffffff81550128>] ip_local_deliver+0x88/0x90
[<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
[<ffffffff81550365>] ip_rcv+0x235/0x300
[<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
[<ffffffff8151c360>] netif_receive_skb+0x80/0x90
[<ffffffff81459935>] virtnet_poll+0x555/0x6f0
[<ffffffff8151cd04>] net_rx_action+0x134/0x290
[<ffffffff810683d8>] __do_softirq+0xa8/0x210
[<ffffffff8162fe6c>] call_softirq+0x1c/0x30
[<ffffffff810161a5>] do_softirq+0x65/0xa0
[<ffffffff810687be>] irq_exit+0x8e/0xb0
[<ffffffff81630733>] do_IRQ+0x63/0xe0
[<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
Reported-by: Anupam Chanda <achanda@vmware.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Without this length argument, we can read past the end of the iovec in
memcpy_toiovec because we have no way of knowing the total length of the
iovec's buffers.
This is needed for stable kernels where 89c22d8c3b ("net: Fix skb
csum races when peeking") has been backported but that don't have the
ioviter conversion, which is almost all the stable trees <= 3.18.
This also fixes a kernel crash for NFS servers when the client uses
-onfsvers=3,proto=udp to mount the export.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
[bwh: Backported to 3.2: adjust context in include/linux/skbuff.h]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80e0b6e8a0 upstream.
We accidentally declared pid_alive without any extern/inline connotation.
Some platforms were fine with this, some like ia64 and mips were very angry.
If the function is inline, the prototype should be inline!
on ia64:
include/linux/sched.h:1718: warning: 'pid_alive' declared inline after
being called
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Neal Gompa <ngompa13@gmail.com>
commit c340702ca2 upstream.
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 95af587e95 ("md/raid10: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
Reported-by: Nate Dailey <nate.dailey@stratus.com>
Fixes: bd870a16c5 ("md/raid10: Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95af587e95 upstream.
When a write to one of the legs of a RAID10 fails, the failure is
recorded in the metadata of the other legs so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).
Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.
This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which
is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd8688a199 upstream.
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 55ce74d4bf ("md/raid1: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
Fixes: cd5ff9a16f ("md/raid1: Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55ce74d4bf upstream.
When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).
Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.
Currently there is no interlock between the write request completing
and the metadata update. So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.
This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.
So:
- set MD_CHANGE_PENDING when requesting a metadata update for a
failed device, so we can know with certainty when it completes
- queue requests that experienced an error on a new queue which
is only processed after the metadata update completes
- call raid_end_bio_io() on bios in that queue when the time comes.
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4dcb8b57df upstream.
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.
Fix this by releasing the previously allocated bufio block using
unlock_block().
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2871c69e02 upstream.
Commit 4c7e309340 ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().
The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them. If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.
Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.
Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1acea4f6ce upstream.
We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.
Fixes: 2b018d57ff ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 296291cdd1 upstream.
Currently a simple program below issues a sendfile(2) system call which
takes about 62 days to complete in my test KVM instance.
int fd;
off_t off = 0;
fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644);
ftruncate(fd, 2);
lseek(fd, 0, SEEK_END);
sendfile(fd, fd, &off, 0xfffffff);
Now you should not ask kernel to do a stupid stuff like copying 256MB in
2-byte chunks and call fsync(2) after each chunk but if you do, sysadmin
should have a way to stop you.
We actually do have a check for fatal_signal_pending() in
generic_perform_write() which triggers in this path however because we
always succeed in writing something before the check is done, we return
value > 0 from generic_perform_write() and thus the information about
signal gets lost.
Fix the problem by doing the signal check before writing anything. That
way generic_perform_write() returns -EINTR, the error gets propagated up
and the sendfile loop terminates early.
Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8832317f66 upstream.
Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.
Oops: Exception in kernel mode, sig: 4 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
MSR: 1000000000081000 <HV,ME> CR: 00000000 XER: 00000000
CFAR: c000000000009c0c SOFTE: 0
NIP [0000000000000000] (null)
LR [0000000000009c14] 0x9c14
Call Trace:
[c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
[c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
[c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
Fixes: 55190f8878 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2a6c521bb4 upstream.
On nv50+, we restrict the valid domains to just the one where the buffer
was originally created. However after the buffer is evicted to system
memory, we might move it back to a different domain that was not
originally valid. When sharing the buffer and retrieving its GEM_INFO
data, we still want the domain that will be valid for this buffer in a
pushbuf, not the one where it currently happens to be.
This resolves fdo#92504 and several others. These are due to suspend
evicting all buffers, making it more likely that they temporarily end up
in the wrong place.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ca81a2840 upstream.
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.
Fixes: a977049dac ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97aff2c03a upstream.
There are 24 EQ registers not 25, I suspect this bug came about because
the registers start at EQ1 not zero. The bug is relatively harmless as
the extra register written is an unused one.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3fc89adb9f upstream.
Currently a number of Crypto API operations may fail when a signal
occurs. This causes nasty problems as the caller of those operations
are often not in a good position to restart the operation.
In fact there is currently no need for those operations to be
interrupted by user signals at all. All we need is for them to
be killable.
This patch replaces the relevant calls of signal_pending with
fatal_signal_pending, and wait_for_completion_interruptible with
wait_for_completion_killable, respectively.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: drop change to crypto_user_skcipher_alg(), which
we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd7cd061ad upstream.
We received several reports of systems rebooting and powering on
after an attempted shutdown. Testing showed that setting
XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT
quirk allowed the system to shutdown as expected for LynxPoint-LP
xHCI controllers. Set the quirk back.
Note that the quirk was originally introduced for LynxPoint and
LynxPoint-LP just for this same reason. See:
commit 638298dc66 ("xhci: Fix spurious wakeups after S5 on Haswell")
It was later limited to only concern HP machines as it caused
regression on some machines, see both bug and commit:
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
commit 6962d914f3 ("xhci: Limit the spurious wakeup fix only to HP machines")
Later it was discovered that the powering on after shutdown
was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP
machine suffered from spontaneous resume from S3 (which should
not be related to the SPURIOUS_WAKEUP quirk at all). An attempt
to fix this then removed the SPURIOUS_WAKEUP flag usage completely.
commit b45abacde3 ("xhci: no switching back on non-ULT Haswell")
Current understanding is that LynxPoint-LP (Haswell ULT) machines
need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and
plain Lynxpoint (Haswell) machines may _not_ have the quirk
set otherwise they again will restart.
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Oliver Neukum <oneukum@suse.com>
[Added more history to commit message -Mathias]
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commits c09ec25d36 and
0a939993bf upstream.
The same issue like with Panther Point chipsets. If the USB ports are
switched to xHCI on shutdown, the xHCI host will send a spurious interrupt,
which will wake the system. Some BIOS have work around for this, but not all.
One example is Compulab's mini-desktop, the Intense-PC2.
The bug can be avoided if the USB ports are switched back to EHCI on
shutdown.
This patch should be backported to stable kernels as old as 3.12,
that contain the commit 638298dc66
"xhci: Fix spurious wakeups after S5 on Haswell"
Signed-off-by: Denis Turischev <denis@compulab.co.il>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch "xhci: Switch Intel Lynx Point ports to EHCI on shutdown."
commit c09ec25d36 is not fully correct
It switches both Lynx Point and Lynx Point-LP ports to EHCI on shutdown.
On some Lynx Point machines it causes spurious interrupt,
which wake the system: bugzilla.kernel.org/show_bug.cgi?id=76291
On Lynx Point-LP on the contrary switching ports to EHCI seems to be
necessary to fix these spurious interrupts.
Signed-off-by: Denis Turischev <denis@compulab.co.il>
Reported-by: Wulf Richartz <wulf.richartz@gmail.com>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Combined the above commits and backported to 3.2: adjust context to
apply after "xhci: Limit the spurious wakeup fix only to HP machines" and
"xhci: no switching back on non-ULT Haswell"]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b4739b895 upstream.
If a host fails to wake up a isochronous SuperSpeed device from U1/U2
in time for a isoch transfer it will generate a "No ping response error"
Host will then move to the next transfer descriptor.
Handle this case in the same way as missed service errors, tag the
current TD as skipped and handle it on the next transfer event.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e210c422b6 upstream.
If the difference is big enough between the bytes asked and received
in a bulk transfer we can get a short transfer event pointing to a TRB in
the middle of the TD. We don't want to handle the TD yet as we will anyway
receive a new event for the last TRB in the TD.
Hold off from finishing the TD and removing it from the list until we
receive an event for the last TRB in the TD
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba2374fd2b upstream.
In preparation for the installation of a large page, any small page
tables that may still exist in the target IOV address range are
removed. However, if a scatter/gather list entry is large enough to
fit more than one large page, the address space for any subsequent
large pages is not cleared of conflicting small page tables.
This can cause legitimate mapping requests to fail with errors of the
form below, potentially followed by a series of IOMMU faults:
ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083)
In this example, a 4MiB scatter/gather list entry resulted in the
successful installation of a large page @ vPFN 0xfdc00, followed by
a failed attempt to install another large page @ vPFN 0xfde00, due to
the presence of a pointer to a small page table @ 0x7f83a4000.
To address this problem, compute the number of large pages that fit
into a given scatter/gather list entry, and use it to derive the
last vPFN covered by the large page(s).
Signed-off-by: Christian Zander <christian@nervanasys.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2:
- Add the lvl_pages variable, added by an earlier commit upstream
- Also change arguments to dma_pte_clear_range(), which is called by
dma_pte_free_pagetable() upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8996eafdcb upstream.
Unlike shash algorithms, ahash drivers must implement export
and import as their descriptors may contain hardware state and
cannot be exported as is. Unfortunately some ahash drivers did
not provide them and end up causing crashes with algif_hash.
This patch adds a check to prevent these drivers from registering
ahash algorithms until they are fixed.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a54c8f0f2d upstream.
xen-blkfront will crash if the check to talk_to_blkback()
in blkback_changed()(XenbusStateInitWait) returns an error.
The driver data is freed and info is set to NULL. Later during
the close process via talk_to_blkback's call to xenbus_dev_fatal()
the null pointer is passed to and dereference in blkfront_closing.
Signed-off-by: Cathy Avery <cathy.avery@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15e3d5a285 upstream.
3w controller don't dma map small single SGL entry commands but instead
bounce buffer them. Add a helper to identify these commands and don't
call scsi_dma_unmap for them.
Based on an earlier patch from James Bottomley.
Fixes: 118c85 ("3w-9xxx: fix command completion race")
Reported-by: Tóth Attila <atoth@atoth.sote.hu>
Tested-by: Tóth Attila <atoth@atoth.sote.hu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95913d9791 upstream.
So the problem this patch is trying to address is as follows:
CPU0 CPU1
context_switch(A, B)
ttwu(A)
LOCK A->pi_lock
A->on_cpu == 0
finish_task_switch(A)
prev_state = A->state <-.
WMB |
A->on_cpu = 0; |
UNLOCK rq0->lock |
| context_switch(C, A)
`-- A->state = TASK_DEAD
prev_state == TASK_DEAD
put_task_struct(A)
context_switch(A, C)
finish_task_switch(A)
A->state == TASK_DEAD
put_task_struct(A)
The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.
We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.
The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2:
- Adjust filename
- As smp_store_release() is not defined, use smp_mb()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 225db5762d upstream.
When OSS emulation is loaded on ISA SB AWE32 chip, we get now kernel
warnings like:
WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80()
sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0'
It's because both emux synth and opl3 drivers try to register their
OSS device object with the same static index number 0. This hasn't
been a big problem until the recent rewrite of device management code
(that exposes sysfs at the same time), but it's been an obvious bug.
This patch works around it just by using a different index number of
emux synth object. There can be a more elegant way to fix, but it's
enough for now, as this code won't be touched so often, in anyway.
Reported-and-tested-by: Michael Shell <list1@michaelshell.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5bd166872d upstream.
The code to send the RX PN data (for each TID) to the firmware
has a devastating bug: it overwrites the data for TID 0 with
all the TID data, leaving the remaining TIDs zeroed. This will
allow replays to actually be accepted by the firmware, which
could allow waking up the system.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0c55627167 upstream.
This is mostly a hardening fix, given that write-only access to other
users' ttys is usually only given through setgid tty executables.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- __proc_set_tty() also takes a task_struct pointer]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e81107d4c6 upstream.
My colleague ran into a program stall on a x86_64 server, where
n_tty_read() was waiting for data even if there was data in the buffer
in the pty. kernel stack for the stuck process looks like below.
#0 [ffff88303d107b58] __schedule at ffffffff815c4b20
#1 [ffff88303d107bd0] schedule at ffffffff815c513e
#2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
#3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
#4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
#5 [ffff88303d107dd0] tty_read at ffffffff81368013
#6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
#7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
#8 [ffff88303d107f00] sys_read at ffffffff811a4306
#9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7
There seems to be two problems causing this issue.
First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
updates ldata->commit_head using smp_store_release() and then checks
the wait queue using waitqueue_active(). However, since there is no
memory barrier, __receive_buf() could return without calling
wake_up_interactive_poll(), and at the same time, n_tty_read() could
start to wait in wait_woken() as in the following chart.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
if (waitqueue_active(&tty->read_wait))
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
add_wait_queue(&tty->read_wait, &wait);
...
if (!input_available_p(tty, 0)) {
smp_store_release(&ldata->commit_head,
ldata->read_head);
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
The second problem is that n_tty_read() also lacks a memory barrier
call and could also cause __receive_buf() to return without calling
wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken()
as in the chart below.
__receive_buf() n_tty_read()
------------------------------------------------------------------------
spin_lock_irqsave(&q->lock, flags);
/* from add_wait_queue() */
...
if (!input_available_p(tty, 0)) {
/* Memory operations issued after the
RELEASE may be completed before the
RELEASE operation has completed */
smp_store_release(&ldata->commit_head,
ldata->read_head);
if (waitqueue_active(&tty->read_wait))
__add_wait_queue(q, wait);
spin_unlock_irqrestore(&q->lock,flags);
/* from add_wait_queue() */
...
timeout = wait_woken(&wait,
TASK_INTERRUPTIBLE, timeout);
------------------------------------------------------------------------
There are also other places in drivers/tty/n_tty.c which have similar
calls to waitqueue_active(), so instead of adding many memory barrier
calls, this patch simply removes the call to waitqueue_active(),
leaving just wake_up*() behind.
This fixes both problems because, even though the memory access before
or after the spinlocks in both wake_up*() and add_wait_queue() can
sneak into the critical section, it cannot go past it and the critical
section assures that they will be serialized (please see "INTER-CPU
ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
better explanation). Moreover, the resulting code is much simpler.
Latency measurement using a ping-pong test over a pty doesn't show any
visible performance drop.
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Use wake_up_interruptible(), not wake_up_interruptible_poll()
- There are only two spurious uses of waitqueue_active() to remove]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72194739f5 upstream.
Add a device quirk for the Logitech PTZ Pro Camera and its sibling the
ConferenceCam CC3000e Camera.
This fixes the failed camera enumeration on some boot, particularly on
machines with fast CPU.
Tested by connecting a Logitech PTZ Pro Camera to a machine with a
Haswell Core i7-4600U CPU @ 2.10GHz, and doing thousands of reboot cycles
while recording the kernel logs and taking camera picture after each boot.
Before the patch, more than 7% of the boots show some enumeration transfer
failures and in a few of them, the kernel is giving up before actually
enumerating the webcam. After the patch, the enumeration has been correct
on every reboot.
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eda7d0f38a upstream.
"num_read" is in byte units but we are write u16s so we end up write
twice as much as intended.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66eefe5de1 upstream.
Calling e.g. blk_queue_max_hw_sectors() after calls to
disk_stack_limits() discards the settings determined by
disk_stack_limits().
So we need to make those calls first.
Fixes: 199dc6ed51 ("md/raid0: update queue parameter in a safer location.")
Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 3.2: the code being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 199dc6ed51 upstream.
When a (e.g.) RAID5 array is reshaped to RAID0, the updating
of queue parameters (e.g. max number of sectors per bio) is
done in the wrong place.
It should be part of ->run, but it is actually part of ->takeover.
This means it happens before level_store() calls:
blk_set_stacking_limits(&mddev->queue->limits);
and so it ineffective. This can lead to errors from underlying
devices.
So move all the relevant settings out of create_stripe_zones()
and into raid0_run().
As this can lead to a bug-on it is suitable for any -stable
kernel which supports reshape to RAID0. So 2.6.35 or later.
As the bug has been present for five years there is no urgency,
so no need to rush into -stable.
Fixes: 9af204cf72 ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover")
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 3.2:
- md has no discard or write-same support
- md is not used by dm-raid so mddev->queue is never null
- Open-code rdev_for_each()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 646200a041 upstream.
The error paths in set_file_size for cifs and smb3 are incorrect.
In the unlikely event that a server did not support set file info
of the file size, the code incorrectly falls back to trying SMBWriteX
(note that only the original core SMB Write, used for example by DOS,
can set the file size this way - this actually does not work for the more
recent SMBWriteX). The idea was since the old DOS SMB Write could set
the file size if you write zero bytes at that offset then use that if
server rejects the normal set file info call.
Fortunately the SMBWriteX will never be sent on the wire (except when
file size is zero) since the length and offset fields were reversed
in the two places in this function that call SMBWriteX causing
the fall back path to return an error. It is also important to never call
an SMB request from an SMB2/sMB3 session (which theoretically would
be possible, and can cause a brief session drop, although the client
recovers) so this should be fixed. In practice this path does not happen
with modern servers but the error fall back to SMBWriteX is clearly wrong.
Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
Pointed out by PaX/grsecurity team
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: PaX Team <pageexec@freemail.hu>
CC: Emese Revfy <re.emese@gmail.com>
CC: Brad Spengler <spender@grsecurity.net>
[bwh: Backported to 3.2: deleted code looks slightly different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f84a8990e upstream.
SunDong reported the following on
https://bugzilla.kernel.org/show_bug.cgi?id=103841
I think I find a linux bug, I have the test cases is constructed. I
can stable recurring problems in fedora22(4.0.4) kernel version,
arch for x86_64. I construct transparent huge page, when the parent
and child process with MAP_SHARE, MAP_PRIVATE way to access the same
huge page area, it has the opportunity to lead to huge page copy on
write failure, and then it will munmap the child corresponding mmap
area, but then the child mmap area with VM_MAYSHARE attributes, child
process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
functions (vma - > vm_flags & VM_MAYSHARE).
There were a number of problems with the report (e.g. it's hugetlbfs that
triggers this, not transparent huge pages) but it was fundamentally
correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
looks like this
vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0
pgoff 0 file ffff88106bdb9800 private_data (null)
flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
------------
kernel BUG at mm/hugetlb.c:462!
SMP
Modules linked in: xt_pkttype xt_LOG xt_limit [..]
CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
set_vma_resv_flags+0x2d/0x30
The VM_BUG_ON is correct because private and shared mappings have
different reservation accounting but the warning clearly shows that the
VMA is shared.
When a private COW fails to allocate a new page then only the process
that created the VMA gets the page -- all the children unmap the page.
If the children access that data in the future then they get killed.
The problem is that the same file is mapped shared and private. During
the COW, the allocation fails, the VMAs are traversed to unmap the other
private pages but a shared VMA is found and the bug is triggered. This
patch identifies such VMAs and skips them.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: SunDong <sund_sky@126.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95c2b17534 upstream.
Per-IRQ directories in procfs are created only when a handler is first
added to the irqdesc, not when the irqdesc is created. In the case of
a shared IRQ, multiple tasks can race to create a directory. This
race condition seems to have been present forever, but is easier to
hit with async probing.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit eddd3826a1 upstream.
Dmitry Vyukov reported the following using trinity and the memory
error detector AddressSanitizer
(https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
[ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
address ffff88002e280000
[ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
[ 124.578633] Accessed by thread T10915:
[ 124.579295] inlined in describe_heap_address
./arch/x86/mm/asan/report.c:164
[ 124.579295] #0 ffffffff810dd277 in asan_report_error
./arch/x86/mm/asan/report.c:278
[ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
./arch/x86/mm/asan/asan.c:37
[ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
[ 124.581893] #3 ffffffff8107c093 in get_wchan
./arch/x86/kernel/process_64.c:444
The address checks in the 64bit implementation of get_wchan() are
wrong in several ways:
- The lower bound of the stack is not the start of the stack
page. It's the start of the stack page plus sizeof (struct
thread_info)
- The upper bound must be:
top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
The 2 * sizeof(unsigned long) is required because the stack pointer
points at the frame pointer. The layout on the stack is: ... IP FP
... IP FP. So we need to make sure that both IP and FP are in the
bounds.
Fix the bound checks and get rid of the mix of numeric constants, u64
and unsigned long. Making all unsigned long allows us to use the same
function for 32bit as well.
Use READ_ONCE() when accessing the stack. This does not prevent a
concurrent wakeup of the task and the stack changing, but at least it
avoids TOCTOU.
Also check task state at the end of the loop. Again that does not
prevent concurrent changes, but it avoids walking for nothing.
Add proper comments while at it.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
- s/READ_ONCE/ACCESS_ONCE/
- Remove use of TOP_OF_KERNEL_STACK_PADDING, not defined here and would
be defined as 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 53960059d5 upstream.
If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
will cause DMA memory allocated for devices with a 32..63-bit
coherent_dma_mask to fall back to using __GFP_DMA, even though there may
only be 32-bits of physical address available anyway.
Correct that case to compare against a mask the size of phys_addr_t
instead of always using a 64-bit mask.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Fixes: a2e715a86c ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9610/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7c7feb2ebf upstream.
UBI: attaching mtd1 to ubi0
UBI: scanning is finished
UBI error: init_volumes: not enough PEBs, required 706, available 686
UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
UBI error: ubi_init: cannot attach mtd1
If available PEBs are not enough when initializing volumes, return -ENOSPC
directly. If available PEBs are not enough when initializing WL, return
-ENOSPC instead of -ENOMEM.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 281fda2767 upstream.
Make sure that data_size is less than LEB size.
Otherwise a handcrafted UBI image is able to trigger
an out of bounds memory access in ubi_compare_lebs().
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: David Gstir <david@sigma-star.at>
[bwh: Backported to 3.2: drop first argument to ubi_err()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 64c98e7f49 upstream.
Sanitizing the e820 map may produce extra E820 entries which would result in
the topmost E820 entries being removed. The removed entries would typically
include the top E820 usable RAM region and thus result in the domain having
signicantly less RAM available to it.
Fix by allowing sanitize_e820_map to use the full size of the allocated E820
array.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[bwh: Backported to 3.2:
s/xen_e820_map_entries/memmap.nr_entries/; s/xen_e820_map/map/g]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8474ba7419 upstream.
Make sure the compiler does not modify arguments of syscall functions.
This can happen if the compiler generates a tailcall to another
function. For example, without asmlinkage_protect sys_openat is compiled
into this function:
sys_openat:
clr.l %d0
move.w 18(%sp),%d0
move.l %d0,16(%sp)
jbra do_sys_open
Note how the fourth argument is modified in place, modifying the register
%d4 that gets restored from this stack slot when the function returns to
user-space. The caller may expect the register to be unmodified across
system calls.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 029cd03702 upstream.
ath9k inserts padding between the 802.11 header and the data area (to
align it). Since it didn't declare this extra required headroom, this
led to some nasty issues like randomly dropped packets in some setups.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 176fc2d577 upstream.
The in kernel snprintf() will conveniently return the actual length of
the printed string even if not given an output beffer at all so just do
that rather than relying on the user to pass in a suitable buffer,
ensuring that we don't need to worry if the buffer was truncated due to
the size of the buffer passed in.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b763ec17ac upstream.
If a read is attempted which is smaller than the line length then we may
underflow the subtraction we're doing with the unsigned size_t type so
move some of the calculation to be additions on the right hand side
instead in order to avoid this.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 275d7d44d8 upstream.
Poma (on the way to another bug) reported an assertion triggering:
[<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
[<ffffffff81150822>] __module_address+0x32/0x150
[<ffffffff81150956>] __module_text_address+0x16/0x70
[<ffffffff81150f19>] symbol_put_addr+0x29/0x40
[<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]
Laura Abbott <labbott@redhat.com> produced a patch which lead us to
inspect symbol_put_addr(). This function has a comment claiming it
doesn't need to disable preemption around the module lookup
because it holds a reference to the module it wants to find, which
therefore cannot go away.
This is wrong (and a false optimization too, preempt_disable() is really
rather cheap, and I doubt any of this is on uber critical paths,
otherwise it would've retained a pointer to the actual module anyway and
avoided the second lookup).
While its true that the module cannot go away while we hold a reference
on it, the data structure we do the lookup in very much _CAN_ change
while we do the lookup. Therefore fix the comment and add the
required preempt_disable().
Reported-by: poma <pomidorabelisima@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: a6e6abd575 ("module: remove module_text_address()")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 41e3025eac, which
was commit 6f691251c0 upstream.
The fix is only needed after commit f8f559422b ("KVM: MMU: fast
invalidate all mmio sptes"), included in Linux 3.11.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 117b8a10fe, which
was commit 29c4afc4e9 upstream. The bug
it fixes upstream clearly doesn't exist in 3.2.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 841df7df19 upstream.
Commit 6f6a6fda29 "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.
Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Fixes: 6f6a6fda29
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Roland Dreier <roland@kernel.org>
commit b1b4e435e4 upstream.
When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.
So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).
This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.
This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware. But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.
For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49a18d86f6 upstream.
As pointed out by Eric Dumazet, net->ipv6.ip6_rt_last_gc should
hold the last time garbage collector was run so that we should
update it whenever fib6_run_gc() calls fib6_clean_all(), not only
if we got there from ip6_dst_gc().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
commit 2ac3ac8f86 upstream.
On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.
This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.
Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
commit 575bf1d04e upstream.
perl.h from new Perl release doesn't like -Wundef and -Wswitch-default:
/usr/lib/perl5/core_perl/CORE/perl.h:548:5: error: "SILENT_NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if SILENT_NO_TAINT_SUPPORT && !defined(NO_TAINT_SUPPORT)
^
/usr/lib/perl5/core_perl/CORE/perl.h:556:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3471:0,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/sv.h:1455:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3472:0,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/regexp.h:436:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/hv.h:592:0,
from /usr/lib/perl5/core_perl/CORE/perl.h:3480,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_siphash_2_4’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:222:3: error: switch missing default case [-Werror=switch-default]
switch( left )
^
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_superfast’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:274:5: error: switch missing default case [-Werror=switch-default]
switch (rem) { \
^
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_murmur3’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:398:5: error: switch missing default case [-Werror=switch-default]
switch(bytes_in_carry) { /* how many bytes in carry */
^
Let's disable the warnings for code which uses perl.h.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372063394-20126-1-git-send-email-kirill@shutemov.name
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Vinson Lee <vlee@twopensource.com>
[ Upstream commit 41fc014332 ]
dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.
This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- s/portid/pid/
- Check whether fib_nl_fill_rule() returns < 0, as it may return > 0 on
success (thanks to Roland Dreier)]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Roland Dreier <roland@purestorage.com>
[ Upstream commit 25b4a44c19 ]
In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.
This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.
Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a951bc1e6b ]
The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but the same MAC
address still may happened by this steps for this policy:
1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
bond0 has the same mac address with eth0, it is MAC1.
2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
eth1 is backup, eth1 has MAC2.
3) ifconfig eth0 down
eth1 became active slave, bond will swap MAC for eth0 and eth1,
so eth1 has MAC1, and eth0 has MAC2.
4) ifconfig eth1 down
there is no active slave, and eth1 still has MAC1, eth2 has MAC2.
5) ifconfig eth0 up
the eth0 became active slave again, the bond set eth0 to MAC1.
Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
MAC address, it will break this policy for ACTIVE_BACKUP mode.
This patch will fix this problem by finding the old active slave and
swap them MAC address before change active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- bond_for_each_slave() takes an extra int paramter
- Use compare_ether_addr() instead of ether_addr_equal()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 03645a11a5 ]
ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.
This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 89c22d8c3b ]
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.
This is in fact bogus for the MSG_PEEK case as this is done without
any locking. So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.
This patch fixes this by only storing the result if the skb is not
shared. This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fecdf8be2d ]
pktgen_thread_worker() is obviously racy, kthread_stop() can come
between the kthread_should_stop() check and set_current_state().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fdd75ea8df ]
Calling connect() with an AF_TIPC socket would trigger a series
of error messages from SELinux along the lines of:
SELinux: Invalid class 0
type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> }
for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
permissive=0
This was due to a failure to initialize the security state of the new
connection sock by the tipc code, leaving it with junk in the security
class field and an unlabeled secid. Add a call to security_sk_clone()
to inherit the security state from the parent socket.
Reported-by: Tim Shearer <tim.shearer@overturenetworks.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b9a5322779 upstream.
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state. Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.
We already did this for the IPC semaphore code (see commit e8577d1f03:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8577d1f03 upstream.
ipc_addid() makes a new ipc identifier visible to everyone. New objects
start as locked, so that the caller can complete the initialization
after the call. Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.
Thus: Move the ipc_addid() to the end of the initialization.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Rik van Riel <riel@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- The error path being moved looks a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 48900cb6af upstream.
virtio declares support for NETIF_F_FRAGLIST, but assumes
that there are at most MAX_SKB_FRAGS + 2 fragments which isn't
always true with a fraglist.
A longer fraglist in the skb will make the call to skb_to_sgvec overflow
the sg array, leading to memory corruption.
Drop NETIF_F_FRAGLIST so we only get what we can handle.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 77751427a1 upstream.
Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.
If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)
The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.
Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6878d9e03 upstream.
In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
mdu_bitmap_file_t called "file".
5769 file = kmalloc(sizeof(*file), GFP_NOIO);
5770 if (!file)
5771 return -ENOMEM;
This structure is copied to user space at the end of the function.
5786 if (err == 0 &&
5787 copy_to_user(arg, file, sizeof(*file)))
5788 err = -EFAULT
But if bitmap is disabled only the first byte of "file" is initialized
with zero, so it's possible to read some bytes (up to 4095) of kernel
space memory from user space. This is an information leak.
5775 /* bitmap disabled, zero the first byte and copy out */
5776 if (!mddev->bitmap_info.file)
5777 file->pathname[0] = '\0';
Signed-off-by: Benjamin Randazzo <benjamin@randazzo.fr>
Signed-off-by: NeilBrown <neilb@suse.com>
[bwh: Backported to 3.2: patch both possible allocation calls]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cbb4be652d upstream.
Fix potential null-pointer dereference at probe by making sure that the
required endpoints are present.
The whiteheat driver assumes there are at least five pairs of bulk
endpoints, of which the final pair is used for the "command port". An
attempt to bind to an interface with fewer bulk endpoints would
currently lead to an oops.
Fixes CVE-2015-5257.
Reported-by: Moein Ghasemzadeh <moein@istuary.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 012572d4fc upstream.
The order of the following three spinlocks should be:
dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
But dlm_dispatch_assert_master() is called while holding
dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
dlm_grab() which will take dlm_domain_lock.
Once another thread (for example, dlm_query_join_handler) has already
taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
happens.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc57a7c680 upstream.
PARAVIRT_ADJUST_EXCEPTION_FRAME generates this code (using nmi as an
example, trimmed for readability):
ff 15 00 00 00 00 callq *0x0(%rip) # 2796 <nmi+0x6>
2792: R_X86_64_PC32 pv_irq_ops+0x2c
That's a call through a function pointer to regular C function that
does nothing on native boots, but that function isn't protected
against kprobes, isn't marked notrace, and is certainly not
guaranteed to preserve any registers if the compiler is feeling
perverse. This is bad news for a CLBR_NONE operation.
Of course, if everything works correctly, once paravirt ops are
patched, it gets nopped out, but what if we hit this code before
paravirt ops are patched in? This can potentially cause breakage
that is very difficult to debug.
A more subtle failure is possible here, too: if _paravirt_nop uses
the stack at all (even just to push RBP), it will overwrite the "NMI
executing" variable if it's called in the NMI prologue.
The Xen case, perhaps surprisingly, is fine, because it's already
written in asm.
Fix all of the cases that default to paravirt_nop (including
adjust_exception_frame) with a big hammer: replace paravirt_nop with
an asm function that is just a ret instruction.
The Xen case may have other problems, so document them.
This is part of a fix for some random crashes that Sasha saw.
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/8f5d2ba295f9d73751c33d97fda03e0495d9ade0.1442791737.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98ce94c8df upstream.
Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
10.10.5) share fails in case the clocks differ more than +/-2h:
digest-service: digest-request: od failed with 2 proto=ntlmv2
digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
Fix this by (re-)using the given server timestamp for the
ntlmv2 authentication (as Windows 7 does).
A related problem was also reported earlier by Namjae Jaen (see below):
Windows machine has extended security feature which refuse to allow
authentication when there is time difference between server time and
client time when ntlmv2 negotiation is used. This problem is prevalent
in embedded enviornment where system time is set to default 1970.
Modern servers send the server timestamp in the TargetInfo Av_Pair
structure in the challenge message [see MS-NLMP 2.2.2.1]
In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
use the server provided timestamp if present OR current time if it is
not
Reported-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dca7794539 upstream.
Some changes between xhci 0.96 and xhci 1.0 specifications forced us to
check the hci version in code, some of these checks were implemented as
hci_version == 1.0, which will not work with new xhci 1.1 controllers.
xhci 1.1 behaves similar to xhci 1.0 in these cases, so change these
checks to hci_version >= 1.0
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6809ffd16 upstream.
We want to give the command abortion an additional try to stop
the command ring before we completely hose xhci.
Tested-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: call handshake() rather than xhci_handshake()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff30cbc8da upstream.
Bits 1:0 of the bmAttributes are used for the burst multiplier.
The rest of the bits used to be reserved (zero), but USB3.1 takes bit 7
into use.
Use the existing USB_SS_MULT() macro instead to make sure the mult value
and hence max packet calculations are correct for USB3.1 devices.
Note that burst multiplier in bmAttributes is zero based and that
the USB_SS_MULT() macro adds one.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3afb112180 upstream.
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.
Reported-by: M A Young <m.a.young@durham.ac.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8d4bd0ed04 upstream.
The uc_sigmask in the ucontext structure is an array of words to keep
the 64 signal bits (or 1024 if you ask glibc but the kernel sigset_t
only has 64 bits).
For 64 bit the sigset_t contains a single 8 byte word, but for 31 bit
there are two 4 byte words. The compat signal handler code uses a
simple copy of the 64 bit sigset_t to the 31 bit compat_sigset_t.
As s390 is a big-endian architecture this is incorrect, the two words
in the 31 bit sigset_t array need to be swapped.
Reported-by: Stefan Liebler <stli@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2:
- Introduce local compat_sigset_t in setup_frame32()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c8f7710c1 upstream.
The previous fix of pxa library support, which was introduced to fix the
library dependency, broke the previous SoC behavior, where a machine
code binding pxa2xx-ac97 with a coded relied on :
- sound/soc/pxa/pxa2xx-ac97.c
- sound/soc/codecs/XXX.c
For example, the mioa701_wm9713.c machine code is currently broken. The
"select ARM" statement wrongly selects the soc/arm/pxa2xx-ac97 for
compilation, as per an unfortunate fate SND_PXA2XX_AC97 is both declared
in sound/arm/Kconfig and sound/soc/pxa/Kconfig.
Fix this by ensuring that SND_PXA2XX_SOC correctly triggers the correct
pxa2xx-ac97 compilation.
Fixes: 846172dfe3 ("ASoC: fix SND_PXA2XX_LIB Kconfig warning")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9b55613f42 upstream.
When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
IT state when entering a signal handler. This can cause the first
few instructions to be conditionally executed depending on the parent
context.
In any case, the original test for >= ARMv7 is broken - ARMv6 can have
Thumb-2 support as well, and an ARMv6T2 specific build would omit this
code too.
Relax the test back to ARMv6 or greater. This results in us always
clearing the IT state bits in the PSR, even on CPUs where these bits
are reserved. However, they're reserved for the IT state, so this
should cause no harm.
Fixes: d71e1352e2 ("Clear the IT state when invoking a Thumb-2 signal handler")
Acked-by: Tony Lindgren <tony@atomide.com>
Tested-by: H. Nikolaus Schaller <hns@goldelico.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6ecf830e50 upstream.
The ARM architecture reference specifies that the IT state bits in the
PSR must be all zeros in ARM mode or behavior is unspecified. On the
Qualcomm Snapdragon S4/Krait architecture CPUs the processor continues
to consider the IT state bits while in ARM mode. This makes it so
that some instructions are skipped by the CPU.
Signed-off-by: T.J. Purtell <tj@mobisocial.us>
[rmk+kernel@arm.linux.org.uk: fixed whitespace formatting in patch]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a30e577c96 upstream.
In btrfs_evict_inode, we properly truncate the page cache for evicted
inodes but then we call btrfs_wait_ordered_range for every inode as well.
It's the right thing to do for regular files but results in incorrect
behavior for device inodes for block devices.
filemap_fdatawrite_range gets called with inode->i_mapping which gets
resolved to the block device inode before getting passed to
wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi. What happens
next depends on whether there's an open file handle associated with the
inode. If there is, we write to the block device, which is unexpected
behavior. If there isn't, we through normally and inode->i_data is used.
We can also end up racing against open/close which can result in crashes
when i_mapping points to a block device inode that has been closed.
Since there can't be any page cache associated with special file inodes,
it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
the problem.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 005efedf2c upstream.
If a file has a range pointing to a compressed extent, followed by
another range that points to the same compressed extent and a read
operation attempts to read both ranges (either completely or part of
them), the pages that correspond to the second range are incorrectly
filled with zeroes.
Consider the following example:
File layout
[0 - 8K] [8K - 24K]
| |
| |
points to extent X, points to extent X,
offset 4K, length of 8K offset 0, length 16K
[extent X, compressed length = 4K uncompressed length = 16K]
If a readpages() call spans the 2 ranges, a single bio to read the extent
is submitted - extent_io.c:submit_extent_page() would only create a new
bio to cover the second range pointing to the extent if the extent it
points to had a different logical address than the extent associated with
the first range. This has a consequence of the compressed read end io
handler (compression.c:end_compressed_bio_read()) finish once the extent
is decompressed into the pages covering the first range, leaving the
remaining pages (belonging to the second range) filled with zeroes (done
by compression.c:btrfs_clear_biovec_end()).
So fix this by submitting the current bio whenever we find a range
pointing to a compressed extent that was preceded by a range with a
different extent map. This is the simplest solution for this corner
case. Making the end io callback populate both ranges (or more, if we
have multiple pointing to the same extent) is a much more complex
solution since each bio is tightly coupled with a single extent map and
the extent maps associated to the ranges pointing to the shared extent
can have different offsets and lengths.
The following test case for fstests triggers the issue:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_cloner
rm -f $seqres.full
test_clone_and_read_compressed_extent()
{
local mount_opts=$1
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
# Create a test file with a single extent that is compressed (the
# data we write into it is highly compressible no matter which
# compression algorithm is used, zlib or lzo).
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K" \
-c "pwrite -S 0xbb 4K 8K" \
-c "pwrite -S 0xcc 12K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now clone our extent into an adjacent offset.
$CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Same as before but for this file we clone the extent into a lower
# file offset.
$XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
-c "pwrite -S 0xbb 12K 8K" \
-c "pwrite -S 0xcc 20K 4K" \
$SCRATCH_MNT/bar | _filter_xfs_io
$CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/bar
echo "File digests before unmounting filesystem:"
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
# Evicting the inode or clearing the page cache before reading
# again the file would also trigger the bug - reads were returning
# all bytes in the range corresponding to the second reference to
# the extent with a value of 0, but the correct data was persisted
# (it was a bug exclusively in the read path). The issue happened
# only if the same readpages() call targeted pages belonging to the
# first and second ranges that point to the same compressed extent.
_scratch_remount
echo "File digests after mounting filesystem again:"
# Must match the same digests we got before.
md5sum $SCRATCH_MNT/foo | _filter_scratch
md5sum $SCRATCH_MNT/bar | _filter_scratch
}
echo -e "\nTesting with zlib compression..."
test_clone_and_read_compressed_extent "-o compress=zlib"
_scratch_unmount
echo -e "\nTesting with lzo compression..."
test_clone_and_read_compressed_extent "-o compress=lzo"
status=0
exit
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
[bwh: Backported to 3.2:
- Maintain prev_em_start in both functions calling __extent_read_full_page()
in a loop
- Adjust context and order]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 19ab6bc567 upstream.
This is intended to add ZTE device PIDs on kernel.
Signed-off-by: Liu.Zhao <lzsos369@163.com>
[johan: sort the new entries ]
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cb74be6fd upstream.
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]
The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887https://bugzilla.kernel.org/show_bug.cgi?id=42342https://bugzilla.kernel.org/show_bug.cgi?id=63841https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit a5e3985fa0
Author: Roman Zippel <zippel@linux-m68k.org>
Date: Tue Sep 6 15:18:47 2005 -0700
[PATCH] hfs: remove debug code
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e297c939b7 upstream.
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts. The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.
The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use. Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using). CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.
To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping(). Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.
The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d7 ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.
Fixes: 05af7bd2d7 ("[POWERPC] MPIC U3/U4 MSI backend")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2:
- powernv uses a private functions instead of msi_bitmap_free_hwirqs()
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a077224fd3 upstream.
While working on the 32-bit ARM port of UEFI, I noticed a strange
corruption in the kernel log. The following snprintf() statement
(in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
was producing the following output in the log:
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | |WB|WT|WC|UC]*
| | | | | |WB|WT|WC|UC]
|RUN| | | | | | | |UC]
|RUN| | | | | | | |UC]
As it turns out, this is caused by incorrect code being emitted for
the string() function in lib/vsprintf.c. The following code
if (!(spec.flags & LEFT)) {
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
}
for (i = 0; i < len; ++i) {
if (buf < end)
*buf = *s;
++buf; ++s;
}
while (len < spec.field_width--) {
if (buf < end)
*buf = ' ';
++buf;
}
when called with len == 0, triggers an issue in the GCC SRA optimization
pass (Scalar Replacement of Aggregates), which handles promotion of signed
struct members incorrectly. This is a known but as yet unresolved issue.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
case, it is causing the second while loop to be executed erroneously a
single time, causing the additional space characters to be printed.
So disable the optimization by passing -fno-ipa-sra.
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a068acf2ee upstream.
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.
Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:
$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory
This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Drop changes to overlayfs, reiserfs
- Drop vers option from cifs
- ceph changes are all in one file
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71c6da846b upstream.
Currently context size (cra_ctxsize) doesn't specified for
ghash_async_alg. Which means it's zero. Thus crypto_create_tfm()
doesn't allocate needed space for ghash_async_ctx, so any
read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid.
Signed-off-by: Andrey Ryabinin <aryabinin@odin.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eb38f3a4f6 upstream.
We've got bug reports showing the old systemd-logind (at least
system-210) aborting unexpectedly, and this turned out to be because
of an invalid error code from close() call to evdev devices. close()
is supposed to return only either EINTR or EBADFD, while the device
returned ENODEV. logind was overreacting to it and decided to kill
itself when an unexpected error code was received. What a tragedy.
The bad error code comes from flush fops, and actually evdev_flush()
returns ENODEV when device is disconnected or client's access to it is
revoked. But in these cases the fact that flush did not actually happen is
not an error, but rather normal behavior. For non-disconnected devices
result of flush is also not that interesting as there is no potential of
data loss and even if it fails application has no way of handling the
error. Because of that we are better off always returning success from
evdev_flush().
Also returning EINTR from flush()/close() is discouraged (as it is not
clear how application should handle this error), so let's stop taking
evdev->mutex interruptibly.
Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=939834
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: there's no revoked flag to test]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b632ffa7ce upstream.
We have many WR opcodes that are only supported in kernel space
and/or require optional information to be copied into the WR
structure. Reject all those not explicitly handled so that we
can't pass invalid information to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 09bfda10e6 upstream.
With the radeon driver loaded the HP Compaq dc5750
Small Form Factor machine fails to resume from suspend.
Adding a quirk similar to other devices avoids
the problem and the system resumes properly.
Signed-off-by: Jeffery Miller <jmiller@neverware.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 51bc140431 upstream.
There have been many hard to track down bugs whereby userspace forgot to
flag a write buffer and then cause graphics corruption or a hung GPU
when that buffer was later purged under memory pressure (as the buffer
appeared clean, its pages would have been evicted rather than preserved
and any changes more recent than in the backing storage would be lost).
In retrospect this is a rare optimisation against memory pressure,
already the slow path. If we always mark the buffer as dirty when
accessed by the GPU, anything not used can still be evicted cheaply
(ideal behaviour for mark-and-sweep eviction) but we do not run the risk
of corruption. For correct read serialisation, userspace still has to
notify when the GPU writes to an object. However, there are certain
situations under which userspace may wish to tell white lies to the
kernel...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Goel, Akash" <akash.goel@intel.co>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 02bc933ebb upstream.
On Intel Baytrail, there is case when interrupt handler get called, no SPI
message is captured. The RX FIFO is indeed empty when RX timeout pending
interrupt (SSSR_TINT) happens.
Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
drivers are using IRQF_SHARED when calling the request_irq function. When
running two separate and independent SPI and HSUART application that
generate data traffic on both components, user will see messages like
below on the console:
pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
This commit will fix this by first checking Receiver Time-out Interrupt,
if it is disabled, ignore the request and return without servicing.
Signed-off-by: Tan, Jui Nee <jui.nee.tan@intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 35d4a0b63d upstream.
Fixes: 2a72f21226 ("IB/uverbs: Remove dev_table")
Before this commit there was a device look-up table that was protected
by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When
it was dropped and container_of was used instead, it enabled the race
with remove_one as dev might be freed just after:
dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but
before the kref_get.
In addition, this buggy patch added some dead code as
container_of(x,y,z) can never be NULL and so dev can never be NULL.
As a result the comment above ib_uverbs_open saying "the open method
will either immediately run -ENXIO" is wrong as it can never happen.
The solution follows Jason Gunthorpe suggestion from below URL:
https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html
cdev will hold a kref on the parent (the containing structure,
ib_uverbs_device) and only when that kref is released it is
guaranteed that open will never be called again.
In addition, fixes the active count scheme to use an atomic
not a kref to prevent WARN_ON as pointed by above comment
from Jason.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e99b139f1 upstream.
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
Fixes: fa417f7b52 ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6f1c17e16 upstream.
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.
The underlying kernel code cannot allocate that many contiguous pages.
There is no reason the underlying memory needs to be physically
contiguous.
This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
o this matches the module parameter description
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Add definition of qib_dev_warn(), added upstream by commit ddb8876589
("IB/qib: Convert opcode counters to per-context")]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9eb256eda upstream.
There is an issue with xfs's error reporting in some cases of I/O partially
failing and partially succeeding. Calls like fsync() can report success even
though not all I/O was successful in partial-failure cases such as one disk of
a RAID0 array being offline.
The issue can occur when there are more than one bio per xfs_ioend struct.
Each call to xfs_end_bio() for a bio completing will write a value to
ioend->io_error. If a successful bio completes after any failed bio, no
error is reported do to it writing 0 over the error code set by any failed bio.
The I/O error information is now lost and when the ioend is completed
only success is reported back up the filesystem stack.
xfs_end_bio() should only set ioend->io_error in the case of BIO_UPTODATE
being clear. ioend->io_error is initialized to 0 at allocation so only needs
to be updated by a failed bio. Also check that ioend->io_error is 0 so that
the first error reported will be the error code returned.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f5dcaf1fd upstream.
The unregister path of platform_device is broken. On registration, it
will register all resources with either a parent already set, or
type==IORESOURCE_{IO,MEM}. However, on unregister it will release
everything with type==IORESOURCE_{IO,MEM}, but ignore the others. There
are also cases where resources don't get registered in the first place,
like with devices created by of_platform_populate()*.
Fix the unregister path to be symmetrical with the register path by
checking the parent pointer instead of the type field to decide which
resources to unregister. This is safe because the upshot of the
registration path algorithm is that registered resources have a parent
pointer, and non-registered resources do not.
* It can be argued that of_platform_populate() should be registering
it's resources, and they argument has some merit. However, there are
quite a few platforms that end up broken if we try to do that due to
overlapping resources in the device tree. Until that is fixed, we need
to solve the immediate problem.
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3a496b00b6 upstream.
If the internal call to of_address_to_resource() fails, we end up
looping forever in of_find_matching_node_by_address(). This can be
caused by a defective device tree, or calling with an incorrect
matches argument.
Fix by calling of_find_matching_node() unconditionally at the end of
the loop.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 924f92bf12 upstream.
Most of the time this isn't an issue since hotplugging an adaptor will
trigger a crtc mode change which in turn, causes the driver to probe
every DisplayPort for a dpcd. However, in cases where hotplugging
doesn't cause a mode change (specifically when one unplugs a monitor
from a DisplayPort connector, then plugs that same monitor back in
seconds later on the same port without any other monitors connected), we
never probe for the dpcd before starting the initial link training. What
happens from there looks like this:
- GPU has only one monitor connected. It's connected via
DisplayPort, and does not go through an adaptor of any sort.
- User unplugs DisplayPort connector from GPU.
- Change in HPD is detected by the driver, we probe every
DisplayPort for a possible connection.
- Probe the port the user originally had the monitor connected
on for it's dpcd. This fails, and we clear the first (and only
the first) byte of the dpcd to indicate we no longer have a
dpcd for this port.
- User plugs the previously disconnected monitor back into the
same DisplayPort.
- radeon_connector_hotplug() is called before everyone else,
and tries to handle the link training. Since only the first
byte of the dpcd is zeroed, the driver is able to complete
link training but does so against the wrong dpcd, causing it
to initialize the link with the wrong settings.
- Display stays blank (usually), dpcd is probed after the
initial link training, and the driver prints no obvious
messages to the log.
In theory, since only one byte of the dpcd is chopped off (specifically,
the byte that contains the revision information for DisplayPort), it's
not entirely impossible that this bug may not show on certain monitors.
For instance, the only reason this bug was visible on my ASUS PB238
monitor was due to the fact that this monitor using the enhanced framing
symbol sequence, the flag for which is ignored if the radeon driver
thinks that the DisplayPort version is below 1.1.
Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ffeecc5213 upstream.
struct xfs_attr_leafblock contains 'entries' array which is declared
with size 1 altough it can in fact contain much more entries. Since this
array is followed by further struct members, gcc (at least in version
4.8.3) thinks that the array has the fixed size of 1 element and thus
may optimize away all accesses beyond the end of array resulting in
non-working code. This problem was only observed with userspace code in
xfsprogs, however it's better to be safe in kernel as well and have
matching kernel and xfsprogs definitions.
Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5556e7e6d3 upstream.
Consider eCryptfs dcache entries to be stale when the corresponding
lower inode's i_nlink count is zero. This solves a problem caused by the
lower inode being directly modified, without going through the eCryptfs
mount, leaving stale eCryptfs dentries cached and the eCryptfs inode's
i_nlink count not being cleared.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2:
- Test d_revalidate pointer directly rather than a DCACHE_OP flag
- Open-code d_inode()
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0521cfd06e upstream.
The ehci platform device's drvdata is the pointer of struct usb_hcd
already, so we doesn't need to call bus_to_hcd conversion again.
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: Unfortunately some EHCI platform sub-drivers
point drvdata to a private structure, so only create and remove the
attributes if drvdata has been set as expected.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f8786a9154 upstream.
Incoming packets in high speed are randomly corrupted by h/w
resulting in multiple errors. This workaround makes FS as
default mode in all affected socs by disabling HS chirp
signalling.This errata does not affect FS and LS mode.
Forces all HS devices to connect in FS mode for all socs
affected by this erratum:
P3041 and P2041 rev 1.0 and 1.1
P5020 and P5010 rev 1.0 and 2.0
P5040, P1010 and T4240 rev 1.0
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Nikhil Badola <nikhil.badola@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit efcbc04e16 upstream.
It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.
[pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>
NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.
When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).
If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.
So only treat O_EXCL specially if O_CREAT was also given.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: we only check open_flags, not createmode as well]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fe2b592173 upstream.
wf_unregister_client() increments the client count when a client
unregisters. That is obviously incorrect. Decrement that client count
instead.
Fixes: 75722d3992 ("[PATCH] ppc64: Thermal control for SMU based machines")
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 64526370d1 upstream.
Currently, devres_get() passes devres_free() the pointer to devres,
but devres_free() should be given with the pointer to resource data.
Fixes: 9ac7849e35 ("devres: device resource management")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bab383de3b upstream.
parport_find_base() will implicitly do parport_get_port() which
increases the refcount. Then parport_register_device() will again
increment the refcount. But while unloading the module we are only
doing parport_unregister_device() decrementing the refcount only once.
We add an parport_put_port() to neutralize the effect of
parport_get_port().
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f691251c0 upstream.
We got the bug that qemu complained with "KVM: unknown exit, hardware
reason 31" and KVM shown these info:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1
This is because we got a mmio #PF and the handler see the mmio spte becomes
normal (points to the ram page)
However, this is valid after introducing fast mmio spte invalidation which
increases the generation-number instead of zapping mmio sptes, a example
is as follows:
1. QEMU drops mmio region by adding a new memslot
2. invalidate all mmio sptes
3.
VCPU 0 VCPU 1
access the invalid mmio spte
access the region originally was MMIO before
set the spte to the normal ram map
mmio #PF
check the spte and see it becomes normal ram mapping !!!
This patch fixes the bug just by dropping the check in mmio handler, it's
good for backport. Full check will be introduced in later patches
Reported-by: Pavel Shirshov <ru.pchel@gmail.com>
Tested-by: Pavel Shirshov <ru.pchel@gmail.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: error code from handle_mmio_page_fault_common()
was not named]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5feb5d2003 upstream.
There is an "&&" vs "||" typo here so this loops 3000 times or if we get
unlucky it could loop forever.
Fixes: ceaa0a6eea ('usb: gadget: m66592-udc: add support for TEST_MODE')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7aa6ca4d39 upstream.
Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device
functions other than function 0, so that on multi-function devices, we will
always read VPD from function 0 instead of from the other functions.
[bhelgaas: changelog]
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
[bwh: Backported to 3.2:
- Put the class check in the new function as there is no
DECLARE_PCI_FIXUP_CLASS_EARLY(
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 932c435cab upstream.
Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through
function 0 to provide VPD access on other functions. This is for hardware
devices that provide copies of the same VPD capability registers in
multiple functions. Because the kernel expects that each function has its
own registers, both the locking and the state tracking are affected by VPD
accesses to different functions.
On such devices for example, if a VPD write is performed on function 0,
*any* later attempt to read VPD from any other function of that device will
hang. This has to do with how the kernel tracks the expected value of the
F bit per function.
Concurrent accesses to different functions of the same device can not only
hang but also corrupt both read and write VPD data.
When hangs occur, typically the error message:
vpd r/w failed. This is likely a firmware bug on this device.
will be seen.
Never set this bit on function 0 or there will be an infinite recursion.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3633ebebab upstream.
We already set a station to be associated when peering completes, both
in user space and in the kernel. Thus we should always have an
associated sta before sending data frames to that station.
Failure to check assoc state can cause crashes in the lower-level driver
due to transmitting unicast data frames before driver sta structures
(e.g. ampdu state in ath9k) are initialized. This occurred when
forwarding in the presence of fixed mesh paths: frames were transmitted
to stations with whom we hadn't yet completed peering.
Reported-by: Alexis Green <agreen@cococorp.com>
Tested-by: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1541dc977 upstream.
In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO".
But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class
and needs to be shifted to make space for the low-order interface byte.
Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code.
Fixes: 63c4408074 ("PCI: Add quirk for setting valid class for TI816X Endpoint")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Hemant Pedanekar <hemantp@ti.com>
[bwh: Backported to 3.2: the class check is done in this function as there
is no DECLARE_PCI_FIXUP_CLASS_EARLY()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a66b0c41ad upstream.
The input_dev is already gone when the rc device is being unregistered
so checking for its presence only means that no remove uevent will be
generated.
Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 397d425dc2 upstream.
In rare cases a directory can be renamed out from under a bind mount.
In those cases without special handling it becomes possible to walk up
the directory tree to the root dentry of the filesystem and down
from the root dentry to every other file or directory on the filesystem.
Like division by zero .. from an unconnected path can not be given
a useful semantic as there is no predicting at which path component
the code will realize it is unconnected. We certainly can not match
the current behavior as the current behavior is a security hole.
Therefore when encounting .. when following an unconnected path
return -ENOENT.
- Add a function path_connected to verify path->dentry is reachable
from path->mnt.mnt_root. AKA to validate that rename did not do
something nasty to the bind mount.
To avoid races path_connected must be called after following a path
component to it's next path component.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cde93be45a upstream.
A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root. For lack of a better term
I call this an escaped path.
prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.
__d_path only wants to see paths are connected to the root it passes
in. So __d_path needs prepend_path to return an error.
d_absolute_path similarly wants to see paths that are connected to
some root. Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1. So escaped paths will be treated like paths on lazily
unmounted mounts.
getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.
d_path is the interesting hold out. d_path just wants to print
something, and does not care about the weird cases. Which raises
the question what should be printed?
Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths. As there are not really any meaninful path components when
considered from the perspective of a mount tree.
So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 44922150d8 upstream.
If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:
ETRAP
ETRAP
VIS_ENTRY(fprs=0x4)
VIS_EXIT
RTRAP (kernel FPU restore with fpu_saved=0x4)
RTRAP
We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.
Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.
This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.
But this is not how trap returns from kernel to kernel operate.
The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.
Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.
Longer term we need to do something smarter to reinstate the partial
save optimizations. Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state. Instead,
the VISEntry et al. calls should be doing that work.
This bug is about two decades old.
Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to NG4memcpy.S and ksyms.c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f648f807f6 upstream.
Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not
resetting the association overall_error_count. This allowed the association
to better enforce assoc.max_retrans limit.
However, the same issue still exists when the association is in SHUTDOWN_RECEIVED
state. In this state, HB-ACKs will continue to reset the overall_error_count
for the association would extend the lifetime of association unnecessarily.
This patch solves this by resetting the overall_error_count whenever the current
state is small then SCTP_STATE_SHUTDOWN_PENDING. As a small side-effect, we
end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
states, but they are not really impacted because we disable Heartbeats in those
states.
Fixes: Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f2777f53e upstream.
Since fc_fcp_cleanup_cmd() can sleep this function must not
be called while holding a spinlock. This patch avoids that
fc_fcp_cleanup_each_cmd() triggers the following bug:
BUG: scheduling while atomic: sg_reset/1512/0x00000202
1 lock held by sg_reset/1512:
#0: (&(&fsp->scsi_pkt_lock)->rlock){+.-...}, at: [<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Preemption disabled at:[<ffffffffc0225cd5>] fc_fcp_cleanup_each_cmd.isra.21+0xa5/0x150 [libfc]
Call Trace:
[<ffffffff816c612c>] dump_stack+0x4f/0x7b
[<ffffffff810828bc>] __schedule_bug+0x6c/0xd0
[<ffffffff816c87aa>] __schedule+0x71a/0xa10
[<ffffffff816c8ad2>] schedule+0x32/0x80
[<ffffffffc0217eac>] fc_seq_set_resp+0xac/0x100 [libfc]
[<ffffffffc0218b11>] fc_exch_done+0x41/0x60 [libfc]
[<ffffffffc0225cff>] fc_fcp_cleanup_each_cmd.isra.21+0xcf/0x150 [libfc]
[<ffffffffc0225f43>] fc_eh_device_reset+0x1c3/0x270 [libfc]
[<ffffffff814a2cc9>] scsi_try_bus_device_reset+0x29/0x60
[<ffffffff814a3908>] scsi_ioctl_reset+0x258/0x2d0
[<ffffffff814a2650>] scsi_ioctl+0x150/0x440
[<ffffffff814b3a9d>] sd_ioctl+0xad/0x120
[<ffffffff8132f266>] blkdev_ioctl+0x1b6/0x810
[<ffffffff811da608>] block_ioctl+0x38/0x40
[<ffffffff811b4e08>] do_vfs_ioctl+0x2f8/0x530
[<ffffffff811b50c1>] SyS_ioctl+0x81/0xa0
[<ffffffff816cf8b2>] system_call_fastpath+0x16/0x7a
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 660d0831d1 upstream.
In case of hw iscsi offload, an host can have N-number of active
connections. There can be IO's running on some connections which
make host->host_busy always TRUE. Now if logout from a connection
is tried then the code gets into an infinite loop as host->host_busy
is always TRUE.
iscsi_conn_teardown(....)
{
.........
/*
* Block until all in-progress commands for this connection
* time out or fail.
*/
for (;;) {
spin_lock_irqsave(session->host->host_lock, flags);
if (!atomic_read(&session->host->host_busy)) { /* OK for ERL == 0 */
spin_unlock_irqrestore(session->host->host_lock, flags);
break;
}
spin_unlock_irqrestore(session->host->host_lock, flags);
msleep_interruptible(500);
iscsi_conn_printk(KERN_INFO, conn, "iscsi conn_destroy(): "
"host_busy %d host_failed %d\n",
atomic_read(&session->host->host_busy),
session->host->host_failed);
................
...............
}
}
This is not an issue with software-iscsi/iser as each cxn is a separate
host.
Fix:
Acquiring eh_mutex in iscsi_conn_teardown() before setting
session->state = ISCSI_STATE_TERMINATE.
Signed-off-by: John Soni Jose <sony.john@avagotech.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0dc3c8bc1 upstream.
When using nested btrees, the top leaves of the top levels contain
block addresses for the root of the next tree down. If we shadow a
shared leaf node the leaf values (sub tree roots) should be incremented
accordingly.
This is only an issue if there is metadata sharing in the top levels.
Which only occurs if metadata snapshots are being used (as is possible
with dm-thinp). And could result in a block from the thinp metadata
snap being reused early, thus corrupting the thinp metadata snap.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2:
- Drop change to remove_one()
- Remove const pointer qualifications]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0a2a66024 upstream.
The commit 738ac1ebb9 ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram. This is because skb_set_peeked may create
a new skb and free the existing one. As it stands the caller will
continue to use the old freed skb.
This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).
Fixes: 738ac1ebb9 ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 738ac1ebb9 upstream.
Shared skbs must not be modified and this is crucial for broadcast
and/or multicast paths where we use it as an optimisation to avoid
unnecessary cloning.
The function skb_recv_datagram breaks this rule by setting peeked
without cloning the skb first. This causes funky races which leads
to double-free.
This patch fixes this by cloning the skb and replacing the skb
in the list when setting skb->peeked.
Fixes: a59322be07 ("[UDP]: Only increment counter on first peek/recv")
Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 209f7512d0 upstream.
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
ocfs2_downconvert_thread_do_work can be triggered in the following case:
ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
processed, and then processes the dentry lockres. During the dentry
put, it calls iput and then deletes rw, inode and open lockres from
blocked list in ocfs2_mark_lockres_freeing. And this causes the
variable `processed' to not reflect the number of blocked lockres to be
processed, which triggers the BUG.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46011e6ea3 upstream.
On MIPS the GLOBAL bit of the PTE must have the same value in any
aligned pair of PTEs. These pairs of PTEs are referred to as
"buddies". In a SMP system is is possible for two CPUs to be calling
set_pte() on adjacent PTEs at the same time. There is a race between
setting the PTE and a different CPU setting the GLOBAL bit in its
buddy PTE.
This race can be observed when multiple CPUs are executing
vmap()/vfree() at the same time.
Make setting the buddy PTE's GLOBAL bit an atomic operation to close
the race condition.
The case of CONFIG_64BIT_PHYS_ADDR && CONFIG_CPU_MIPS32 is *not*
handled.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10835/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fed66e2cdd upstream.
Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.
Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.
Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 468b732b6f upstream.
"len" is a signed integer. We check that len is not negative, so it
goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.
I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
Fixes: a8c879a7ee ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7895086afd upstream.
We need to check that a TRB is part of the current segment
before calculating its DMA address.
Previously a ring segment didn't use a full memory page, and every
new ring segment got a new memory page, so the off by one
error in checking the upper bound was never seen.
Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
didn't catch the case when a TRB was the first element of the next segment.
This is triggered if the virtual memory pages for a ring segment are
next to each in increasing order where the ring buffer wraps around and
causes errors like:
[ 106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
[ 106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0
The trb-end address is one outside the end-seg address.
Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d62d73755 upstream.
p->thread.user_cpus_allowed is zero-initialized and is only filled on
the first sched_setaffinity call.
To avoid adding overhead in the task initialization codepath, simply OR
the returned mask in sched_getaffinity with p->cpus_allowed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10740/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: also convert from obsolete cpumask API]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c395170a5 upstream.
If an initiator doesn't have any real LUNs assigned, we should report
LUN 0 and a LUN list length of 1. Some versions of Solaris at least
go beserk if we report a LUN list length of 0.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 423f04d63c upstream.
raid1_end_read_request() assumes that the In_sync bits are consistent
with the ->degaded count.
raid1_spare_active updates the In_sync bit before the ->degraded count
and so exposes an inconsistency, as does error()
So extend the spinlock in raid1_spare_active() and error() to hide those
inconsistencies.
This should probably be part of
Commit: 34cab6f420 ("md/raid1: fix test for 'was read error from
last working device'.")
as it addresses the same issue. It fixes the same bug and should go
to -stable for same reasons.
Fixes: 76073054c9 ("md/raid1: clean up read_balance.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9547308bda upstream.
Make sure all non-READ SCSI commands get targ_xfer_tag initialized
to 0xffffffff, not just WRITEs.
Double-free of a TUR cmd object occurs under the following scenario:
1. TUR received (targ_xfer_tag is uninitialized and left at 0)
2. TUR status sent
3. First unsolicited NOPIN is sent to initiator (gets targ_xfer_tag of 0)
4. NOPOUT for NOPIN (with TTT=0) arrives
- its ExpStatSN acks TUR status, TUR is queued for removal
- LIO tries to find NOPIN with TTT=0, but finds the same TUR instead,
TUR is queued for removal for the 2nd time
(Drop unbalanced conditional bracket usage - nab)
Signed-off-by: Alexei Potashnik <alexei@purestorage.com>
Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2:
- Adjust context
- Keep the braces around the if-block]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7447223323 upstream.
Add support for the Sierra Wireless AR8550 device with
USB descriptor 0x1199, 0x68AB.
It is common with MC879x modules 1199:683c/683d which
also are composite devices with 7 interfaces (0..6)
and also MDM62xx based as the AR8550.
The major difference are only the interface attributes
02/02/01 on interfaces 3 and 4 on the AR8550. They are
vendor specific ff/ff/ff on MC879x modules.
lsusb reports:
Bus 001 Device 004: ID 1199:68ab Sierra Wireless, Inc.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1199 Sierra Wireless, Inc.
idProduct 0x68ab
bcdDevice 0.06
iManufacturer 3 Sierra Wireless, Incorporated
iProduct 2 AR8550
iSerial 0
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 198
bNumInterfaces 7
bConfigurationValue 1
iConfiguration 1 Sierra Configuration
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x01 EP 1 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 1
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x82 EP 2 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x83 EP 3 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x03 EP 3 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x85 EP 5 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x04 EP 4 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 4
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 2 Communications
bInterfaceSubClass 2 Abstract (modem)
bInterfaceProtocol 1 AT-commands (v.25ter)
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x87 EP 7 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x05 EP 5 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 5
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x88 EP 8 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x89 EP 9 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x06 EP 6 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 6
bAlternateSetting 0
bNumEndpoints 3
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 255 Vendor Specific Subclass
bInterfaceProtocol 255 Vendor Specific Protocol
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8a EP 10 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 5
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x8b EP 11 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x07 EP 7 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0200 1x 512 bytes
bInterval 32
Device Qualifier (for other device speed):
bLength 10
bDescriptorType 6
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
bNumConfigurations 1
Device Status: 0x0001
Self Powered
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Lars Melin <larsm17@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f898c522f0 upstream.
This patch removes a bogus BUG_ON in the ablkcipher path that
triggers when the destination buffer is different from the source
buffer and is scattered.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30b03d05e0 upstream.
While gntdev_release() is called the MMU notifier is still registered
and can traverse priv->maps list even if no pages are mapped (which is
the case -- gntdev_release() is called after all). But
gntdev_release() will clear that list, so make sure that only one of
those things happens at the same time.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1401c00e59 upstream.
Unmapping may require sleeping and we unmap while holding priv->lock, so
convert it to a mutex.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to functions we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a78bb11d7a upstream.
There are some log tail updates that are not protected by j_checkpoint_mutex.
Some of these are harmless because they happen during startup or shutdown but
updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can
really race with other log tail updates (e.g. someone doing
jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So
protect all log tail updates with j_checkpoint_mutex.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- Add unlock on the error path in jbd2_journal_flush()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Bartosz Kwitniewski <zerg2000@astral.org.pl>
output_core.c, added in 3.2.66, is only needed and can only be
compiled when CONFIG_INET is enabled.
The condition in the Makefile is already correct upstream.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d1cb7f658 upstream.
Add the correct dB ranges of Bose Companion 5 and Drangonfly DAC 1.2.
Signed-off-by: Yao-Wen Mao <yaowen@google.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7932c0bd77 upstream.
While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 417c20a9bd upstream.
This patch fixes a use-after-free bug in iscsit_release_sessions_for_tpg()
where se_portal_group->session_lock was incorrectly released/re-acquired
while walking the active se_portal_group->tpg_sess_list.
The can result in a NULL pointer dereference when iscsit_close_session()
shutdown happens in the normal path asynchronously to this code, causing
a bogus dereference of an already freed list entry to occur.
To address this bug, walk the session list checking for the same state
as before, but move entries to a local list to avoid dropping the lock
while walking the active list.
As before, signal using iscsi_session->session_restatement=1 for those
list entries to be released locally by iscsit_free_session() code.
Reported-by: Sunilkumar Nadumuttlu <sjn@datera.io>
Cc: Sunilkumar Nadumuttlu <sjn@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34cab6f420 upstream.
When we get a read error from the last working device, we don't
try to repair it, and don't fail the device. We simple report a
read error to the caller.
However the current test for 'is this the last working device' is
wrong.
When there is only one fully working device, it assumes that a
non-faulty device is that device. However a spare which is rebuilding
would be non-faulty but so not the only working device.
So change the test from "!Faulty" to "In_sync". If ->degraded says
there is only one fully working device and this device is in_sync,
this must be the one.
This bug has existed since we allowed read_balance to read from
a recovering spare in v3.0
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 76073054c9 ("md/raid1: clean up read_balance.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 968491709e upstream.
This patch fixes a problem in the usbtouchscreen driver for DMC TSC-30
touch screen. Due to a missing delay between the RESET and SET_RATE
commands, the touch screen may become unresponsive during system startup or
driver loading.
According to the DMC documentation, a delay is needed after the RESET
command to allow the chip to complete its internal initialization. As this
delay is not guaranteed, we had a system where the touch screen
occasionally did not send any touch data. There was no other indication of
the problem.
The patch fixes the problem by adding a 150ms delay between the RESET and
SET_RATE commands.
Suggested-by: Jakob Mustafa <jakob.mustafa@bytecmed.com>
Signed-off-by: Bernhard Bender <bernhard.bender@bytecmed.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3f81d2447b upstream.
We were previously using free_bootmem() and just getting lucky
that nothing too bad happened.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5fb2c782f4 upstream.
This device automatically switches itself to another mode (0x1405)
unless the specific access pattern of Windows is followed in its
initial mode. That makes a dirty unmount of the internal storage
devices inevitable if they are mounted. So the card reader of
such a device should be ignored, lest an unclean removal become
inevitable.
This replaces an earlier patch that ignored all LUNs of this device.
That patch was overly broad.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aca3a0489a upstream.
Port link change with port in resume state should not be
reported to usbcore, as this is an internal state to be
handled by xhci driver. Reporting PLC to usbcore may
cause usbcore clearing PLC first and port change event irq
won't be generated.
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust indentation
- s/raw_port_status/temp/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 243292a2ad upstream.
xhci_hub_report_usb3_link_state() returns pls as U0 when the link
is in resume state, and this causes usb core to think the link is in
U0 while actually it's in resume state. When usb core transfers
control request on the link, it fails with TRB error as the link
is not ready for transfer.
To fix the issue, report U3 when the link is in resume state, thus
usb core knows the link it's not ready for transfer.
Signed-off-by: Zhuang Jin Can <jin.can.zhuang@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 326124a027 upstream.
When resetting a device the number of active TTs may need to be
corrected by xhci_update_tt_active_eps, but the number of old active
endpoints supplied to it was always zero, so the number of TTs and the
bandwidth reserved for them was not updated, and could rise
unnecessarily.
This affected systems using Intel's Patherpoint chipset, which rely on
software bandwidth checking. For example, a Lenovo X230 would lose the
ability to use ports on the docking station after enough suspend/resume
cycles because the bandwidth calculated would rise with every cycle when
a suitable device is attached.
The correct number of active endpoints is calculated in the same way as
in xhci_reserve_bandwidth.
Signed-off-by: Brian Campbell <bacam@z273.org.uk>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3496810663 upstream.
virt_dev->num_cached_rings counts on freed ring and is not updated
correctly. In xhci_free_or_cache_endpoint_ring() function, the free ring
is added into cache and then num_rings_cache is incremented as below:
virt_dev->ring_cache[rings_cached] =
virt_dev->eps[ep_index].ring;
virt_dev->num_rings_cached++;
here, free ring pointer is added to a current index and then
index is incremented.
So current index always points to empty location in the ring cache.
For getting available free ring, current index should be decremented
first and then corresponding ring buffer value should be taken from ring
cache.
But In function xhci_endpoint_init(), the num_rings_cached index is
accessed before decrement.
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
virt_dev->num_rings_cached--;
This is bug in manipulating the index of ring cache.
And it should be as below:
virt_dev->num_rings_cached--;
virt_dev->eps[ep_index].new_ring =
virt_dev->ring_cache[virt_dev->num_rings_cached];
virt_dev->ring_cache[virt_dev->num_rings_cached] = NULL;
Signed-off-by: Aman Deep <aman.deep@samsung.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4b31814d20 upstream.
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aebda61871 upstream.
This fixes an issue introduced in commit b23c843992 (usb: dwc3:
gadget: fix DEPSTARTCFG for non-EP0 EPs) that made sure we would
only use DEPSTARTCFG once per SetConfig.
The trick is that we should use one DEPSTARTCFG per SetConfig *OR*
SetInterface. SetInterface was completely missed from the original
patch.
This problem became aparent after commit 76e838c9f7 (usb: dwc3:
gadget: return error if command sent to DEPCMD register fails)
added checking of the return status of device endpoint commands.
'Set Endpoint Transfer Resource' command was caught failing
occasionally. This is because the Transfer Resource
Index was not getting reset during a SET_INTERFACE request.
Finally, to fix the issue, was we have to do is make sure that
our start_config_issued flag gets reset whenever we receive a
SetInterface request.
To verify the problem (and its fix), all we have to do is run
test 9 from testusb with 'testusb -t 9 -s 2048 -a -c 5000'.
Tested-by: Huang Rui <ray.huang@amd.com>
Tested-by: Subbaraya Sundeep Bhatta <subbaraya.sundeep.bhatta@xilinx.com>
Fixes: b23c843992 (usb: dwc3: gadget: fix DEPSTARTCFG for non-EP0 EPs)
Signed-off-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: use dev_vdbg() instead of dwc3_trace()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0848f6428b upstream.
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f42 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4479004e64 upstream.
If we don't do this, and we then fail to recreate the debugfs
directory during a mode change, then we will fail later trying
to add stations to this now bogus directory:
BUG: unable to handle kernel NULL pointer dereference at 0000006c
IP: [<c0a92202>] mutex_lock+0x12/0x30
Call Trace:
[<c0678ab4>] start_creating+0x44/0xc0
[<c0679203>] debugfs_create_dir+0x13/0xf0
[<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211]
Signed-off-by: Tom Hughes <tom@compton.nu>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4934b0329f upstream.
This makes lines shorter and simplifies further patching.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Prerequisite of "net: Clone skb before setting peeked flag"]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d531be2ca2 upstream.
I have a ST4000DM000 disk. If Linux is booted while the disk is spun down,
the command that sets transfer mode causes the disk to spin up. The
spin-up takes longer than the default 5s timeout, so the command fails and
timeout is reported.
Fix this by increasing the timeout to 15s, which is enough for the disk to
spin up.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71d126fd28 upstream.
Some devices lose data on TRIM whether queued or not. This patch adds
a horkage to disable TRIM.
tj: Collapsed unnecessary if() nesting.
Signed-off-by: Arne Fitzenreiter <arne_f@ipfire.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Drop change to show_ata_dev_trim()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08c85d2a59 upstream.
Enabling AA on HP 250GB SATA disk VB0250EAVER causes errors:
[ 3.788362] ata3.00: failed to enable AA (error_mask=0x1)
[ 3.789243] ata3.00: failed to enable AA (error_mask=0x1)
Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk.
tj: Collected FPDMA_AA entries and updated comment.
Signed-off-by: Aleksei Mamlin <mamlinav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 945b47441d upstream.
This commit adds the necessary quirk to make the Marvell 4140 SATA PMP
work properly. This PMP doesn't like SRST on port number 4 (the host
port) so this commit marks this port as not supporting SRST.
Signed-off-by: Lior Amsalem <alior@marvell.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4fabb59449 upstream.
Fixes: 3e0249f9c0 ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118 if (atomic_dec_and_test(&rds_ibdev->refcount))
119 queue_work(rds_wq, &rds_ibdev->free_work);
120 }
fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed95876264 upstream.
Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.
For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount
# Create our test files. File foo has the same 2K of data at offset 4K
# as file bar has at its offset 0.
$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
-c "pwrite -S 0xbb 4k 2K" \
-c "pwrite -S 0xcc 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# File bar consists of a single inline extent (2K size).
$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
$SCRATCH_MNT/bar | _filter_xfs_io
# Now call the clone ioctl to clone the extent of file bar into file
# foo at its offset 4K. This made file foo have an inline extent at
# offset 4K, something which the btrfs code can not deal with in future
# IO operations because all inline extents are supposed to start at an
# offset of 0, resulting in all sorts of chaos.
# So here we validate that clone ioctl returns an EOPNOTSUPP, which is
# what it returns for other cases dealing with inlined extents.
$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/foo
# Because of the inline extent at offset 4K, the following write made
# the kernel crash with a BUG_ON().
$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
status=0
exit
The stack trace of the BUG_ON() triggered by the last write is:
[152154.035903] ------------[ cut here ]------------
[152154.036424] kernel BUG at mm/page-writeback.c:2286!
[152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
[152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G W 4.1.0-rc6-btrfs-next-11+ #2
[152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
[152154.036424] RIP: 0010:[<ffffffff8111a9d5>] [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP: 0018:ffff880429effc68 EFLAGS: 00010246
[152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
[152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
[152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
[152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
[152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
[152154.036424] FS: 00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
[152154.036424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
[152154.036424] Stack:
[152154.036424] ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
[152154.036424] ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
[152154.036424] 0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
[152154.036424] Call Trace:
[152154.036424] [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
[152154.036424] [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
[152154.036424] [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
[152154.036424] [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[152154.036424] [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
[152154.036424] [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
[152154.036424] [<ffffffff81165f89>] vfs_write+0xa0/0xe4
[152154.036424] [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
[152154.036424] [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
[152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
[152154.036424] RIP [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP <ffff880429effc68>
[152154.242621] ---[ end trace e3d3376b23a57041 ]---
Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:
00fdf13a2e ("Btrfs: fix a crash of clone with inline extents's split")
3f9e3df8da ("btrfs: replace error code from btrfs_drop_extents")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[bwh: Backported to 3.2: test new_key.offset as last_dest_end isn't defined
in this function]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e47994dd44 upstream.
The sfpc inline assembly within execve_tail() may incorrectly set bits
28-31 of the sfpc instruction to a value which is not zero.
These bits however are currently unused and therefore should be zero
so we won't get surprised if these bits will be used in the future.
Therefore remove the second operand from the inline assembly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2c17d27c36 upstream.
Incoming packet should be either in backlog queue or
in RCU read-side section. Otherwise, the final sequence of
flush_backlog() and synchronize_net() may miss packets
that can run without device reference:
CPU 1 CPU 2
skb->dev: no reference
process_backlog:__skb_dequeue
process_backlog:local_irq_enable
on_each_cpu for
flush_backlog => IPI(hardirq): flush_backlog
- packet not found in backlog
CPU delayed ...
synchronize_net
- no ongoing RCU
read-side sections
netdev_run_todo,
rcu_barrier: no
ongoing callbacks
__netif_receive_skb_core:rcu_read_lock
- too late
free dev
process packet for freed dev
Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- No need to rename the label in __netif_receive_skb()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9e4dd3267 upstream.
commit 381c759d99 ("ipv4: Avoid crashing in ip_error")
fixes a problem where processed packet comes from device
with destroyed inetdev (dev->ip_ptr). This is not expected
because inetdev_destroy is called in NETDEV_UNREGISTER
phase and packets should not be processed after
dev_close_many() and synchronize_net(). Above fix is still
required because inetdev_destroy can be called for other
reasons. But it shows the real problem: backlog can keep
packets for long time and they do not hold reference to
device. Such packets are then delivered to upper levels
at the same time when device is unregistered.
Calling flush_backlog after NETDEV_UNREGISTER_FINAL still
accounts all packets from backlog but before that some packets
continue to be delivered to upper levels long after the
synchronize_net call which is supposed to wait the last
ones. Also, as Eric pointed out, processed packets, mostly
from other devices, can continue to add new packets to backlog.
Fix the problem by moving flush_backlog early, after the
device driver is stopped and before the synchronize_net() call.
Then use netif_running check to make sure we do not add more
packets to backlog. We have to do it in enqueue_to_backlog
context when the local IRQ is disabled. As result, after the
flush_backlog and synchronize_net sequence all packets
should be accounted.
Thanks to Eric W. Biederman for the test script and his
valuable feedback!
Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net>
Fixes: 6e583ce524 ("net: eliminate refcounting in backlog queue")
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b7339f4c3 upstream.
Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().
Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.
For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4f7d2cdfdd upstream.
Jason Gunthorpe reported that since commit c02db8c629 ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].
Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.
Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.
Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).
Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c629 ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop unsupported attributes
- Use ndo_set_vf_tx_rate operation, not ndo_set_vf_rate]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01447e9f04 upstream.
legacy setcrtc ioctl does take a 32 bit value which might indeed
overflow
the checks of crtc_req->x > INT_MAX and crtc_req->y > INT_MAX aren't
needed any more with this
v2: -polish the annotation according to Daniel's comment
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d97e91548 upstream.
The crtc x/y panning coordinates are stored as signed integers
internally. The user provides them as unsigned, so we should check
that the user provided values actually fit in the internal datatypes.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f9c87a6f46 upstream.
If the kernel is compiled with gcc 5.1 and the XZ compression option
the decompress_kernel function calls _sclp_print_early in 64-bit mode
while the content of the upper register half of %r6 is non-zero.
This causes a specification exception on the servc instruction in
_sclp_servc.
The _sclp_print_early function saves and restores the upper registers
halves but it fails to clear them for the 31-bit code of the mini sclp
driver.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c7518794a upstream.
Allocate memory using GFP_NOIO when deleting a btree. dm_btree_del()
can be called via an ioctl and we don't want to recurse into the FS or
block layer.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f98a7aa81e upstream.
Add the USB serial console device ID for Aruba Networks 7xxx series
controllers which have a USB port for their serial console.
Signed-off-by: Peter Sanford <peter@sanford.io>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a822c83e47 upstream.
Given the pool's cell_sort_array holds 8192 pointers it triggers an
order 5 allocation via kmalloc. This order 5 allocation is prone to
failure as system memory gets more fragmented over time.
Fix this by allocating the cell_sort_array using vmalloc.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2: make a similar change in prison_{create,destroy}()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7444a072c3 upstream.
ext4_free_blocks is looping around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open coded loop and replace it with __GFP_NOFAIL. Without the
flag the allocator has no way to find out never-fail requirement and
cannot help in any way.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- s/ext4_free_data_cachep/ext4_free_ext_cachep/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a84b69cb6e upstream.
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db1385624c upstream.
Legacy NMI watchdog didn't work after migration/resume, because
vapics_in_nmi_mode was left at 0.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- s/kvm_apic_get_reg/apic_get_reg/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 42720138b0 upstream.
Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
so I'm including this one for stable@ as well. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dd302b59bd upstream.
br_nf_dev_queue_xmit must free skb in its error path.
NF_DROP is misleading -- its an okfn, not a netfilter hook.
Fixes: 462fb2af97 ("bridge : Sanitize skb before it enters the IP stack")
Fixes: efb6de9b4b ("netfilter: bridge: forward IPv6 fragmented packets")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2:
- Adjust filename
- Drop IPv6 changes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c45653c341 upstream.
Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
deadlocks in the page writeback path.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd7ade3cd9 upstream.
sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.
This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b5e6454aa upstream.
A buffer cache is allocated from movable area because it is referred
for a while and released soon. But some filesystems are taking buffer
cache for a long time and it can disturb page migration.
New APIs are introduced to allocate buffer cache with user specific
flag. *_gfp APIs are for user want to set page allocation flag for
page cache allocation. And *_unmovable APIs are for the user wants to
allocate page cache from non-movable area.
Signed-off-by: Gioh Kim <gioh.kim@lge.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Prerequisite for "bufferhead: Add _gfp version for sb_getblk()".
Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c04be18448 upstream.
ACPICA commit 90f5332a15e9d9ba83831ca700b2b9f708274658
This patch adds a new FACS initialization flag for acpi_tb_initialize().
acpi_enable_subsystem() might be invoked several times in OS bootup process,
and we don't want FACS initialization to be invoked twice. Lv Zheng.
Link: https://github.com/acpica/acpica/commit/90f5332a
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0689a86ae8 upstream.
The Steinberg MI2 and MI4 interfaces are compatible with the USB class
audio spec, but the MIDI part of the devices is reported as a vendor
specific interface.
This patch adds entries to quirks-table.h to recognize the MIDI
endpoints. Audio functionality was already working and is unaffected by
this change.
Signed-off-by: Dominic Sacré <dominic.sacre@gmx.de>
Signed-off-by: Albert Huitsing <albert@huitsing.nl>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ad0b3255a upstream.
fc->release is called from fuse_conn_put() which was used in the error
cleanup before fc->release was initialized.
[Jeremiah Mahler <jmmahler@gmail.com>: assign fc->release after calling
fuse_conn_init(fc) instead of before.]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Fixes: a325f9b922 ("fuse: update fuse_conn_init() and separate out fuse_conn_kill()")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 82cd003a77 upstream.
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
Fixes: http://tracker.ceph.com/issues/2759
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae9d8f1711 upstream.
While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.
This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up doing freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.
This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:
[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075] ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075] ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075] ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075] [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[132408.505075] [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[132408.505075] [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
[132408.505075] [<ffffffff81155733>] __cache_free+0xe2/0x4b6
[132408.505075] [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075] [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075] [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
[132408.505075] [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
[132408.505075] [<ffffffff811563a1>] ? kfree+0xb6/0x14e
[132408.505075] [<ffffffff811563d0>] kfree+0xe5/0x14e
[132408.505075] [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075] [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075] [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075] [<ffffffff8106698f>] kthread+0xef/0xf7
[132408.505075] [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
[132408.505075] [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] [<ffffffff814653d2>] ret_from_fork+0x42/0x70
[132408.505075] [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!
Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.
Fixes: 1c70d8fb4d ("Btrfs: fix inode caching vs tree log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c3f4a1685b upstream.
The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:
/*
* (...)
*
* Don't free memory not originally allocated by kmalloc()
* or you will run into trouble.
*/
So better be safe and use kmem_cache_free().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2528a8b8f4 upstream.
bitmap_parselist("", &mask, nmaskbits) will erroneously set bit zero in
the mask. The same bug is visible in cpumask_parselist() since it is
layered on top of the bitmask code, e.g. if you boot with "isolcpus=",
you will actually end up with cpu zero isolated.
The bug was introduced in commit 4b060420a5 ("bitmap, irq: add
smp_affinity_list interface to /proc/irq") when bitmap_parselist() was
generalized to support userspace as well as kernelspace.
Fixes: 4b060420a5 ("bitmap, irq: add smp_affinity_list interface to /proc/irq")
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b88f44e16 upstream.
While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has
ps->infix.cnt--;
return ps->infix.string[ps->infix.tail++];
The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.
Luckily, only root can write to the filter.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4875bbe7e upstream.
When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).
# echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter
That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.
Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8830a4e71 upstream.
This commit fix kernel crash when probing for rfkill devices in dell-laptop
driver failed. Function free_page() was incorrectly used on struct page *
instead of virtual address of SMI buffer.
This commit also simplify allocating page for SMI buffer by using
__get_free_page() function instead of sequential call of functions
alloc_page() and page_address().
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c5f3b1a51a upstream.
The kmemleak scanning thread can run for minutes. Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock. Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ... -> __delete_object() function until the lock is
released in scan_object().
When a kmemleak error occurs (e.g. it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects. If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()). This leads to kmemleak_scan potentially triggering a
page fault.
This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed. The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to kmemleak_free_percpu()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f1590670ce upstream.
Current implementation of descriptor init procedure only takes
care about setting/clearing ownership flag in "des0"/"des1"
fields while it is perfectly possible to get unexpected bits
set because of the following factors:
[1] On driver probe underlying memory allocated with
dma_alloc_coherent() might not be zeroed and so
it will be filled with garbage.
[2] During driver operation some bits could be set by SD/MMC
controller (for example error flags etc).
And unexpected and/or randomly set flags in "des0"/"des1"
fields may lead to unpredictable behavior of GMAC DMA block.
This change addresses both items above with:
[1] Use of dma_zalloc_coherent() instead of simple
dma_alloc_coherent() to make sure allocated memory is
zeroed. That shouldn't affect performance because
this allocation only happens once on driver probe.
[2] Do explicit zeroing of both "des0" and "des1" fields
of all buffer descriptors during initialization of
DMA transfer.
And while at it fixed identation of dma_free_coherent()
counterpart as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: arc-linux-dev@synopsys.com
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context, indentation
- Normal and extended descriptors are allocated in the same place here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
commit 2426f39100 upstream.
file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.
Fix the bug by checking actual mode bits before setting S_NOSEC.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d496f7842a upstream.
A ROSE socket doesn't necessarily always have a neighbour pointer so check
if the neighbour pointer is valid before dereferencing it.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 530c11d432 upstream.
The omap watchdog has the annoying behaviour that writes to most
registers don't have any effect when the watchdog is already running.
Quoting the AM335x reference manual:
To modify the timer counter value (the WDT_WCRR register),
prescaler ratio (the WDT_WCLR[4:2] PTV bit field), delay
configuration value (the WDT_WDLY[31:0] DLY_VALUE bit field), or
the load value (the WDT_WLDR[31:0] TIMER_LOAD bit field), the
watchdog timer must be disabled by using the start/stop sequence
(the WDT_WSPR register).
Currently the timer is stopped in the .probe callback but still there
are possibilities that yield to a situation where omap_wdt_start is
entered with the timer running (e.g. when /dev/watchdog is closed
without stopping and then reopened). In such a case programming the
timeout silently fails!
To circumvent this stop the timer before reprogramming.
Assuming one of the first things the watchdog user does is setting the
timeout explicitly nothing too bad should happen because this explicit
setting works fine.
Fixes: 7768a13c25 ("[PATCH] OMAP: Add Watchdog driver support")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 292db1bc6c upstream.
ext4 isn't willing to map clusters to a non-extent file. Don't signal
this with an out of space error, since the FS will retry the
allocation (which didn't fail) forever. Instead, return EUCLEAN so
that the operation will fail immediately all the way back to userspace.
(The fix is either to run e2fsck -E bmap2extent, or to chattr +e the file.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit adfa969850 upstream.
The value sent on the SPI bus is shifted by an erroneous number of bits.
The shift value was already computed in the iio_chan_spec structure and
hence subtracting this argument to 16 yields an erroneous data position
in the SPI stream.
Signed-off-by: JM Friedt <jmfriedt@femto-st.fr>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89d96a6f8e upstream.
Normally all of the buffers will have been forced out to disk before
we call invalidate_bdev(), but there will be some cases, where a file
system operation was aborted due to an ext4_error(), where there may
still be some dirty buffers in the buffer cache for the device. So
try to force them out to memory before calling invalidate_bdev().
This fixes a warning triggered by generic/081:
WARNING: CPU: 1 PID: 3473 at /usr/projects/linux/ext4/fs/block_dev.c:56 __blkdev_put+0xb5/0x16f()
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 29535f7b79 upstream.
The current handler of MMC_BLK_CMD_ERR in mmc_blk_issue_rw_rq function
may cause new coming request permanent missing when the ongoing
request (previoulsy started) complete end.
The problem scenario is as follows:
(1) Request A is ongoing;
(2) Request B arrived, and finally mmc_blk_issue_rw_rq() is called;
(3) Request A encounters the MMC_BLK_CMD_ERR error;
(4) In the error handling of MMC_BLK_CMD_ERR, suppose mmc_blk_cmd_err()
end request A completed and return zero. Continue the error handling,
suppose mmc_blk_reset() reset device success;
(5) Continue the execution, while loop completed because variable ret
is zero now;
(6) Finally, mmc_blk_issue_rw_rq() return without processing request B.
The process related to the missing request may wait that IO request
complete forever, possibly crashing the application or hanging the system.
Fix this issue by starting new request when reset success.
Signed-off-by: Ding Wang <justin.wang@spreadtrum.com>
Fixes: 67716327ee ("mmc: block: add eMMC hardware reset support")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4b200b4604 upstream.
This fixes a several year old regression that I found while trying
to get the Yoga 3 11 to work. The ideapad_rfk_set function is meant
to send a command to the embedded controller through ACPI, but
as of c1f73658ed, it sends the index of the rfkill device instead
of the command, and ignores the opcode field.
This changes it back to the original behavior, which indeed
flips the rfkill state as seen in the debugfs interface.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c1f73658ed ("ideapad: pass ideapad_priv as argument (part 2)")
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
[bwh: Backported to 3.2: device private data is just the device index, not a
pointer]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f6a6fda29 upstream.
If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case. In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts. So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first. And only after recovering
journal it can do the new updates.
The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.htmlhttp://comments.gmane.org/gmane.comp.file-systems.ext4/48841
[ Fixed bug in patch which allowed a non-negative error return from
jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
was causing xfstests ext4/306 to fail. -- Ted ]
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
[bwh: Backported to 3.2:
- Adjust context
- Don't drop j_checkpoint_mutex where we don't hold it]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79feb521a4 upstream.
When we reach jbd2_cleanup_journal_tail(), there is no guarantee that
checkpointed buffers are on a stable storage - especially if buffers were
written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's
caches. Thus when we update journal superblock effectively removing old
transaction from journal, this write of superblock can get to stable storage
before those checkpointed buffers which can result in filesystem corruption
after a crash. Thus we must unconditionally issue a cache flush before we
update journal superblock in these cases.
A similar problem can also occur if journal superblock is written only in
disk's caches, other transaction starts reusing space of the transaction
cleaned from the log and power failure happens. Subsequent journal replay would
still try to replay the old transaction but some of it's blocks may be already
overwritten by the new transaction. For this reason we must use WRITE_FUA when
updating log tail and we must first write new log tail to disk and update
in-memory information only after that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Prerequisite for "jbd2: fix ocfs2 corrupt when updating journal
superblock fails".
Backported to 3.2:
- Adjust context
- Drop changes to jbd2_journal_update_sb_log_tail trace event]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 24bcc89c7e upstream.
There are three case of updating journal superblock. In the first case, we want
to mark journal as empty (setting s_sequence to 0), in the second case we want
to update log tail, in the third case we want to update s_errno. Split these
cases into separate functions. It makes the code slightly more straightforward
and later patches will make the distinction even more important.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Prerequisite for "jbd2: fix ocfs2 corrupt when updating journal
superblock fails".
Backported to 3.2: drop changes to trace events.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fb22a8042 upstream.
Disable write buffering on the Toshiba ToPIC95 if it is enabled by
somebody (it is not supposed to be a power-on default according to
the datasheet). On the ToPIC95, practically no 32-bit Cardbus card
will work under heavy load without locking up the whole system if
this is left enabled. I tried about a dozen. It does not affect
16-bit cards. This is similar to the O2 bugs in early controller
revisions it seems.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=55961
Signed-off-by: Ryan C. Underwood <nemesis@icequake.net>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bdf96838ae upstream.
The commit cf108bca46: "ext4: Invert the locking order of page_lock
and transaction start" caused __ext4_journalled_writepage() to drop
the page lock before the page was written back, as part of changing
the locking order to jbd2_journal_start -> page_lock. However, this
introduced a potential race if there was a truncate racing with the
data=journalled writeback mode.
Fix this by grabbing the page lock after starting the journal handle,
and then checking to see if page had gotten truncated out from under
us.
This fixes a number of different warnings or BUG_ON's when running
xfstests generic/086 in data=journalled mode, including:
jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7
c0, 164), jh->b_transaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0
- and -
kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
...
Call Trace:
[<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
[<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
[<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
[<c027d883>] ? lock_buffer+0x36/0x36
[<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
[<c0229139>] do_invalidatepage+0x22/0x26
[<c0229198>] truncate_inode_page+0x5b/0x85
[<c022934b>] truncate_inode_pages_range+0x156/0x38c
[<c0229592>] truncate_inode_pages+0x11/0x15
[<c022962d>] truncate_pagecache+0x55/0x71
[<c02b913b>] ext4_setattr+0x4a9/0x560
[<c01ca542>] ? current_kernel_time+0x10/0x44
[<c026c4d8>] notify_change+0x1c7/0x2be
[<c0256a00>] do_truncate+0x65/0x85
[<c0226f31>] ? file_ra_state_init+0x12/0x29
- and -
WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
irty_metadata+0x14a/0x1ae()
...
Call Trace:
[<c01b879f>] ? console_unlock+0x3a1/0x3ce
[<c082cbb4>] dump_stack+0x48/0x60
[<c0178b65>] warn_slowpath_common+0x89/0xa0
[<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
[<c0178bef>] warn_slowpath_null+0x14/0x18
[<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
[<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
[<c02b2f44>] write_end_fn+0x40/0x53
[<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
[<c02b59e7>] ext4_writepage+0x354/0x3b8
[<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
[<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
[<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
[<c02b5a5b>] __writepage+0x10/0x2e
[<c0225956>] write_cache_pages+0x22d/0x32c
[<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
[<c02b6ee8>] ext4_writepages+0x102/0x607
[<c019adfe>] ? sched_clock_local+0x10/0x10e
[<c01a8a7c>] ? __lock_is_held+0x2e/0x44
[<c01a8ad5>] ? lock_is_held+0x43/0x51
[<c0226dff>] do_writepages+0x1c/0x29
[<c0276bed>] __writeback_single_inode+0xc3/0x545
[<c0277c07>] writeback_sb_inodes+0x21f/0x36d
...
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9136291f1d upstream.
This patch fixes a bug in the XOR driver where the cleanup function can be
called and free descriptors that never been processed by the engine (which
result in data errors).
The cleanup function will free descriptors based on the ownership bit in
the descriptors.
Fixes: ff7b04796d ("dmaengine: DMA engine driver for Marvell XOR engine")
Signed-off-by: Lior Amsalem <alior@marvell.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a7068e3932 upstream.
The buffer for condtraints debug isn't big enough to hold the output
in all cases. So fix this issue by increasing the buffer.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15bf722e6f upstream.
ATOL FPrint fiscal printers require usb_clear_halt to be executed
to work properly. Add quirk to fix the issue.
Signed-off-by: Alexey Sokolov <sokolov@7pikes.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 300f77c08d upstream.
AR93xx and newer needs to stop rx before tx to avoid getting the DMA
engine or MAC into a stuck state.
This should reduce/fix the occurence of "Failed to stop Tx DMA" logspam.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2:
- Also move initialisation of ret to match upstream
- ath_drain_all_txq() takes a second parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 45c44b5ff9 upstream.
Increase the default init stage change timeout from 15 seconds to 30 seconds.
This resolves issues we have seen with some adapters not transitioning
to the first init stage within 15 seconds, which results in adapter
initialization failures.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d683cc49da upstream.
When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.
Fixes: ee5dc7732b ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8d975e73e upstream.
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b5eeed8cb6 upstream.
There is a small chance that pRD->pRDInfo->skb could go NULL
while the interrupt is processing.
Put NULL check on loop to break out.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fb6d1f7df5 upstream.
Fix USB 3.0 devices lost in NOTATTACHED state after a hub port reset.
Dissolve the function hub_port_finish_reset() completely and divide the
actions to be taken into those which need to be done after each reset
attempt and those which need to be done after the full procedure is
complete, and place them in the appropriate places in hub_port_reset().
Also, remove an unneeded forward declaration of hub_port_reset().
Verbose Problem Description:
USB 3.0 devices may be "lost for good" during a hub port reset.
This makes Linux unable to boot from USB 3.0 devices in certain
constellations of host controllers and devices, because the USB device is
lost during initialization, preventing the rootfs from being mounted.
The underlying problem is that in the affected constellations, during the
processing inside hub_port_reset(), the hub link state goes from 0 to
SS.inactive after the initial reset, and back to 0 again only after the
following "warm" reset.
However, hub_port_finish_reset() is called after each reset attempt and
sets the state the connected USB device based on the "preliminary" status
of the hot reset to USB_STATE_NOTATTACHED due to SS.inactive, yet when
the following warm reset is complete and hub_port_finish_reset() is
called again, its call to set the device to USB_STATE_DEFAULT is blocked
by usb_set_device_state() which does not allow taking USB devices out of
USB_STATE_NOTATTACHED state.
Thanks to Alan Stern for guiding me to the proper solution and how to
submit it.
Link: http://lkml.kernel.org/r/trinity-25981484-72a9-4d46-bf17-9c1cf9301a31-1432073240136%20()%203capp-gmx-bs27
Signed-off-by: Robert Schlabbach <robert_s@gmx.net>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- s/usb_clear_port_feature/clear_port_feature/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e5babdf928 upstream.
Since commit bd31b85960 (which is in 3.2-rc1) nw_gpio_lock is a raw spinlock
that needs usage of the corresponding raw functions.
This fixes:
drivers/mtd/maps/dc21285.c: In function 'nw_en_write':
drivers/mtd/maps/dc21285.c:41:340: warning: passing argument 1 of 'spinlock_check' from incompatible pointer type
spin_lock_irqsave(&nw_gpio_lock, flags);
In file included from include/linux/seqlock.h:35:0,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/mtd/maps/dc21285.c:8:
include/linux/spinlock.h:299:102: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
^
drivers/mtd/maps/dc21285.c:43:25: warning: passing argument 1 of 'spin_unlock_irqrestore' from incompatible pointer type
spin_unlock_irqrestore(&nw_gpio_lock, flags);
^
In file included from include/linux/seqlock.h:35:0,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/mtd/maps/dc21285.c:8:
include/linux/spinlock.h:370:91: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
static inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
Fixes: bd31b85960 ("locking, ARM: Annotate low level hw locks as raw")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e91f8cb13 upstream.
If, at the time __rcu_process_callbacks() is invoked, there are callbacks
in Tiny RCU's callback list, but none of them are ready to be invoked,
the current list-management code will knit the non-ready callbacks out
of the list. This can result in hangs and possibly worse. This commit
therefore inserts a check for there being no callbacks that can be
invoked immediately.
This bug is unlikely to occur -- you have to get a new callback between
the time rcu_sched_qs() or rcu_bh_qs() was called, but before we get to
__rcu_process_callbacks(). It was detected by the addition of RCU-bh
testing to rcutorture, which in turn was instigated by Iftekhar Ahmed's
mutation testing. Although this bug was made much more likely by
915e8a4fe4 (rcu: Remove fastpath from __rcu_process_callbacks()), this
did not cause the bug, but rather made it much more probable. That
said, it takes more than 40 hours of rcutorture testing, on average,
for this bug to appear, so this fix cannot be considered an emergency.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76e838c9f7 upstream.
We need to return error to caller if command is not sent to
controller succesfully.
Signed-off-by: Subbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Fixes: 72246da40f (usb: Introduce DesignWare USB3 DRD Driver)
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 39fa10f7e2 upstream.
Since we are messing with state in the worker.
v2: drop the changes in the mst worker
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8687634b79 upstream.
In RS485 mode, we may want to set the delay_rts_after_send value to 0.
In the datasheet, the 0 value is said to "disable" the Transmitter Timeguard but
this is exactly the expected behavior if we want no delay...
Moreover, if the value was set to non-zero value by device-tree or earlier
ioctl command, it was impossible to change it back to zero.
Reported-by: Sami Pietikäinen <Sami.Pietikainen@wapice.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d079abd181 upstream.
Too many spaces were introduced in commit 63adc6fb8a ("pktgen: cleanup
checkpatch warnings"), thus misaligning "src_min:" to other columns.
Fixes: 63adc6fb8a ("pktgen: cleanup checkpatch warnings")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebb6ad73e6 upstream.
VMID Control 0 BIT[2:1] is VMID Divider Enable and Select
00 = VMID disabled (for OFF mode)
01 = 2 x 50kΩ divider (for normal operation)
10 = 2 x 250kΩ divider (for low power standby)
11 = 2 x 5kΩ divider (for fast start-up)
So WM8903_VMID_RES_250K should be 2 << 1, which is 4.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 14ba3ec1de upstream.
According to the datasheet:
R10 (0Ah) VMID Impedance Control
BIT 3:2 VMIDSEL DEFAULT 00
DESCRIPTION: VMID impedance selection control
00: 75kΩ output
01: 300kΩ output
10: 2.5kΩ output
WM8737_VMIDSEL_MASK is 0xC (VMIDSEL - [3:2]),
so it needs to left shift WM8737_VMIDSEL_SHIFT bits for setting these bits.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 073db4a51e upstream.
On A MIPS 32-cores machine a BUG_ON was triggered because some acesses to
mtd->usecount were done without taking mtd_table_mutex.
kernel: Call Trace:
kernel: [<ffffffff80401818>] __put_mtd_device+0x20/0x50
kernel: [<ffffffff804086f4>] blktrans_release+0x8c/0xd8
kernel: [<ffffffff802577e0>] __blkdev_put+0x1a8/0x200
kernel: [<ffffffff802579a4>] blkdev_close+0x1c/0x30
kernel: [<ffffffff8022006c>] __fput+0xac/0x250
kernel: [<ffffffff80171208>] task_work_run+0xd8/0x120
kernel: [<ffffffff8012c23c>] work_notifysig+0x10/0x18
kernel:
kernel:
Code: 2442ffff ac8202d8 000217fe <00020336> dc820128 10400003
00000000 0040f809 00000000
kernel: ---[ end trace 080fbb4579b47a73 ]---
Fixed by taking the mutex in blktrans_open and blktrans_release.
Note that this locking is already suggested in
include/linux/mtd/blktrans.h:
struct mtd_blktrans_ops {
...
/* Called with mtd_table_mutex held; no race with add/remove */
int (*open)(struct mtd_blktrans_dev *dev);
void (*release)(struct mtd_blktrans_dev *dev);
...
};
But we weren't following it.
Originally reported by (and patched by) Zhang and Giuseppe,
independently. Improved and rewritten.
Reported-by: Zhang Xingcai <zhangxingcai@huawei.com>
Reported-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Tested-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1fa2337a31 upstream.
The maximum size for a DiSEqC command is 6, according to the
userspace API. However, the code allows to write up much more values:
drivers/media/dvb-frontends/cx24116.c:983 cx24116_send_diseqc_msg() error: buffer overflow 'd->msg' 6 <= 23
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 12f4543f5d upstream.
The maximum size for a DiSEqC command is 6, according to the
userspace API. However, the code allows to write up to 7 values:
drivers/media/dvb-frontends/s5h1420.c:193 s5h1420_send_master_cmd() error: buffer overflow 'cmd->msg' 6 <= 7
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5de2755c8c upstream.
Because we drop cpu_base->lock around calling hrtimer::function, it is
possible for hrtimer_start() to come in between and enqueue the timer.
If hrtimer::function then returns HRTIMER_RESTART we'll hit the BUG_ON
because HRTIMER_STATE_ENQUEUED will be set.
Since the above is a perfectly valid scenario, remove the BUG_ON and
make the enqueue_hrtimer() call conditional on the timer not being
enqueued already.
NOTE: in that concurrent scenario its entirely common for both sites
to want to modify the hrtimer, since hrtimers don't provide
serialization themselves be sure to provide some such that the
hrtimer::function and the hrtimer_start() caller don't both try and
fudge the expiration state at the same time.
To that effect, add a WARN when someone tries to forward an already
enqueued timer, the most common way to change the expiry of self
restarting timers. Ideally we'd put the WARN in everything modifying
the expiry but most of that is inlines and we don't need the bloat.
Fixes: 2d44ae4d71 ("hrtimer: clean up cpu->base locking tricks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20150415113105.GT5029@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d0a0b2f6d upstream.
ACPICA commit b60612373a4ef63b64a57c124576d7ddb6d8efb6
For physical addresses, since the address may exceed 32-bit address range
after calculation, we should use 0x%8.8X%8.8X instead of ACPI_PRINTF_UINT
and ACPI_FORMAT_UINT64() instead of
ACPI_FORMAT_NATIVE_UINT()/ACPI_FORMAT_TO_UINT().
This patch also removes above replaced macros as there are no users.
This is a preparation to switch acpi_physical_address to 64-bit on 32-bit
kernel builds.
Link: https://github.com/acpica/acpica/commit/b6061237
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
[gdavis: Move tbprint.c changes to tbutils.c due to lack of commit
"42f4786 ACPICA: Split table print utilities to a new a
separate file" in linux-3.10.y]
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cc2080b0e5 upstream.
ACPICA commit 7f06739db43a85083a70371c14141008f20b2198
For physical addresses, since the address may exceed 32-bit address range
after calculation, we should use %8.8X%8.8X (see ACPI_FORMAT_UINT64()) to
convert the %p formats.
This is a preparation to switch acpi_physical_address to 64-bit on 32-bit
kernel builds.
Link: https://github.com/acpica/acpica/commit/7f06739d
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
[gdavis: Move tbinstall.c changes to tbutils.c due to lack of commit
"42f4786 ACPICA: Split table print utilities to a new a
separate file" in linux-3.10.y]
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Drop inapplicable changes to drivers/acpi/acpica/utaddress.c and
acpi_tb_install_table()
- Fix similar format issues in acpi_tb_add_table() and
acpi_tb_install_table() that aren't present upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 85e014b430 upstream.
commit f254e3c57b upstream.
ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3
OSPMs like Linux expect an acpi_physical_address returning value from
acpi_find_root_pointer(). This triggers warnings if sizeof (acpi_size) doesn't
equal to sizeof (acpi_physical_address):
drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default]
In file included from include/acpi/acpi.h:64:0,
from include/linux/acpi.h:36,
from drivers/acpi/osl.c:41:
include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *'
This patch corrects acpi_find_root_pointer().
Link: https://github.com/acpica/acpica/commit/7d9fd643
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f61360148 upstream.
__sync_fetch_and_and and __sync_fetch_and_or are functions that are provided
by gcc and depending on the target architecture may be implemented in libgcc,
which is not always available in the kernel. This leads to a build failure
on ARMv5:
drivers/built-in.o: In function `line6_pcm_release':
:(.text+0x3bfe80): undefined reference to `__sync_fetch_and_and_4'
drivers/built-in.o: In function `line6_pcm_acquire':
:(.text+0x3bff30): undefined reference to `__sync_fetch_and_or_4'
To work around this, we can use the kernel-provided cmpxchg macro.
Build-tested only.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Markus Grabner <grabner@icg.tugraz.at>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Fix up two more instances of __sync_fetch_and_and() that were removed
separately upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca0ad83da1 upstream.
The Debian experimental linux source package (3.8.5-1) build fails
with the following errors:
...
MODPOST 2016 modules
ERROR: "__ucmpdi2" [fs/btrfs/btrfs.ko] undefined!
ERROR: "__ucmpdi2" [drivers/md/dm-verity.ko] undefined!
The attached patch resolves this problem. It is based on the s390
implementation of ucmpdi2.c.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9cdf30bd3b upstream.
Returns a non-zero value if the current processor implementation requires
an IHB instruction to deal with an instruction hazard as per MIPS R2
architecture specification, zero otherwise.
For a discussion, see http://patchwork.linux-mips.org/patch/9539/.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: trim the CPU type list]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4d46a67a3e upstream.
The lazy cache flushing implemented in the MIPS kernel suffers from a
race condition that is exposed by do_set_pte() in mm/memory.c.
A pre-condition is a file-system that writes to the page from the CPU
in its readpage method and then calls flush_dcache_page(). One example
is ubifs. Another pre-condition is that the dcache flush is postponed
in __flush_dcache_page().
Upon a page fault for an executable mapping not existing in the
page-cache, the following will happen:
1. Write to the page
2. flush_dcache_page
3. flush_icache_page
4. set_pte_at
5. update_mmu_cache (commits the flush of a dcache-dirty page)
Between steps 4 and 5 another thread can hit the same page and it will
encounter a valid pte. Because the data still is in the L1 dcache the CPU
will fetch stale data from L2 into the icache and execute garbage.
This fix moves the commit of the cache flush to step 3 to close the
race window. It also reduces the amount of flushes on non-executable
mappings because we never enter __flush_dcache_page() for non-aliasing
CPUs.
Regressions can occur in drivers that mistakenly relies on the
flush_dcache_page() in get_user_pages() for DMA operations.
[ralf@linux-mips.org: Folded in patch 9346 to fix highmem issue.]
Signed-off-by: Lars Persson <larper@axis.com>
Cc: linux-mips@linux-mips.org
Cc: paul.burton@imgtec.com
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/9346/
Patchwork: https://patchwork.linux-mips.org/patch/9738/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34376a50fb upstream.
The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq. The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.
To work around this, re-add the max_restart counter in __do_irq and stop
processing irqs after 10 restarts.
Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.
This was introduced in 3.9 by commit c10d73671a ("softirq: reduce
latencies").
It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.
The hang stack traces look something like this:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Watchdog detected hard LOCKUP on cpu 2
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11
Call Trace:
<NMI> warn_slowpath_common+0x85/0x9f
warn_slowpath_fmt+0x46/0x48
watchdog_overflow_callback+0x9c/0xa7
__perf_event_overflow+0x137/0x1cb
perf_event_overflow+0x14/0x16
intel_pmu_handle_irq+0x2dc/0x359
perf_event_nmi_handler+0x19/0x1b
nmi_handle+0x7f/0xc2
do_nmi+0xbc/0x304
end_repeat_nmi+0x1e/0x2e
<<EOE>>
cpu_stopper_thread+0xae/0x162
smpboot_thread_fn+0x258/0x260
kthread+0xc7/0xcf
ret_from_fork+0x7c/0xb0
---[ end trace 4947dfa9b0a4cec3 ]---
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
irq event stamp: 835637905
hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257
hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: tasklet_hi_action+0xf0/0xf0
Process migration/1
Call Trace:
<IRQ>
__do_softirq+0x117/0x257
irq_exit+0x5f/0xbb
smp_apic_timer_interrupt+0x8a/0x98
apic_timer_interrupt+0x72/0x80
<EOI>
printk+0x4d/0x4f
stop_machine_cpu_stop+0x22c/0x274
cpu_stopper_thread+0xae/0x162
smpboot_thread_fn+0x258/0x260
kthread+0xc7/0xcf
ret_from_fork+0x7c/0xb0
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pekka Riikonen <priikone@iki.fi>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Rui Xiang <rui.xiang@huawei.com>
commit c10d73671a upstream.
In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.
This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.
This patch changes the fallback to ksoftirqd condition to :
- A time limit of 2 ms.
- need_resched() being set on current task
When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.
Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.
I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :
In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.
Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)
Before patch :
# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92
After patch :
# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Rui Xiang <rui.xiang@huawei.com>
commit fa8cbaaf5a upstream.
Since commit 8a0a9bd4db, this comment in mmap_rnd() does not
hold true as the value returned by get_random_int() will in fact be
different every single call. Remove the comment and simplify the code
back to its original desired form.
This reverts commit a5adc91a4b which is no longer necessary and
also fixes the sparc code that copied this same adjustment.
Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Moritz Mühlenhoff <jmm@inutil.org>
commit 73af963f9f upstream.
__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current, this can can lead to surprising results.
For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable. setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable). After that get_dumpable() fails.
(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)
Change __ptrace_may_access() to use same_thread_group() instead of "task
== current". Any security check is pointless when the tasks share the
same ->mm.
Signed-off-by: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Sheng Yong <shengyong1@huawei.com>
commit 55c844a4dd upstream.
When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
always see this warning msg:
Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
<IRQ> [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
[<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
[<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
[<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
[<ffffffff81050112>] scheduler_tick+0xe0/0xe9
[<ffffffff81036768>] update_process_times+0x60/0x70
[<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
[<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
[<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
[<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
[<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
[<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
<EOI> [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
[<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
[<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
[<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
[<ffffffff81018926>] machine_shutdown+0xa/0xc
[<ffffffff8101897e>] native_machine_restart+0x20/0x32
[<ffffffff810189b0>] machine_restart+0xa/0xc
[<ffffffff8103b196>] kernel_restart+0x47/0x4c
[<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
[<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
[<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
[<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
[<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
[<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---
The root cause seems to be the
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to complete sending NMIs to all
the other 23 CPUs, and for HZ=250/1000 system, the time is long
enough for a timer interrupt to happen, which will in turn
trigger to kick load balance to a stopped CPU and cause this
warning in native_smp_send_reschedule().
So disabling the local irq before stop_other_cpu() can fix this
problem (tested 25 times reboot ok), and it is fine as there
should be nobody caring the timer interrupt in such reboot
stage.
The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
fails, and this patch is still needed to prevent the problem.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Vinson Lee <vlee@twopensource.com>
Based on 08adb7dabd upstream.
We found that after v3.10.73, recvmsg might return -EFAULT while -EINVAL
was expected.
We tested it through the recvmsg01 testcase come from LTP testsuit. It set
msg->msg_namelen to -1 and the recvmsg syscall returned errno 14, which is
unexpected (errno 22 is expected):
recvmsg01 4 TFAIL : invalid socket length ; returned -1 (expected -1),
errno 14 (expected 22)
Linux mainline has no this bug for commit 08adb7dab fixes it accidentally.
However, it is too large and complex to be backported to LTS 3.10.
Commit 281c9c36 (net: compat: Update get_compat_msghdr() to match
copy_msghdr_from_user() behaviour) made get_compat_msghdr() return
error if msg_sys->msg_namelen was negative, which changed the behaviors
of recvmsg and sendmsg syscall in a lib32 system:
Before commit 281c9c36, get_compat_msghdr() wouldn't fail and it would
return -EINVAL in move_addr_to_user() or somewhere if msg_sys->msg_namelen
was invalid and then syscall returned -EINVAL, which is correct.
And now, when msg_sys->msg_namelen is negative, get_compat_msghdr() will
fail and wants to return -EINVAL, however, the outer syscall will return
-EFAULT directly, which is unexpected.
This patch gets the return value of get_compat_msghdr() as well as
copy_msghdr_from_user(), then returns this expected value if
get_compat_msghdr() fails.
Fixes: 281c9c36 (net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour)
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Hanbing Xu <xuhanbing@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43d77867a4 upstream.
Current implementation of unfreeze_partials() is so complicated,
but benefit from it is insignificant. In addition many code in
do {} while loop have a bad influence to a fail rate of cmpxchg_double_slab.
Under current implementation which test status of cpu partial slab
and acquire list_lock in do {} while loop,
we don't need to acquire a list_lock and gain a little benefit
when front of the cpu partial slab is to be discarded, but this is a rare case.
In case that add_partial is performed and cmpxchg_double_slab is failed,
remove_partial should be called case by case.
I think that these are disadvantages of current implementation,
so I do refactoring unfreeze_partials().
Minimizing code in do {} while loop introduce a reduced fail rate
of cmpxchg_double_slab. Below is output of 'slabinfo -r kmalloc-256'
when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is done.
** before **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 182685
Unlocked Cmpxchg Double redos 0
** after **
Cmpxchg_double Looping
------------------------
Locked Cmpxchg Double redos 177995
Unlocked Cmpxchg Double redos 1
We can see cmpxchg_double_slab fail rate is improved slightly.
Bolow is output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'.
** before **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
108517.190463 task-clock # 7.926 CPUs utilized ( +- 0.24% )
2,919,550 context-switches # 0.027 M/sec ( +- 3.07% )
100,774 CPU-migrations # 0.929 K/sec ( +- 4.72% )
124,201 page-faults # 0.001 M/sec ( +- 0.15% )
401,500,234,387 cycles # 3.700 GHz ( +- 0.24% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,576,913,354 instructions # 0.62 insns per cycle ( +- 0.13% )
45,934,956,860 branches # 423.297 M/sec ( +- 0.14% )
188,219,787 branch-misses # 0.41% of all branches ( +- 0.56% )
13.691837307 seconds time elapsed ( +- 0.24% )
** after **
Performance counter stats for './hackbench 50 process 4000' (30 runs):
107784.479767 task-clock # 7.928 CPUs utilized ( +- 0.22% )
2,834,781 context-switches # 0.026 M/sec ( +- 2.33% )
93,083 CPU-migrations # 0.864 K/sec ( +- 3.45% )
123,967 page-faults # 0.001 M/sec ( +- 0.15% )
398,781,421,836 cycles # 3.700 GHz ( +- 0.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
250,189,160,419 instructions # 0.63 insns per cycle ( +- 0.09% )
45,855,370,128 branches # 425.436 M/sec ( +- 0.10% )
169,881,248 branch-misses # 0.37% of all branches ( +- 0.43% )
13.596272341 seconds time elapsed ( +- 0.22% )
No regression is found, but rather we can see slightly better result.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Zefan Li <lizefan@huawei.com>
Commit 915f4f86dd ("debugfs: leave freeing a symlink body until inode
eviction", commit 0db59e5929 upstream) changed debugfs to define its
own super_operations and implement the evict_inode operation.
Luis Henriques pointed out that it needs to define the statfs
operation, as in simple_super_operations which it was using before.
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2c51a97f76 ]
The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:
1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.
2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.
Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.
Fixes: 767e97e1e0 ("neigh: RCU conversion of struct neighbour")
Fixes: a263b30936 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056 ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: drop change to __neigh_set_probe_once() which
we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 468479e604 ]
PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.
When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.
Fixes: dc99f60069 ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Demux functions return a sock pointer, not an index]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f98f4514d0 ]
We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()
Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417 ("packet: Add 'cpu' fanout policy.")
Fixes: dc99f60069 ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2dab80a8b4 ]
After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e9 ("bridge: add ageing_time, stp_state, priority over netlink")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 31a418986a ]
When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).
In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.
A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).
Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.
The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shutdown). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principal it might be possible to
arrange to unregister the device sooner (e.g on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...
A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b48732e4a4 ]
got a rare NULL pointer dereference in clear_bit
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
----
v2: switch to sock_flag(sk, SOCK_DEAD) and added net/caif/caif_socket.c
v3: return -ECONNRESET in upstream caller of wait function for SOCK_DEAD
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 397a253af5 ]
Currently, the calibration function that corrects the initial offsets
among multiple devices only works the first time. If the function is
called more than once, the calibration fails and bogus offsets will be
programmed into the devices.
In a well hidden spot, the device documentation tells that trigger indexes
0 and 1 are special in allowing the TRIG_IF_LATE flag to actually work.
This patch fixes the issue by using one of the special triggers during the
recalibration method.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6663a4fa67 upstream.
Commit 59a53afe70 "powerpc: Don't setup
CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs
that aren't presently running shall have status of disabled, with
enable-method being used to determine whether the CPU can be enabled.
Fix by checking for spin-table, which is currently the only supported
enable-method.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jonathan Toppins <jtoppins@cumulusnetworks.com>
commit 7aff0fe330 upstream.
Add a helper function for finding the index of a string in a string
list property. This helper is useful for bindings that use a separate
*-name property for attaching names to tuples in another property such
as 'reg' or 'gpios'.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
[jt: dropped changes to files not existing in the 3.2 tree]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Curt Brune <curt@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d45a02d01 upstream.
->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.
This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().
Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.
Fixes: 9f7d653b67 ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust filename, context
- Most per-netns state is global]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit beb39db59d upstream.
We have two problems in UDP stack related to bogus checksums :
1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()
2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.
This patch is an attempt to make things better.
We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
pipe_iov_copy_{from,to}_user() may be tried twice with the same iovec,
the first time atomically and the second time not. The second attempt
needs to continue from the iovec position, pipe buffer offset and
remaining length where the first attempt failed, but currently the
pipe buffer offset and remaining length are reset. This will corrupt
the piped data (possibly also leading to an information leak between
processes) and may also corrupt kernel memory.
This was fixed upstream by commits f0d1bec9d5 ("new helper:
copy_page_from_iter()") and 637b58c288 ("switch pipe_read() to
copy_page_to_iter()"), but those aren't suitable for stable. This fix
for older kernel versions was made by Seth Jennings for RHEL and I
have extracted it from their update.
CVE-2015-1805
References: https://bugzilla.redhat.com/show_bug.cgi?id=1202855
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2cf30dc180 upstream.
When the following filter is used it causes a warning to trigger:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
Modules linked in: bnep lockd grace bluetooth ...
CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
Call Trace:
[<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
[<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
[<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
[<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81159065>] replace_preds+0x3c5/0x990
[<ffffffff811596b2>] create_filter+0x82/0xb0
[<ffffffff81159944>] apply_event_filter+0xd4/0x180
[<ffffffff81152bbf>] event_filter_write+0x8f/0x120
[<ffffffff811db2a8>] __vfs_write+0x28/0xe0
[<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
[<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
[<ffffffff811dc408>] vfs_write+0xb8/0x1b0
[<ffffffff811dc72f>] SyS_write+0x4f/0xb0
[<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
---[ end trace e11028bd95818dcd ]---
Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.
This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:
# cd /sys/kernel/debug/tracing
# echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
# cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression
And give no kernel warning.
Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: drop the check for OP_NOT, which we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a040eaca1 upstream.
Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758 ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f35b9cd55 upstream.
Commit 334c86c494 ("MIPS: IRQ: Add stackoverflow detection") added
kernel stack overflow detection, however it only enabled it conditional
upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never
actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW,
which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so
switch it to using that definition instead.
Fixes: 334c86c494 ("MIPS: IRQ: Add stackoverflow detection")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Adam Jiang <jiang.adam@gmail.com>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/10531/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae4bedf067 upstream.
Newer elantech touchpads are not recognized by the current driver, since it
fails to detect their firmware version number. This prevents more advanced
touchpad features from being usable such as two-finger scrolling. This
patch allows newer touchpads to be detected and be fully functional. Tested
on Sony Vaio SVF13N17PXB.
Signed-off-by: Jordan Rife <jrife0@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9eebed7de6 upstream.
* Fix version recognition in elantech_set_properties
The new hardware reports itself as v7 but the packets'
structure is unaltered.
* Fix packet type recognition in elantech_packet_check_v4
The bitmask used for v6 is too wide, only the last three bits of
the third byte in a packet (packet[3] & 0x03) are actually used to
distinguish between packet types.
Starting from v7, additional information (to be interpreted) is
stored in the remaining bits (packets[3] & 0x1c).
In addition, the value stored in (packet[0] & 0x0c) is no longer
a constant but contains additional information yet to be deciphered.
This change should be backwards compatible with v6 hardware.
Additional-author: Giovanni Frigione <gio.frigione@gmail.com>
Signed-off-by: Matteo Delfino <kendatsuba@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea114fc27d upstream.
The driver worked around an error in the MAYA44 USB(+)'s mixer unit
descriptor by aborting before parsing the missing field. However,
aborting parsing too early prevented parsing of the other units
connected to this unit, so the capture mixer controls would be missing.
Fix this by moving the check for this descriptor error after the parsing
of the unit's input pins.
Reported-by: nightmixes <nightmixes@gmail.com>
Tested-by: nightmixes <nightmixes@gmail.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2:
- Adjust context
- Logging statement was different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 044bddb9ca upstream.
Add mixer control names for the ESI Maya44 USB+ (which appears to be
identical width the AudioTrak Maya44 USB).
Reported-by: nightmixes <nightmixes@gmail.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f0ee9d17a upstream.
Make the check to skip the rate check more lax, so that it applies
to all hw_version 4 models.
This fixes the touchpad not being detected properly on Asus PU551LA
laptops.
Reported-and-tested-by: David Zafra Gómez <dezeta@klo.es>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 161f873b89 upstream.
We used to read file_handle twice. Once to get the amount of extra
bytes, and once to fetch the entire structure.
This may be problematic since we do size verifications only after the
first read, so if the number of extra bytes changes in userspace between
the first and second calls, we'll have an incoherent view of
file_handle.
Instead, read the constant size once, and copy that over to the final
structure without having to re-read it again.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Inspired by commit f18c34e483 ("lib: Fix strnlen_user() to not touch
memory after specified maximum") upstream. This version of
strnlen_user(), no longer present upstream, has a similar off-by-one
error.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jan Kara <jack@suse.cz>
commit 1ef9f05835 upstream.
Fix this from the logs:
usb 7-1: New USB device found, idVendor=046d, idProduct=08ca
...
usb 7-1: Warning! Unlikely big volume range (=3072), cval->res is probably wrong.
usb 7-1: [5] FU [Mic Capture Volume] ch = 1, val = 4608/7680/1
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2159184ea0 upstream.
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.
Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: apply to the 3 copies of this logic we
ended up with]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dcbff39da3 upstream.
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop. Not having one causes it to overrun
to invalid memory.
In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 83a35114d0 upstream.
This bug has been there since day 1; addresses in the top guest physical
page weren't considered valid. You could map that page (the check in
check_gpte() is correct), but if a guest tried to put a pagetable there
we'd check that address manually when walking it, and kill the guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3f7352bf21 upstream.
x86 has variable length encoding. x86 JIT compiler is trying
to pick the shortest encoding for given bpf instruction.
While doing so the jump targets are changing, so JIT is doing
multiple passes over the program. Typical program needs 3 passes.
Some very short programs converge with 2 passes. Large programs
may need 4 or 5. But specially crafted bpf programs may hit the
pass limit and if the program converges on the last iteration
the JIT compiler will be producing an image full of 'int 3' insns.
Fix this corner case by doing final iteration over bpf program.
Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 47cc84ce0c upstream.
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.
This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.
This only happens when there is a MLDv2 querier in the network.
The fix will only break out of the loop when there is a failure adding a
multicast address.
The mdb before the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
After the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
Fixes: 08b202b672 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jonathan Toppins <jtoppins@cumulusnetworks.com>
commit a1cae34e23 upstream.
Multitheaded tests showed that the icv buffer in the current ghash
implementation is not handled correctly. A move of this working ghash
buffer value to the descriptor context fixed this. Code is tested and
verified with an multithreaded application via af_alg interface.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com>
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1df5b888f5 upstream.
This adds support for new Xsens device, Motion Tracker Development Board,
using Xsens' own Vendor ID
Signed-off-by: Patrick Riphagen <patrick.riphagen@xsens.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 77bb3dfdc0 upstream.
A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different
VCPU than it is bound to. This can result in a race between
handle_percpu_irq() and removing the action in __free_irq() because
handle_percpu_irq() does not take desc->lock. The interrupt handler
sees a NULL action and oopses.
Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER).
# cat /proc/interrupts | grep virq
40: 87246 0 xen-percpu-virq timer0
44: 0 0 xen-percpu-virq debug0
47: 0 20995 xen-percpu-virq timer1
51: 0 0 xen-percpu-virq debug1
69: 0 0 xen-dyn-virq xen-pcpu
74: 0 0 xen-dyn-virq mce
75: 29 0 xen-dyn-virq hvc_console
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[bwh: Backported to 3.2: adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 74856fbf44 upstream.
256 bytes per sector support has been broken since 2.6.X,
and no-one stepped up to fix this.
So disable support for it.
Signed-off-by: Mark Hounschell <dmarkh@cfl.rr.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e531d0bceb upstream.
The journal revoke block recovery code does not check r_count for
sanity, which means that an evil value of r_count could result in
the kernel reading off the end of the revoke table and into whatever
garbage lies beyond. This could crash the kernel, so fix that.
However, in testing this fix, I discovered that the code to write
out the revoke tables also was not correctly checking to see if the
block was full -- the current offset check is fine so long as the
revoke table space size is a multiple of the record size, but this
is not true when either journal_csum_v[23] are set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: journal checksumming is not supported, so only
the first fix is needed]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f974865ff upstream.
The following commit introduced a bug when checking for zero length extent
5946d08 ext4: check for overlapping extents in ext4_valid_extent_entries()
Zero length extent could pass the check if lblock is zero.
Adding the explicit check for zero length back.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c1ac56b51 upstream.
In function dmi_present(), dmi_walk_early() calls dmi_table(), which
calls dmi_decode(), which ultimately calls dmi_save_uuid(). This last
function makes a decision based on the value of global variable
dmi_ver. The problem is that this variable is set right _after_
dmi_walk_early() returns. So dmi_save_uuid() always sees dmi_ver == 0
regardless of the actual version implemented.
This causes /sys/class/dmi/id/product_uuid to always use the old
ordering even on systems implementing DMI/SMBIOS 2.6 or later, which
should use the new ordering.
This is broken since kernel v3.8 for legacy DMI implementations and
since kernel v3.10 for SMBIOS 2 implementations. SMBIOS 3
implementations with the 64-bit entry point are not affected.
The first breakage does not matter much as in practice legacy DMI
implementations are always for versions older than 2.6, which is when
the UUID ordering changed. The second breakage is more problematic as
it affects the vast majority of x86 systems manufactured since 2009.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 9f9c9cbb60 ("drivers/firmware/dmi_scan.c: fetch dmi version from SMBIOS if it exists")
Fixes: 79bae42d51 ("dmi_scan: refactor dmi_scan_machine(), {smbios,dmi}_present()")
Acked-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Artem Savkov <artem.savkov@gmail.com>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79bae42d51 upstream.
Move the calls to memcpy_fromio() up into the loop in
dmi_scan_machine(), and move the signature checks back down into
dmi_decode(). We need to check at 16-byte intervals but keep a 32-byte
buffer for an SMBIOS entry, so shift the buffer after each iteration.
Merge smbios_present() into dmi_present(), so we look for an SMBIOS
signature at the beginning of the given buffer and then for a DMI
signature at an offset of 16 bytes.
[artem.savkov@gmail.com: use proper buf type in dmi_present()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: Tim McGrath <tmhikaru@gmail.com>
Tested-by: Tim Mcgrath <tmhikaru@gmail.com>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Prerequisite for "firmware: dmi_scan: Fix ordering of product_uuid"]
commit 5e95235ccd upstream.
Recent toolchains force the TOC to be 256 byte aligned. We need
to enforce this alignment in our linker script, otherwise pointers
to our TOC variables (__toc_start, __prom_init_toc_start) could
be incorrect.
If they are bad, we die a few hundred instructions into boot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c0213d17a upstream.
When the v3 hardware sees more than one finger, it uses the semi-mt
protocol to report the touches. However, it currently works when
num_fingers is 0, 1 or 2, but when it is 3 and above, it sends only 1
finger as if num_fingers was 1.
This confuses userspace which knows how to deal with extra fingers
when all the slots are used, but not when some are missing.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=90101
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fdb6eb0a12 upstream.
When there is prefix specified, currently we will add this prefix in
widget->name, but not in widget->sname.
it causes failure at snd_soc_dapm_link_dai_widgets:
if (!w->sname || !strstr(w->sname, dai_w->name))
because dai_w->name has prefix added, but w->sname does not.
We should also add prefix for stream name
Signed-off-by: Koro Chen <koro.chen@mediatek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- s/prefix/dapm->codec->name_prefix]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 898761158b upstream.
smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
should not be reused for different values of it. Thus, it has to be
added to the mask in kvm_mmu_pte_write.
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 47b4e1fc49 upstream.
Remove checking tailroom when adding IV as it uses only
headroom, and move the check to the ICV generation that
actually needs the tailroom.
In other case I hit such warning and datapath don't work,
when testing:
- IBSS + WEP
- ath9k with hw crypt enabled
- IPv6 data (ping6)
WARNING: CPU: 3 PID: 13301 at net/mac80211/wep.c:102 ieee80211_wep_add_iv+0x129/0x190 [mac80211]()
[...]
Call Trace:
[<ffffffff817bf491>] dump_stack+0x45/0x57
[<ffffffff8107746a>] warn_slowpath_common+0x8a/0xc0
[<ffffffff8107755a>] warn_slowpath_null+0x1a/0x20
[<ffffffffc09ae109>] ieee80211_wep_add_iv+0x129/0x190 [mac80211]
[<ffffffffc09ae7ab>] ieee80211_crypto_wep_encrypt+0x6b/0xd0 [mac80211]
[<ffffffffc09d3fb1>] invoke_tx_handlers+0xc51/0xf30 [mac80211]
[...]
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: s/IEEE80211_WEP/WEP/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dbfe8ef559 upstream.
Avoton AHCI occasionally sees drive probe timeouts at driver load time.
When this happens SCR_STATUS indicates device detected, but no D2H FIS
reception. Reset the internal link state machines by bouncing
port-enable in the PCS register when this occurs.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Call ahci_start_engine() directly]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 948fa13504 upstream.
If the xHCI host controller has died (ie, device removed) or suffered
other serious fatal error (STS_FATAL), then xhci_irq should handle this
condition with IRQ_HANDLED instead of -ESHUTDOWN.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 18cc2f4cbb upstream.
Our event ring consists of only one segment, and we risk filling
the event ring in case we get isoc transfers with short intervals
such as webcams that fill a TD every microframe (125us)
With 64 TRB segment size one usb camera could fill the event ring in 8ms.
A setup with several cameras and other devices can fill up the
event ring as it is shared between all devices.
This has occurred when uvcvideo queues 5 * 32TD URBs which then
get cancelled when the video mode changes. The cancelled URBs are returned
in the xhci interrupt context and blocks the interrupt handler from
handling the new events.
A full event ring will block xhci from scheduling traffic and affect all
devices conneted to the xhci, will see errors such as Missed Service
Intervals for isoc devices, and and Split transaction errors for LS/FS
interrupt devices.
Increasing the TRB_PER_SEGMENT will also increase the default endpoint ring
size, which is welcome as for most isoc transfer we had to dynamically
expand the endpoint ring anyway to be able to queue the 5 * 32TDs uvcvideo
queues.
The default size used to be 64 TRBs per segment
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d104d0152a upstream.
Isoc TDs usually consist of one TRB, sometimes two. When all goes well we
receive only one success event for a TD, and move the dequeue pointer to
the next TD.
This fails if the TD consists of two TRBs and we get a transfer error
on the first TRB, we will then see two events for that TD.
Fix this by making sure the event we get is for the last TRB in that TD
before moving the dequeue pointer to the next TD. This will resolve some
of the uvc and dvb issues with the
"ERROR Transfer event TRB DMA ptr not part of current TD" error message
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e9eac2dce upstream.
If any memory allocation in resize_stripes fails we will return
-ENOMEM, but in some cases we update conf->pool_size anyway.
This means that if we try again, the allocations will be assumed
to be larger than they are, and badness results.
So only update pool_size if there is no error.
This bug was introduced in 2.6.17 and the patch is suitable for
-stable.
Fixes: ad01c9e375 ("[PATCH] md: Allow stripes to be expanded in preparation for expanding an array")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b9a5e5e18f upstream.
Since acpi_reserve_resources() is defined as a device_initcall(),
there's no guarantee that it will be executed in the right order
with respect to the rest of the ACPI initialization code. On some
systems this leads to breakage if, for example, the address range
that should be reserved for the ACPI fixed registers is given to
the PCI host bridge instead if the race is won by the wrong code
path.
Fix this by turning acpi_reserve_resources() into a void function
and calling it directly from within the ACPI initialization sequence.
Reported-and-tested-by: George McCollister <george.mccollister@gmail.com>
Link: http://marc.info/?t=143092384600002&r=1&w=2
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1432a2a35 upstream.
There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged. This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.
dlm_get_lock_resource {
...
spin_lock(&dlm->spinlock);
tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
if (tmpres) {
spin_unlock(&dlm->spinlock);
>>>>>>>> race window, dlm_run_purge_list() may run and purge
the lock resource
spin_lock(&tmpres->spinlock);
...
spin_unlock(&tmpres->spinlock);
}
}
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d8fd150fe3 upstream.
The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).
Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.
This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebe9cb3bb1 upstream.
If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value. Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 184af16b09 upstream.
The PM_RESTORE_PREPARE is not handled now in mmc_pm_notify(),
as result mmc_rescan() could be scheduled and executed at
late hibernation restore stages when MMC device is suspended
already - which, in turn, will lead to system crash on TI dra7-evm board:
WARNING: CPU: 0 PID: 3188 at drivers/bus/omap_l3_noc.c:148 l3_interrupt_handler+0x258/0x374()
44000000.ocp:L3 Custom Error: MASTER MPU TARGET L4_PER1_P3 (Idle): Data Access in User mode during Functional access
Hence, add missed PM_RESTORE_PREPARE PM event in mmc_pm_notify().
Fixes: 4c2ef25fe0 (mmc: fix all hangs related to mmc/sd card...)
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 280227a75b upstream.
fallocate() checks that the file is extent-based and returns
EOPNOTSUPP in case is not. Other tasks can convert from and to
indirect and extent so it's safe to check only after grabbing
the inode mutex.
Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- Add the 'out' label]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f32393c943 upstream.
The incorrect ordering of operations during cpu dlpar add results in invalid
affinity for the cpu being added. The ibm,associativity property in the
device tree is populated with all zeroes for the added cpu which results in
invalid affinity mappings and all cpus appear to belong to node 0.
This occurs because rtas configure-connector is called prior to making the
rtas set-indicator calls. Phyp does not assign affinity information
for a cpu until the rtas set-indicator calls are made to set the isolation
and allocation state.
Correct the order of operations to make the rtas set-indicator
calls (done in dlpar_acquire_drc) before calling rtas configure-connector.
Fixes: 1a8061c46c ("powerpc/pseries: Add kernel based CPU DLPAR handling")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2:
- Adjust context
- Keep using goto instead of directly returning]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 483d821108 upstream.
Unregister GPIOs requested through sysfs at chip remove to avoid leaking
the associated memory and sysfs entries.
The stale sysfs entries prevented the gpio numbers from being exported
when the gpio range was later reused (e.g. at device reconnect).
This also fixes the related module-reference leak.
Note that kernfs makes sure that any on-going sysfs operations finish
before the class devices are unregistered and that further accesses
fail.
The chip exported flag is used to prevent gpiod exports during removal.
This also makes it harder to trigger, but does not fix, the related race
between gpiochip_remove and export_store, which is really a race with
gpiod_request that needs to be addressed separately.
Also note that this would prevent the crashes (e.g. NULL-dereferences)
at reconnect that affects pre-3.18 kernels, as well as use-after-free on
operations on open attribute files on pre-3.14 kernels (prior to
kernfs).
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2:
- Adjust filename, context
- Move up initialisation of 'desc' in gpio_export()
- Use global 'gpio_desc' array and gpio_free() function in
gpiochip_unexport()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01cca93a94 upstream.
Unregister gpiochip device (used to export information through sysfs)
before removing it internally. This way removal will reverse addition.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8014bcc86e upstream.
The variable for the 'permissive' module parameter used to be static
but was recently changed to be extern. This puts it in the kernel
global namespace if the driver is built-in, so its name should begin
with a prefix identifying the driver.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: af6fc858a3 ("xen-pciback: limit guest control of command register")
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
commit 48ef23a4f6 upstream.
This phone is already supported by the visor driver.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c735ed74d8 upstream.
Added the USB serial console device ID for KCF Technologies PRN device
which has a USB port for its serial console.
Signed-off-by: Mark Edwards <sonofaforester@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7241ea558c upstream.
Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two
modes for DMA. Second mode is useful for 64 bit os with more then 2 GB
of ram (fixes problems with big soundfont loading)
1) 32MB from 2 GB address space using 8192 pages (used now as default)
2) 16MB from 4 GB address space using 4096 pages
Mode is set using HCFG_EXPANDED_MEM flag in HCFG register.
Also format of emu10k2 page table is then different.
Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c94e65c66 upstream.
The OSS emulation in synth-emux helper has a potential AB/BA deadlock
at the simultaneous closing and opening:
close ->
snd_seq_release() ->
sne_seq_free_client() ->
snd_seq_delete_all_ports(): takes client->ports_mutex ->
port_delete() ->
snd_emux_unuse(): takes emux->register_mutex
open ->
snd_seq_oss_open() ->
snd_emux_open_seq_oss(): takes emux->register_mutex ->
snd_seq_event_port_attach() ->
snd_seq_create_port(): takes client->ports_mutex
This patch addresses the deadlock by reducing the rance taking
emux->register_mutex in snd_emux_open_seq_oss(). The lock is needed
for the refcount handling, so move it locally. The calls in
emux_seq.c are already with the mutex, thus they are replaced with the
version without mutex lock/unlock.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6befa9d883 upstream.
Do not probe all serial drivers by of_serial.c which are using
device_type = "serial"; property. Only drivers which have valid
compatible strings listed in the driver should be probed.
When PORT_UNKNOWN is setup probe will fail anyway.
Arnd quotation about driver historical background:
"when I wrote that driver initially, the idea was that it would
get used as a stub to hook up all other serial drivers but after
that, the common code learned to create platform devices from DT"
This patch fix the problem with on the system with xilinx_uartps and
16550a where of_serial failed to register for xilinx_uartps and because
of irq_dispose_mapping() removed irq_desc. Then when xilinx_uartps was asking
for irq with request_irq() EINVAL is returned.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c90c07b98 upstream.
For systems with CONFIG_SERIAL_OF_PLATFORM=y and device_type =
"serial"; property in DT of_serial.c driver maps and unmaps IRQ (because
driver probe fails). Then a driver is called but irq mapping is not
created that's why driver is failing again in again on request_irq().
Based on this use platform_get_irq() instead of platform_get_resource()
which is doing irq_desc allocation and driver itself can request IRQ.
Fix both xilinx serial drivers in the tree.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Return directly on failure in xuartps_probe()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 414b7e3b9c upstream.
The USB mini-driver in rtlwifi, which is used by rtl8192cu, issues a call to
usb_control_msg() with a timeout value of 0. In some instances where the
interface is shutting down, this infinite wait results in a CPU deadlock. A
one second timeout fixes this problem without affecting any normal operations.
This bug is reported at https://bugzilla.novell.com/show_bug.cgi?id=927786.
Reported-by: Bernhard Wiedemann <bwiedemann@suse.com>
Tested-by: Bernhard Wiedemann <bwiedemann@suse.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Bernhard Wiedemann <bwiedemann@suse.com>
Cc: Takashi Iwai<tiwai@suse.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0d3bba0287 upstream.
Phil and I found out a problem with commit:
7e860a6e7a ("cdc-acm: add sanity checks")
It added some sanity checks to ignore potential garbage in CDC headers but
also introduced a potential infinite loop. This can happen at the first
loop iteration (elength = 0 in that case) if the description isn't a
DT_CS_INTERFACE or later if 'buffer[0]' is zero.
It should also be noted that the wrong length was being added to 'buffer'
in case 'buffer[1]' was not a DT_CS_INTERFACE descriptor, since elength was
assigned after that check in the loop.
A specially crafted USB device could be used to trigger this infinite loop.
Fixes: 7e860a6e7a ("cdc-acm: add sanity checks")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
CC: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
CC: Oliver Neukum <oneukum@suse.de>
CC: Adam Lee <adam8157@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 118c855b56 upstream.
The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9cd9554615 upstream.
The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 579d69bc1f upstream.
The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Torsten Luettgert <ml-lkml@enda.eu>
Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07b0e5d49d upstream.
The emux-synth driver has a possible AB/BA mutex deadlock at unloading
the emu10k1 driver:
snd_emux_free() ->
snd_emux_detach_seq(): mutex_lock(&emu->register_mutex) ->
snd_seq_delete_kernel_client() ->
snd_seq_free_client(): mutex_lock(®ister_mutex)
snd_seq_release() ->
snd_seq_free_client(): mutex_lock(®ister_mutex) ->
snd_seq_delete_all_ports() ->
snd_emux_unuse(): mutex_lock(&emu->register_mutex)
Basically snd_emux_detach_seq() doesn't need a protection of
emu->register_mutex as it's already being unregistered. So, we can
get rid of this for avoiding the deadlock.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d02260824e upstream.
Some models provide too long string for the shortname that has 32bytes
including the terminator, and it results in a non-terminated string
exposed to the user-space. This isn't too critical, though, as the
string is stopped at the succeeding longname string.
This patch fixes such entries by dropping "SB" prefix (it's enough to
fit within 32 bytes, so far). Meanwhile, it also changes strcpy()
with strlcpy() to make sure that this kind of problem won't happen in
future, too.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 09c5b4803a upstream.
When the LPM policy is set to ATA_LPM_MAX_POWER, the device might
generate a spurious PHY event that cuases errors on the link.
Ignore this event if it occured within 10s after the policy change.
The timeout was chosen observing that on a Dell XPS13 9333 these
spurious events can occur up to roughly 6s after the policy change.
Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8393b811f3 upstream.
This is a preparation commit that will allow to add other criteria
according to which PHY events should be dropped.
Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 464d1387ac upstream.
mm/page-writeback.c has several places where 1 is added to the divisor
to prevent division by zero exceptions; however, if the original
divisor is equivalent to -1, adding 1 leads to division by zero.
There are three places where +1 is used for this purpose - one in
pos_ratio_polynom() and two in bdi_position_ratio(). The second one
in bdi_position_ratio() actually triggered div-by-zero oops on a
machine running a 3.10 kernel. The divisor is
x_intercept - bdi_setpoint + 1 == span + 1
span is confirmed to be (u32)-1. It isn't clear how it ended up that
but it could be from write bandwidth calculation underflow fixed by
c72efb658f ("writeback: fix possible underflow in write bandwidth
calculation").
At any rate, +1 isn't a proper protection against div-by-zero. This
patch converts all +1 protections to |1. Note that
bdi_update_dirty_ratelimit() was already using |1 before this patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 085e68eeaf upstream.
The host's decision to enable machine check exceptions should remain
in force during non-root mode. KVM was writing 0 to cr4 on VCPU reset
and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
Tested: Built.
On earlier version, tested by injecting machine check
while a guest is spinning.
Before the change, if guest CR4.MCE==0, then the machine check is
escalated to Catastrophic Error (CATERR) and the machine dies.
If guest CR4.MCE==1, then the machine check causes VMEXIT and is
handled normally by host Linux. After the change, injecting a machine
check causes normal Linux machine check handling.
Signed-off-by: Ben Serebrin <serebrin@google.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: use read_cr4() instead of cr4_read_shadow()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13f6b191aa upstream.
Using the indenting we can see the curly braces were obviously intended.
This is a static checker fix, but my guess is that we don't read enough
bytes, because we don't calculate "t_len" correctly.
Fixes: f1d8269802 ('memstick: use fully asynchronous request processing')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b72c186999 upstream.
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <pthread.h>
#include <assert.h>
int pid;
void *waiter(void *arg)
{
int stat;
for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;
assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}
int main(void)
{
pthread_t thread;
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}
assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);
return 0;
}
Note for stable: the bug is very old, but without 9899d11f65 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d43698e8ab upstream.
Commit 2473238eac ("ihex: add support for CS:IP/EIP records") removes
the "default:" statement in the switch block, making the "return
usage();" line dead code and ihex2fw silently ignoring unknown options.
Restore this statement.
This bug was found by building with HOSTCC=clang and adding
-Wunreachable-code-return to HOSTCFLAGS.
Fixes: 2473238eac ("ihex: add support for CS:IP/EIP records")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 16b8528d20 upstream.
We only want to steer the I/O completion towards a queue, but don't
actually access any per-CPU data, so the raw_ version is fine to use
and avoids the warnings when using smp_processor_id().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
[bwh: Backported to 3.2: drop changes to megasas_build_dcdb_fusion()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca9b590caa upstream.
The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.
It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.
The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.
An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.
Fixes: b832be1e40 ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66578b0b2f upstream.
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).
This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().
There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a87938b2e2 upstream.
With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
address allocation strategy, load_elf_binary() will attempt to map a PIE
binary into an address range immediately below mm->mmap_base.
Unfortunately, load_elf_ binary() does not take account of the need to
allocate sufficient space for the entire binary which means that, while
the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are
that is supposed to be the "gap" between the stack and the binary.
Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
means that binaries with large data segments > 128MB can end up mapping
part of their data segment over their stack resulting in corruption of the
stack (and the data segment once the binary starts to run).
Any PIE binary with a data segment > 128MB is vulnerable to this although
address randomization means that the actual gap between the stack and the
end of the binary is normally greater than 128MB. The larger the data
segment of the binary the higher the probability of failure.
Fix this by calculating the total size of the binary in the same way as
load_elf_interp().
Signed-off-by: Michael Davidson <md@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2b8760100e upstream.
ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451
It is reported that on a physically 64-bit addressed machine, 32-bit kernel
can trigger crashes in accessing the memory regions that are beyond the
32-bit boundary. The region field's start address should still be 32-bit
compliant, but after a calculation (adding some offsets), it may exceed the
32-bit boundary. This case is rare and buggy, but there are real BIOSes
leaked with such issues (see References below).
This patch fixes this gap by always defining IO addresses as 64-bit, and
allows OSPMs to optimize it for a real 32-bit machine to reduce the size of
the internal objects.
Internal acpi_physical_address usages in the structures that can be fixed
by this change include:
1. struct acpi_object_region:
acpi_physical_address address;
2. struct acpi_address_range:
acpi_physical_address start_address;
acpi_physical_address end_address;
3. struct acpi_mem_space_context;
acpi_physical_address address;
4. struct acpi_table_desc
acpi_physical_address address;
See known issues 1 for other usages.
Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer
from same problem, so this patch changes it accordingly.
For iasl, it will enforce acpi_physical_address as 32-bit to generate
32-bit OSPM compatible tables on 32-bit platforms, we need to define
ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h.
Known issues:
1. Cleanup of mapped virtual address
In struct acpi_mem_space_context, acpi_physical_address is used as a virtual
address:
acpi_physical_address mapped_physical_address;
It is better to introduce acpi_virtual_address or use acpi_size instead.
This patch doesn't make such a change. Because this should be done along
with a change to acpi_os_map_memory()/acpi_os_unmap_memory().
There should be no functional problem to leave this unchanged except
that only this structure is enlarged unexpectedly.
Link: https://github.com/acpica/acpica/commit/aacf863c
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501
Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Reported-and-tested-by: Sial Nije <sialnije@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9a5cbce421 upstream.
We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccccf3d672 upstream.
If we attempt to clone a 0 length region into a file we can end up
inserting a range in the inode's extent_io tree with a start offset
that is greater then the end offset, which triggers immediately the
following warning:
[ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3914.620886] BTRFS: end < start 4095 4096
(...)
[ 3914.638093] Call Trace:
[ 3914.638636] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3914.639620] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3914.640789] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3914.642041] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3914.643236] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3914.644441] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3914.645711] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3914.646914] [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33
[ 3914.648058] [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs]
[ 3914.650105] [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs]
[ 3914.651361] [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs]
[ 3914.652761] [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs]
[ 3914.654128] [<ffffffff811226dd>] ? might_fault+0x58/0xb5
[ 3914.655320] [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs]
(...)
[ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---
This later makes the inode eviction handler enter an infinite loop that
keeps dumping the following warning over and over:
[ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
[ 3915.119913] BTRFS: end < start 4095 4096
(...)
[ 3915.137394] Call Trace:
[ 3915.137913] [<ffffffff81425fd9>] dump_stack+0x4c/0x65
[ 3915.139154] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb
[ 3915.140316] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs]
[ 3915.141505] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48
[ 3915.142709] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs]
[ 3915.143849] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs]
[ 3915.145120] [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs]
[ 3915.146352] [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50
[ 3915.147565] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs]
[ 3915.148785] [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33
[ 3915.149931] [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs]
[ 3915.151154] [<ffffffff81168904>] evict+0xa0/0x148
[ 3915.152094] [<ffffffff811689e5>] dispose_list+0x39/0x43
[ 3915.153081] [<ffffffff81169564>] evict_inodes+0xdc/0xeb
[ 3915.154062] [<ffffffff81154418>] generic_shutdown_super+0x49/0xef
[ 3915.155193] [<ffffffff811546d1>] kill_anon_super+0x13/0x1e
[ 3915.156274] [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs]
(...)
[ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---
So just bail out of the clone ioctl if the length of the region to clone
is zero, without locking any extent range, in order to prevent this issue
(same behaviour as a pwrite with a 0 length for example).
This is trivial to reproduce. For example, the steps for the test I just
made for fstests:
mkfs.btrfs -f SCRATCH_DEV
mount SCRATCH_DEV $SCRATCH_MNT
touch $SCRATCH_MNT/foo
touch $SCRATCH_MNT/bar
$CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
umount $SCRATCH_MNT
A test case for fstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d744194956 upstream.
Sebastian reported a crash caused by a jump label mismatch after resume.
This happens because we do not save the kernel text section during suspend
and therefore also do not restore it during resume, but use the kernel image
that restores the old system.
This means that after a suspend/resume cycle we lost all modifications done
to the kernel text section.
The reason for this is the pfn_is_nosave() function, which incorrectly
returns that read-only pages don't need to be saved. This is incorrect since
we mark the kernel text section read-only.
We still need to make sure to not save and restore pages contained within
NSS and DCSS segment.
To fix this add an extra case for the kernel text section and only save
those pages if they are not contained within an NSS segment.
Fixes the following crash (and the above bugs as well):
Jump label code mismatch at netif_receive_skb_internal+0x28/0xd0
Found: c0 04 00 00 00 00
Expected: c0 f4 00 00 00 11
New: c0 04 00 00 00 00
Kernel panic - not syncing: Corrupted kernel text
CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.19.0-01975-gb1b096e70f23 #4
Call Trace:
[<0000000000113972>] show_stack+0x72/0xf0
[<000000000081f15e>] dump_stack+0x6e/0x90
[<000000000081c4e8>] panic+0x108/0x2b0
[<000000000081be64>] jump_label_bug.isra.2+0x104/0x108
[<0000000000112176>] __jump_label_transform+0x9e/0xd0
[<00000000001121e6>] __sm_arch_jump_label_transform+0x3e/0x50
[<00000000001d1136>] multi_cpu_stop+0x12e/0x170
[<00000000001d1472>] cpu_stopper_thread+0xb2/0x168
[<000000000015d2ac>] smpboot_thread_fn+0x134/0x1b0
[<0000000000158baa>] kthread+0x10a/0x110
[<0000000000824a86>] kernel_thread_starter+0x6/0xc
Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: add necessary #include directives]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 451a2886b6 upstream.
unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there. If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.
X-Coverup: TINC (and there's no lumber cartel either)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: s/MAX_UIOVEC/UIO_MAXIOV/. This was fixed upstream by commit
fdc81f45e9 ("sg_start_req(): use import_iovec()"), but we don't have
that function.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f7e9e35836 upstream.
This problem appears to have been introduced in 2.6.29 by commit
93197a36a9 "Rewrite sysfs processor cache info code".
This caused lscpu to error out on at least e500v2 devices, eg:
error: cannot open /sys/devices/system/cpu/cpu0/cache/index2/size: No such file or directory
Some embedded powerpc systems use cache-size in DTS for the unified L2
cache size, not d-cache-size, so we need to allow for both DTS names.
Added a new CACHE_TYPE_UNIFIED_D cache_type_info structure to handle
this.
Fixes: 93197a36a9 ("powerpc: Rewrite sysfs processor cache info code")
Signed-off-by: Dave Olson <olson@cumulusnetworks.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[bwh: Backported to 3.2:
- Adjust context
- Preserve __cpuinit attribute on cache_do_one_devnode_unified()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 91bf0c2dcb upstream.
The functions snd_emu10k1_proc_spdif_read and snd_emu1010_fpga_read
acquire the emu_lock before accessing the FPGA. The function used
to access the FPGA (snd_emu1010_fpga_read) also tries to take
the emu_lock which causes a deadlock.
Remove the outer locking in the proc-functions (guarding only the
already safe fpga read) to prevent this deadlock.
[removed superfluous flags variables too -- tiwai]
Signed-off-by: Michael Gernoth <michael@gernoth.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8de580742f upstream.
We may exit this function without properly freeing up the maapings
we may have acquired. Fix the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
[bwh: Backported to 3.2:
- Adjust filename
- Keep using kmap_atomic()/kunmap_atomic(), not the sg_-prefixed functions]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebe96e641d upstream.
AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.
Fixes: 0c5f9b8830 ("RDS: Documentation")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd884149ac upstream.
On ASUS TP500LN and X750JN, the touchpad absolute mode is reset each
time set_rate is done.
In order to fix this, we will verify the firmware version, and if it
matches the one in those laptops, the set_rate function is overloaded
with a function elantech_set_rate_restore_reg_07 that performs the
set_rate with the original function, followed by a restore of reg_07
(the register that sets the absolute mode on elantech v4 hardware).
Also the ASUS TP500LN and X750JN firmware version, capabilities, and
button constellation is added to elantech.c
Reported-and-tested-by: George Moutsopoulos <gmoutso@yahoo.co.uk>
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2:
- Adjust context
- Drop the insertion into a comment we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e7056c433 upstream.
Looking over the implementation for jhash2 and comparing it to jhash_3words
I realized that the two hashes were in fact very different. Doing a bit of
digging led me to "The new jhash implementation" in which lookup2 was
supposed to have been replaced with lookup3.
In reviewing the patch I noticed that jhash2 had originally initialized a
and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and
c were initialized to initval + (length << 2) + JHASH_INITVAL. However the
changes in jhash_3words simply replaced the initialization of a and b with
JHASH_INITVAL.
This change corrects what I believe was an oversight so that a, b, and c in
jhash_3words all have the same value added consisting of initval + (length
<< 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the
same hash result given the same inputs.
Fixes: 60d509c823 ("The new jhash implementation")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e12fb97222 upstream.
Previously commit 14ece1028b added a
support for for syncing parent directory of newly created inodes to
make sure that the inode is not lost after a power failure in
no-journal mode.
However this does not work in majority of cases, namely:
- if the directory has inline data
- if the directory is already indexed
- if the directory already has at least one block and:
- the new entry fits into it
- or we've successfully converted it to indexed
So in those cases we might lose the inode entirely even after fsync in
the no-journal mode. This also includes ext2 default mode obviously.
I've noticed this while running xfstest generic/321 and even though the
test should fail (we need to run fsck after a crash in no-journal mode)
I could not find a newly created entries even when if it was fsynced
before.
Fix this by adjusting the ext4_add_entry() successful exit paths to set
the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
parent directory as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Frank Mayhar <fmayhar@google.com>
[bwh: Backported to 3.2: inline data is not supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 74ff960222 upstream.
The delay time after a reset in the codec probe callback was too short,
and did not work on certain hw because the codec needs more time to
power on. This increases the delay time from 1us to 1ms.
Signed-off-by: Pascal Huerst <pascal.huerst@gmail.com>
Acked-by: Brian Austin <brian.austin@cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8defb3367f upstream.
Usually ELF_ET_DYN_BASE is 2/3 of TASK_SIZE. With 3G/1G user/kernel
split this is not so, because 2*TASK_SIZE overflows 32 bits,
so the actual value of ELF_ET_DYN_BASE is:
(2 * TASK_SIZE / 3) = 0x2a000000
When ASLR is disabled PIE binaries will load at ELF_ET_DYN_BASE address.
On 32bit platforms AddressSanitzer uses addresses [0x20000000 - 0x40000000]
for shadow memory [1]. So ASan doesn't work for PIE binaries when ASLR disabled
as it fails to map shadow memory.
Also after Kees's 'split ET_DYN ASLR from mmap ASLR' patchset PIE binaries
has a high chance of loading somewhere in between [0x2a000000 - 0x40000000]
even if ASLR enabled. This makes ASan with PIE absolutely incompatible.
Fix overflow by dividing TASK_SIZE prior to multiplying.
After this patch ELF_ET_DYN_BASE equals to (for CONFIG_VMSPLIT_3G=y):
(TASK_SIZE / 3 * 2) = 0x7f555554
[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm#Mapping
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Maria Guseva <m.guseva@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3c3b04d10f upstream.
Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
works:
$ touch file
$ setfattr -n user. -v 1 file
$ getfattr -d file
user.="1"
ie. the missing attribute name after the namespace.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
Reported-by: William Douglas <william.douglas@intel.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2: XATTR_BTRFS_PREFIX is not supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dcc82f4783 upstream.
While committing a transaction we free the log roots before we write the
new super block. Freeing the log roots implies marking the disk location
of every node/leaf (metadata extent) as pinned before the new super block
is written. This is to prevent the disk location of log metadata extents
from being reused before the new super block is written, otherwise we
would have a corrupted log tree if before the new super block is written
a crash/reboot happens and the location of any log tree metadata extent
ended up being reused and rewritten.
Even though we pinned the log tree's metadata extents, we were issuing a
discard against them if the fs was mounted with the -o discard option,
resulting in corruption of the log tree if a crash/reboot happened before
writing the new super block - the next time the fs was mounted, during
the log replay process we would find nodes/leafs of the log btree with
a content full of zeroes, causing the process to fail and require the
use of the tool btrfs-zero-log to wipeout the log tree (and all data
previously fsynced becoming lost forever).
Fix this by not doing a discard when pinning an extent. The discard will
be done later when it's safe (after the new super block is committed) at
extent-tree.c:btrfs_finish_extent_commit().
Fixes: e688b7252f (Btrfs: fix extent pinning bugs in the tree log)
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 73cffdb65e upstream.
Don't wait after sending request for offers to the host. This wait is
unnecessary and simply adds 5 seconds to the boot time.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: deleted variable t was declared as int]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 299d0c5b27 upstream.
The comparison from the previous line seems to have been erroneously
(partially) copied-and-pasted onto the next. The second line should be
checking req.bytes, not req.lnum.
Coverity CID #139400
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[rw: Fixed comparison]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f16db8071c upstream.
In some of the 'out_not_moved' error paths, lnum may be used
uninitialized. Don't ignore the warning; let's fix it.
This uninitialized variable doesn't have much visible effect in the end,
since we just schedule the PEB for erasure, and its LEB number doesn't
really matter (it just gets printed in debug messages). But let's get it
straight anyway.
Coverity CID #113449
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d74adbdb9a upstream.
If aeb->len >= vol->reserved_pebs, we should not be writing aeb into the
PEB->LEB mapping.
Caught by Coverity, CID #711212.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2: adjust context; s/leb/seb/g]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8eef7d70f7 upstream.
We are completely discarding the earlier value of 'bitflips', which
could reflect a bitflip found in ubi_io_read_vid_hdr(). Let's use the
bitwise OR of header and data 'bitflip' statuses instead.
Coverity CID #1226856
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2c20d92dad upstream.
the lcd type as defined in the Kconfig is not matching in the code.
as a result the rs, rw and en pins were getting interchanged.
Kconfig defines the value of PANEL_LCD to be 1 if we select custom
configuration but in the code LCD_TYPE_CUSTOM is defined as 5.
my hardware is LCD_TYPE_CUSTOM, but the pins were assigned to it
as pins of LCD_TYPE_OLD, and it was not working.
Now values are corrected with referenece to the values defined in
Kconfig and it is working.
checked on JHD204A lcd with LCD_TYPE_CUSTOM configuration.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: parameter description was split across two lines]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 323ece54e0 upstream.
Values directly from descriptors given in debug statements
must be converted to native endianness.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8787041d9b upstream.
The WM8741 DAC supports the following typical audio sampling rates:
44.1kHz, 88.2kHz, 176.4kHz (eg: with a master clock of 22.5792MHz)
32kHz, 48kHz, 96kHz, 192kHz (eg: with a master clock of 24.576MHz)
For the rates lists, we should use 82000 instead of 88235, 176400
instead of 1764000 and 192000 instead of 19200 (seems to be a typo).
Signed-off-by: Sergej Sawazki <ce3a@gmx.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0b053c9518 upstream.
OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient to
ensure protection from dead store optimization.
For the random driver and crypto drivers, calls are emitted ...
$ gdb vmlinux
(gdb) disassemble memzero_explicit
Dump of assembler code for function memzero_explicit:
0xffffffff813a18b0 <+0>: push %rbp
0xffffffff813a18b1 <+1>: mov %rsi,%rdx
0xffffffff813a18b4 <+4>: xor %esi,%esi
0xffffffff813a18b6 <+6>: mov %rsp,%rbp
0xffffffff813a18b9 <+9>: callq 0xffffffff813a7120 <memset>
0xffffffff813a18be <+14>: pop %rbp
0xffffffff813a18bf <+15>: retq
End of assembler dump.
(gdb) disassemble extract_entropy
[...]
0xffffffff814a5009 <+313>: mov %r12,%rdi
0xffffffff814a500c <+316>: mov $0xa,%esi
0xffffffff814a5011 <+321>: callq 0xffffffff813a18b0 <memzero_explicit>
0xffffffff814a5016 <+326>: mov -0x48(%rbp),%rax
[...]
... but in case in future we might use facilities such as LTO, then
OPTIMIZER_HIDE_VAR() is not sufficient to protect gcc from a possible
eviction of the memset(). We have to use a compiler barrier instead.
Minimal test example when we assume memzero_explicit() would *not* be
a call, but would have been *inlined* instead:
static inline void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
<foo>
}
int main(void)
{
char buff[20];
snprintf(buff, sizeof(buff) - 1, "test");
printf("%s", buff);
memzero_explicit(buff, sizeof(buff));
return 0;
}
With <foo> := OPTIMIZER_HIDE_VAR():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: xor %eax,%eax
0x000000000040046b <+43>: add $0x28,%rsp
0x000000000040046f <+47>: retq
End of assembler dump.
With <foo> := barrier():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: movq $0x0,(%rsp)
0x0000000000400471 <+49>: movq $0x0,0x8(%rsp)
0x000000000040047a <+58>: movl $0x0,0x10(%rsp)
0x0000000000400482 <+66>: xor %eax,%eax
0x0000000000400484 <+68>: add $0x28,%rsp
0x0000000000400488 <+72>: retq
End of assembler dump.
As can be seen, movq, movq, movl are being emitted inlined
via memset().
Reference: http://thread.gmane.org/gmane.linux.kernel.cryptoapi/13764/
Fixes: d4c5efdb97 ("random: add and use memzero_explicit() for clearing data")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: mancha security <mancha1@zoho.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1915a718b1 upstream.
The return value of power_supply_register() call was not checked and
even on error probe() function returned 0. If registering failed then
during unbind the driver tried to unregister power supply which was not
actually registered.
This could lead to memory corruption because power_supply_unregister()
unconditionally cleans up given power supply.
Fix this by checking return status of power_supply_register() call. In
case of failure, clean up sysfs entries and fail the probe.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 9be0fcb5ed ("compal-laptop: add JHL90, battery & hwmon interface")
Signed-off-by: Sebastian Reichel <sre@kernel.org>
[bwh: Backported to 3.2: insert the appropriate cleanup code as there is no
common 'remove' label]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3c93e1a3f upstream.
As per Mentor Graphics' documentation, we should
always handle TX endpoints before RX endpoints.
This patch fixes that error while also updating
some hard-to-read comments which were scattered
around musb_interrupt().
This patch should be backported as far back as
possible since this error has been in the driver
since it's conception.
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b75f4c9afa upstream.
s390 documentation requires words 0 and 10-15 to be reserved and stored as
zeros. As we fill out all other fields, we can memset the full structure.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08e8331654 upstream.
There is a race condition between e1000_change_mtu's cleanups and
netpoll, when we change the MTU across jumbo size:
Changing MTU frees all the rx buffers:
e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings ->
e1000_clean_rx_ring
Then, close to the end of e1000_change_mtu:
pr_info -> ... -> netpoll_poll_dev -> e1000_clean ->
e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag
And when we come back to do the rest of the MTU change:
e1000_up -> e1000_configure -> e1000_configure_rx ->
e1000_alloc_jumbo_rx_buffers
alloc_jumbo finds the buffers already != NULL, since data (shared with
page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage,
or at least not what is expected when in jumbo state.
This results in an unusable adapter (packets don't get through), and a
NULL pointer dereference on the next call to e1000_clean_rx_ring
(other mtu change, link down, shutdown):
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81194d6e>] put_compound_page+0x7e/0x330
[...]
Call Trace:
[<ffffffff81195445>] put_page+0x55/0x60
[<ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200
[<ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60
[<ffffffff815df5e0>] e1000_down+0x1c0/0x1d0
[<ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840
[<ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170
[<ffffffff81647050>] dev_set_mtu+0xa0/0x140
[<ffffffff81664218>] do_setlink+0x218/0xac0
[<ffffffff814459e9>] ? nla_parse+0xb9/0x120
[<ffffffff816652d0>] rtnl_newlink+0x6d0/0x890
[<ffffffff8104f000>] ? kvm_clock_read+0x20/0x40
[<ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100
[<ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260
By setting the allocator to a dummy version, netpoll can't mess up our
rx buffers. The allocator is set back to a sane value in
e1000_configure_rx.
Fixes: edbbb3ca10 ("e1000: implement jumbo receive with partial descriptors")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 40384e4bbe upstream.
Correctly rollback state if the failure occurs after we have handed over
the ownership of the buffer to the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b05ac3824 upstream.
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.
The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov
for noticing that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
commit 579eb62ac3 upstream.
commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.
Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"
Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
commit f20fbaad76 upstream.
`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length. It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow. For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long. If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address. Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.
Thanks to Dan Carpenter for reporting the potential integer overflow.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
[Ian Abbott: Note: original commit compares the lengths to INT_MAX
instead of bufsiz due to changes in earlier commits.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 330966e501 upstream.
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.
It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Brad Spengler: backported to 3.2]
Signed-off-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 845704a535 ]
Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.
Lets try a different strategy :
In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.
By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.
In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop inapplicable change to sk_forced_wmem_schedule()
- s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2ab957492d ]
Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets
Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.
This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.
v2:
Remove useless comment
Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9974ad4ae upstream.
netpoll can call functions in hard irq context that are ordinarily
called in lesser contexts. For those functions use dev_kfree_skb_any
and dev_consume_skb_any so skbs are freed safely from hard irq
context.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: use only dev_kfree_skb() and not dev_consume_skb_any()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d8ec2c02ca upstream.
Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as
which can be called in hard irq by netpoll, softirq context
by normal napi polling, and in normal sleepable context
by the network device close method.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f7e79913a1 upstream.
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 497a27b9e1 upstream.
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 989c9ba104 upstream.
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a2ccd2e4bd upstream.
Replace dev_kfree_skb with dev_kfree_skb_any in functions that can
be called in hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 508f81d517 upstream.
Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit
as it can be called in both hard irq and other contexts.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 355a901e6c ]
While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.
We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()
Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
Instead of open coding memcpy_fromiovecend(), simply use this helper.
This leaves in socket write queue clean fast clone skbs.
This was tested against our fastopen packetdrill tests.
Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop the Fast Open changes
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7d985ed1dc ]
[I would really like an ACK on that one from dhowells; it appears to be
quite straightforward, but...]
MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of
fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit
in there. It gets passed via flags; in fact, another such check in the same
function is done correctly - as flags & MSG_PEEK.
It had been that way (effectively disabled) for 8 years, though, so the patch
needs beating up - that case had never been tested. If it is correct, it's
-stable fodder.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3eeff778e0 ]
It should be checking flags, not msg->msg_flags. It's ->sendmsg()
instances that need to look for that in ->msg_flags, ->recvmsg() ones
(including the other ->recvmsg() instance in that file, as well as
unix_dgram_recvmsg() this one claims to be imitating) check in flags.
Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC
in receive") back in 2010, so it goes quite a while back.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f862e07cf9 ]
The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
on the stack in order to pass a pair of addresses. This happens to just
fit withint the 1024 byte stack size warning limit on x86, but just
exceed that limit on ARM, which gives us this warning:
net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
As the use of this large variable is basically bogus, we can rearrange
the code to not do that. Instead of passing an rds socket into
rds_iw_get_device, we now just pass the two addresses that we have
available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
to create two address structures on the stack there.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cdda88912d upstream.
I found if we write a larger than 4GB value to some sysctl
variables, the sending syscall will hang up forever, because these
variables are 32 bits, such large values make them overflow to 0 or
negative.
This patch try to fix overflow or prevent from zero value setup
of below sysctl variables:
net.core.wmem_default
net.core.rmem_default
net.core.rmem_max
net.core.wmem_max
net.ipv4.udp_rmem_min
net.ipv4.udp_wmem_min
net.ipv4.tcp_wmem
net.ipv4.tcp_rmem
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Li Yu <raise.sail@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Delete now-unused 'zero' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 9145736d48 ]
1. For an IPv4 ping socket, ping_check_bind_addr does not check
the family of the socket address that's passed in. Instead,
make it behave like inet_bind, which enforces either that the
address family is AF_INET, or that the family is AF_UNSPEC and
the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
if the socket family is not AF_INET6. Return EAFNOSUPPORT
instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
instead of EINVAL if an incorrect socket address structure is
passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
IPv4, and it cannot easily be made to support IPv4 because
the protocol numbers for ICMP and ICMPv6 are different. This
makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
of making the socket unusable.
Among other things, this fixes an oops that can be triggered by:
int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
struct sockaddr_in6 sin6 = {
.sin6_family = AF_INET6,
.sin6_addr = in6addr_any,
};
bind(s, (struct sockaddr *) &sin6, sizeof(sin6));
Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Drop the IPv6 part
- Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit acf8dd0a9d ]
If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
checksum is to be computed on segmentation. However, in this case,
skb->csum_start and skb->csum_offset are never set as raw socket
transmit path bypasses udp_send_skb() where they are usually set. As a
result, driver may access invalid memory when trying to calculate the
checksum and store the result (as observed in virtio_net driver).
Moreover, the very idea of modifying the userspace provided UDP header
is IMHO against raw socket semantics (I wasn't able to find a document
clearly stating this or the opposite, though). And while allowing
CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
too intrusive change just to handle a corner case like this. Therefore
disallowing UFO for packets from SOCK_DGRAM seems to be the best option.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 42c972a1f3 ]
The National Instruments USB Host-to-Host Cable is based on the Prolific
PL-25A1 chipset. Add its VID/PID so the plusb driver will recognize it.
Signed-off-by: Ben Shelton <ben.shelton@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2f1d8b9e8a ]
Brian reported crashes using IPv6 traffic with macvtap/veth combo.
I tracked the crashes in neigh_hh_output()
-> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
Neighbour code assumes headroom to push Ethernet header is
at least 16 bytes.
It appears macvtap has only 14 bytes available on arches
where NET_IP_ALIGN is 0 (like x86)
Effect is a corruption of 2 bytes right before skb->head,
and possible crashes if accessing non existing memory.
This fix should also increase IPv4 performance, as paranoid code
in ip_finish_output2() wont have to call skb_realloc_headroom()
Reported-by: Brian Rak <brak@vultr.com>
Tested-by: Brian Rak <brak@vultr.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 16a3fa2863 upstream.
We currently use hdr_len as a hint of head length which is advertised by
guest. But when guest advertise a very big value, it can lead to an 64K+
allocating of kmalloc() which has a very high possibility of failure when host
memory is fragmented or under heavy stress. The huge hdr_len also reduce the
effect of zerocopy or even disable if a gso skb is linearized in guest.
To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
head, which guarantees an order 0 allocation each time.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a4176a9391 ]
colons are used as a separator in netdev device lookup in dev_ioctl.c
Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME
Signed-off-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 34eea79e26 ]
In tcf_em_validate(), after calling request_module() to load the
kind-specific module, set em->ops to NULL before returning -EAGAIN, so
that module_put() is not called again by tcf_em_tree_destroy().
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3e32e733d1 ]
ip_check_defrag() may be used by af_packet to defragment outgoing packets.
skb_network_offset() of af_packet's outgoing packets is not zero.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1c4cff0cf5 ]
The gnet_stats_copy_app() function gets called, more often than not, with its
second argument a pointer to an automatic variable in the caller's stack.
Therefore, to avoid copying garbage afterwards when calling
gnet_stats_finish_copy(), this data is better copied to a dynamically allocated
memory that gets freed after use.
[xiyou.wangcong@gmail.com: remove a useless kfree()]
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7afb8886a0 ]
Ignacy reported that when eth0 is down and add a vlan device
on top of it like:
ip link add link eth0 name eth0.1 up type vlan id 1
We will get a refcount leak:
unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2
The problem is when rtnl_configure_link() fails in rtnl_newlink(),
we simply call unregister_device(), but for stacked device like vlan,
we almost do nothing when we unregister the upper device, more work
is done when we unregister the lower device, so call its ->dellink().
Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e2a4800e75 ]
When we've run out of space in the output buffer to store more data, we
will call zlib_deflate with a NULL output buffer until we've consumed
remaining input.
When this happens, olen contains the size the output buffer would have
consumed iff we'd have had enough room.
This can later cause skb_over_panic when ppp_generic skb_put()s
the returned length.
Reported-by: Iain Douglas <centos@1n6.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fc752f1f43 ]
An exception is seen in ICMP ping receive path where the skb
destructor sock_rfree() tries to access a freed socket. This happens
because ping_rcv() releases socket reference with sock_put() and this
internally frees up the socket. Later icmp_rcv() will try to free the
skb and as part of this, skb destructor is called and which leads
to a kernel panic as the socket is freed already in ping_rcv().
-->|exception
-007|sk_mem_uncharge
-007|sock_rfree
-008|skb_release_head_state
-009|skb_release_all
-009|__kfree_skb
-010|kfree_skb
-011|icmp_rcv
-012|ip_local_deliver_finish
Fix this incorrect free by cloning this skb and processing this cloned
skb instead.
This patch was suggested by Eric Dumazet
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6088beef3f ]
NAPI poll logic now enforces that a poller returns exactly the budget
when it wants to be called again.
If a driver limits TX completion, it has to return budget as well when
the limit is hit, not the number of received packets.
Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: d75b1ade56 ("net: less interrupt masking in NAPI")
Cc: Manish Chopra <manish.chopra@qlogic.com>
Acked-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ac64da0b83 ]
softnet_data.input_pkt_queue is protected by a spinlock that
we must hold when transferring packets from victim queue to an active
one. This is because other cpus could still be trying to enqueue packets
into victim queue.
A second problem is that when we transfert the NAPI poll_list from
victim to current cpu, we absolutely need to special case the percpu
backlog, because we do not want to add complex locking to protect
process_queue : Only owner cpu is allowed to manipulate it, unless cpu
is offline.
Based on initial patch from Prasad Sodagudi & Subash Abhinov
Kasiviswanathan.
This version is better because we do not slow down packet processing,
only make migration safer.
Reported-by: Prasad Sodagudi <psodagud@codeaurora.org>
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f812116b17 ]
The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as
struct {
struct sock_extended_err ee;
struct sockaddr_in(6) offender;
} errhdr;
The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.
An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Upstream commit 44512449, "jfs: fix readdir cookie incompatibility
with NFSv4", was backported incorrectly into the stable trees which
used the filldir callback (rather than dir_emit). The position is
being incorrectly passed to filldir for the . and .. entries.
The still-maintained stable trees that need to be fixed are 3.2.y,
3.4.y and 3.10.y.
https://bugzilla.kernel.org/show_bug.cgi?id=94741
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 14977489ff upstream.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: This is not merely a cleanup but also fixes a regression introduced by
commit 3114ea7a24 ("NFSv4: Return the delegation if the server returns
NFS4ERR_OPENMODE"), backported in 3.2.14]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6a2a2b3ae0 upstream.
Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when
msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage
value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will
break old binaries and any code for which there is no access to source code.
To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland.
Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b01fc86b9 upstream.
This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.
This patch was mostly written by Linus Torvalds.
Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Drop the task_no_new_privs() and user namespace checks
- Open-code file_inode()
- s/READ_ONCE/ACCESS_ONCE/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6fd99094de upstream.
A local route may have a lower hop_limit set than global routes do.
RFC 3756, Section 4.2.7, "Parameter Spoofing"
> 1. The attacker includes a Current Hop Limit of one or another small
> number which the attacker knows will cause legitimate packets to
> be dropped before they reach their destination.
> As an example, one possible approach to mitigate this threat is to
> ignore very small hop limits. The nodes could implement a
> configurable minimum hop limit, and ignore attempts to set it below
> said limit.
Signed-off-by: D.S. Ljungmark <ljungmark@modio.se>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust ND_PRINTK() usage]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db27ebb111 upstream.
Max unacked packets/bytes is an int while sizeof(long) was used in the
sysctl table.
This means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b8d9117cc upstream.
The timeout entries are sizeof(int) rather than sizeof(long), which
means that when they were getting read we'd also leak kernel memory
to userspace along with the timeout values.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 223b02d923 upstream.
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3dc8523fa7 upstream.
Adds an entry for Creative USB X-Fi to the rc_config array in
mixer_quirks.c to allow use of volume knob on the device.
Adds support for newer X-Fi Pro card, known as "Model No. SB1095"
with USB ID "041e:3237"
Signed-off-by: Dmitry M. Fedin <dmitry.fedin@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 64b4e2526d upstream.
"ocfs2 syncs the wrong range" had been broken; prior to it the
code was doing the wrong thing in case of O_APPEND, all right,
but _after_ it we were syncing the wrong range in 100% cases.
*ppos, aka iocb->ki_pos is incremented prior to that point,
so we are always doing sync on the area _after_ the one we'd
written to.
Spotted by Joseph Qi <joseph.qi@huawei.com> back in January;
unfortunately, I'd missed his mail back then ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bba0bdd7ad upstream.
SCSI transport drivers and SCSI LLDs block a SCSI device if the
transport layer is not operational. This means that in this state
no requests should be processed, even if the REQ_PREEMPT flag has
been set. This patch avoids that a rescan shortly after a cable
pull sporadically triggers the following kernel oops:
BUG: unable to handle kernel paging request at ffffc9001a6bc084
IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib]
Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100)
Call Trace:
[<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp]
[<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp]
[<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod]
[<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod]
[<ffffffff81223b37>] __blk_run_queue+0x27/0x30
[<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110
[<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0
[<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod]
[<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod]
[<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod]
[<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod]
[<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod]
[<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod]
[<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod]
[<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod]
[<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160
[<ffffffff811589de>] vfs_write+0xce/0x140
[<ffffffff81158b53>] sys_write+0x53/0xa0
[<ffffffff81464592>] system_call_fastpath+0x16/0x1b
[<00007f611c9d9300>] 0x7f611c9d92ff
Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0c36820e2a upstream.
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
Fixes: 9ecd1a75d9 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8494057ab5 upstream.
Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.
Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.
This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.
Addresses: CVE-2014-8159
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 788211d81b upstream.
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:
* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free
The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.
Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80313b3078 upstream.
The ASRock Q1900DC-ITX mainboard (Baytrail-D) hangs randomly in
both BIOS and UEFI mode while rebooting unless reboot=pci is
used. Add a quirk to reboot via the pci method.
The problem is very intermittent and hard to debug, it might succeed
rebooting just fine 40 times in a row - but fails half a dozen times
the next day. It seems to be slightly less common in BIOS CSM mode
than native UEFI (with the CSM disabled), but it does happen in either
mode. Since I've started testing this patch in late january, rebooting
has been 100% reliable.
Most of the time it already hangs during POST, but occasionally it
might even make it through the bootloader and the kernel might even
start booting, but then hangs before the mode switch. The same symptoms
occur with grub-efi, gummiboot and grub-pc, just as well as (at least)
kernel 3.16-3.19 and 4.0-rc6 (I haven't tried older kernels than 3.16).
Upgrading to the most current mainboard firmware of the ASRock
Q1900DC-ITX, version 1.20, does not improve the situation.
( Searching the web seems to suggest that other Bay Trail-D mainboards
might be affected as well. )
--
Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/r/20150330224427.0fb58e42@mir
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e6d36a653b upstream.
This commit removes the reboot quirk originally added by commit
e19e074 ("x86: Fix reboot problem on VersaLogic Menlow boards").
Testing with a VersaLogic Ocelot (VL-EPMs-21a rev 1.00 w/ BIOS
6.5.102) revealed the following regarding the reboot hang
problem:
- v2.6.37 reboot=bios was needed.
- v2.6.38-rc1: behavior changed, reboot=acpi is needed,
reboot=kbd and reboot=bios results in system hang.
- v2.6.38: VersaLogic patch (e19e074 "x86: Fix reboot problem on
VersaLogic Menlow boards") was applied prior to v2.6.38-rc7. This
patch sets a quirk for VersaLogic Menlow boards that forces the use
of reboot=bios, which doesn't work anymore.
- v3.2: It seems that commit 660e34c ("x86: Reorder reboot method
preferences") changed the default reboot method to acpi prior to
v3.0-rc1, which means the default behavior is appropriate for the
Ocelot. No VersaLogic quirk is required.
The Ocelot board used for testing can successfully reboot w/out
having to pass any reboot= arguments for all 3 current versions
of the BIOS.
Signed-off-by: Michael D Labriola <michael.d.labriola@gmail.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Michael D Labriola <mlabriol@gdeb.com>
Cc: Kushal Koolwal <kushalkoolwal@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/87vcnub9hu.fsf@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af95b41426 upstream.
We have a HP machine which use the codec node 0x17 connecting the
internal speaker, and from the node capability, we saw the EAPD,
if we don't set the EAPD on for this node, the internal speaker
can't output any sound.
BugLink: https://bugs.launchpad.net/bugs/1436745
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98cf21c61a upstream.
Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated. This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.
The resulting first leaf node is correct:
----------------------------------------------------
| key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
----------------------------------------------------
But the parent index key0 still contains the previous key CNID=123:
-----------------------
| key0.CNID=123 | ... |
-----------------------
A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().
Along the way, I removed duplicate code with unification of the if
condition. The resulting code is equivalent to the original code
because node is never 0.
Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value. However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch. brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3fe89b3e2a upstream.
I have constantly stumbled upon "kernel BUG at mm/rmap.c:399!" after
upgrading to 3.19 and had no luck with 4.0-rc1 neither.
So, after looking into new logic introduced by commit 7a3ef208e6 ("mm:
prevent endless growth of anon_vma hierarchy"), I found chances are that
unlink_anon_vmas() is called without incrementing dst->anon_vma->degree
in anon_vma_clone() due to allocation failure. If dst->anon_vma is not
NULL in error path, its degree will be incorrectly decremented in
unlink_anon_vmas() and eventually underflow when exiting as a result of
another call to unlink_anon_vmas(). That's how "kernel BUG at
mm/rmap.c:399!" is triggered for me.
This patch fixes the underflow by dropping dst->anon_vma when allocation
fails. It's safe to do so regardless of original value of dst->anon_vma
because dst->anon_vma doesn't have valid meaning if anon_vma_clone()
fails. Besides, callers don't care dst->anon_vma in such case neither.
Also suggested by Michal Hocko, we can clean up vma_adjust() a bit as
anon_vma_clone() now does the work.
[akpm@linux-foundation.org: tweak comment]
Fixes: 7a3ef208e6 ("mm: prevent endless growth of anon_vma hierarchy")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6436a123a1 upstream.
Return a negative error value like the rest of the entries in this function.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: tweaked subject line]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b229a0f840 upstream.
This patch uses the existing CALAO Systems ftdi_8u2232c_probe in order
to avoid attaching a TTY to the JTAG port as this board is based on the
CALAO Systems reference design and needs the same fix up.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
[johan: clean up probe logic ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 227a4fd801 upstream.
When a device with an isochronous endpoint is plugged into the Intel
xHCI host controller, and the driver submits multiple frames per URB,
the xHCI driver will set the Block Event Interrupt (BEI) flag on all
but the last TD for the URB. This causes the host controller to place
an event on the event ring, but not send an interrupt. When the last
TD for the URB completes, BEI is cleared, and we get an interrupt for
the whole URB.
However, under Intel xHCI host controllers, if the event ring is full
of events from transfers with BEI set, an "Event Ring is Full" event
will be posted to the last entry of the event ring, but no interrupt
is generated. Host will cease all transfer and command executions and
wait until software completes handling the pending events in the event
ring. That means xHC stops, but event of "event ring is full" is not
notified. As the result, the xHC looks like dead to user.
This patch is to apply XHCI_AVOID_BEI quirk to Intel xHC devices. And
it should be backported to kernels as old as 3.0, that contains the
commit 69e848c209 ("Intel xhci: Support EHCI/xHCI port switching.").
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alistair Grant <akgrant0710@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9425183d17 upstream.
Linux xHCI driver doesn't report and handle port cofig error change.
If Port Configure Error for root hub port occurs, CEC bit in PORTSC
would be set by xHC and remains 1. This happends when the root port
fails to configure its link partner, e.g. the port fails to exchange
port capabilities information using Port Capability LMPs.
Then the Port Status Change Events will be blocked until all status
change bits(CEC is one of the change bits) are cleared('0') (refer to
xHCI spec 4.19.2). Otherwise, the port status change event for this
root port will not be generated anymore, then root port would look
like dead for user and can't be recovered until a Host Controller
Reset(HCRST).
This patch is to check CEC bit in PORTSC in xhci_get_port_status()
and set a Config Error in the return status if CEC is set. This will
cause a ClearPortFeature request, where CEC bit is cleared in
xhci_clear_port_change_bit().
[The commit log is based on initial Marvell patch posted at
http://marc.info/?l=linux-kernel&m=142323612321434&w=2]
Reported-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Fix indentation
- s/raw_port_status/temp/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c72efb658f upstream.
From 1ebf33901ecc75d9496862dceb1ef0377980587c Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 23 Mar 2015 00:08:19 -0400
2f800fbd77 ("writeback: fix dirtied pages accounting on redirty")
introduced account_page_redirty() which reverts stat updates for a
redirtied page, making BDI_DIRTIED no longer monotonically increasing.
bdi_update_write_bandwidth() uses the delta in BDI_DIRTIED as the
basis for bandwidth calculation. While unlikely, since the above
patch, the newer value may be lower than the recorded past value and
underflow the bandwidth calculation leading to a wild result.
Fix it by subtracing min of the old and new values when calculating
delta. AFAIK, there hasn't been any report of it happening but the
resulting erratic behavior would be non-critical and temporary, so
it's possible that the issue is happening without being reported. The
risk of the fix is very low, so tagged for -stable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Fixes: 2f800fbd77 ("writeback: fix dirtied pages accounting on redirty")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 746db9443e upstream.
When non-realtime tasks get priority-inheritance boosted to a realtime
scheduling class, RLIMIT_RTTIME starts to apply to them. However, the
counter used for checking this (the same one used for SCHED_RR
timeslices) was not getting reset. This meant that tasks running with a
non-realtime scheduling class which are repeatedly boosted to a realtime
one, but never block while they are running realtime, eventually hit the
timeout without ever running for a time over the limit. This patch
resets the realtime timeslice counter when un-PI-boosting from an RT to
a non-RT scheduling class.
I have some test code with two threads and a shared PTHREAD_PRIO_INHERIT
mutex which induces priority boosting and spins while boosted that gets
killed by a SIGXCPU on non-fixed kernels but doesn't with this patch
applied. It happens much faster with a CONFIG_PREEMPT_RT kernel, and
does happen eventually with PREEMPT_VOLUNTARY kernels.
Signed-off-by: Brian Silverman <brian@peloton-tech.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: austin@peloton-tech.com
Link: http://lkml.kernel.org/r/1424305436-6716-1-git-send-email-brian@peloton-tech.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d525211f9d upstream.
Vince reported a watchdog lockup like:
[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
[<ffffffff810b7f10>] lock_release+0x130/0x260
[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
[<ffffffff811f71ce>] send_sigio+0xae/0x100
[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
[<ffffffff8115d103>] perf_pending_event+0x33/0x60
[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
[<ffffffff8114e448>] irq_work_run+0x18/0x40
[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80
Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.
This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.
Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e1e9bda22d upstream.
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:
Thread 1 Thread 2
======== ========
inv_file = NULL
refind = 0
spin_lock(&cifs_file_list_lock)
// invalidHandle found on openFileList
inv_file = open_file
// inv_file->count currently 1
cifsFileInfo_get(inv_file)
// inv_file->count = 2
spin_unlock(&cifs_file_list_lock);
cifs_reopen_file() cifs_close()
// fails (rc != 0) ->cifsFileInfo_put()
spin_lock(&cifs_file_list_lock)
// inv_file->count = 1
spin_unlock(&cifs_file_list_lock)
spin_lock(&cifs_file_list_lock);
list_move_tail(&inv_file->flist,
&cifs_inode->openFileList);
spin_unlock(&cifs_file_list_lock);
cifsFileInfo_put(inv_file);
->spin_lock(&cifs_file_list_lock)
// inv_file->count = 0
list_del(&cifs_file->flist);
// cleanup!!
kfree(cifs_file);
spin_unlock(&cifs_file_list_lock);
spin_lock(&cifs_file_list_lock);
++refind;
// refind = 1
goto refind_writable;
At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 91edd096e2 upstream.
Commit db31c55a6f (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b965 (net: socket: error on a negative msg_namelen).
In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3ae0 net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7dabd (fold verify_iovec() into
copy_msghdr_from_user()).
This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().
Fixes: db31c55a6f (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: s/uaddr/tmp1/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 87f966d97b upstream.
On a MIPS Malta board, tons of fifo underflow errors have been observed
when using u-boot as bootloader instead of YAMON. The reason for that
is that YAMON used to set the pcnet device to SRAM mode but u-boot does
not. As a result, the default Tx threshold (64 bytes) is now too small to
keep the fifo relatively used and it can result to Tx fifo underflow errors.
As a result of which, it's best to setup the SRAM on supported controllers
so we can always use the NOUFLO bit.
Cc: <netdev@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: Don Fry <pcnet32@frontier.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e71fc8629 upstream.
Add USB VID/PID for Xircom PGMFHUB USB/serial component. (The hub and SCSI
bridge on that hardware are recognized out of the box by existing drivers.)
Tested VID/PID using new_id and loopback connection and was met with
success, but that's all the testing done.
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5f9f975b79 upstream.
Entrega is misspelled as Entregra or Entrgra, so fix that.
Signed-off-by: Mark Knibbs <markk@clara.co.uk>
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4899c054a9 upstream.
Synapse Wireless uses the FTDI VID with a custom PID of 0x9090 for their
SNAP Stick 200 product.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1b03ab5e8 upstream.
When an error occurred during event registration memory was freed twice
resulting in kernel memory corruption and a crash in unrelated code.
The problem was caused by
iio_device_unregister_eventset()
iio_device_unregister_sysfs()
being called twice, once on the error path and then
again via iio_dev_release().
Fix this by making these two functions idempotent so they
may be called multiple times.
The problem was observed before applying
78b33216 iio:core: Handle error when mask type is not separate
Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2:
- Adjust filenames, context
- Drop inapplicable change to iio_free_chan_devattr_list()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7d70e15480 upstream.
global_update_bandwidth() uses static variable update_time as the
timestamp for the last update but forgets to initialize it to
INITIALIZE_JIFFIES.
This means that global_dirty_limit will be 5 mins into the future on
32bit and some large amount jiffies into the past on 64bit. This
isn't critical as the only effect is that global_dirty_limit won't be
updated for the first 5 mins after booting on 32bit machines,
especially given the auxiliary nature of global_dirty_limit's role -
protecting against global dirty threshold's sudden dips; however, it
does lead to unintended suboptimal behavior. Fix it.
Fixes: c42843f2f0 ("writeback: introduce smoothed global dirty limit")
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 215a8fe419 upstream.
This patch fixes a NULL pointer dereference OOPs with pSCSI backends
within target_core_stat.c code. The bug is caused by a configfs attr
read if no pscsi_dev_virt->pdv_sd has been configured.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d556546e7e upstream.
This patch adds a missing set of conditional check braces in
ft_invl_hw_context() originally introduced by commit dcd998ccd
when handling DDP failures in ft_recv_write_data() code.
commit dcd998ccdb
Author: Kiran Patil <kiran.patil@intel.com>
Date: Wed Aug 3 09:20:01 2011 +0000
tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 61a3855bb7 upstream.
For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
overflow, according to the IB spec, we have to saturate a counter to its
max value, do that.
Fixes: c37791349c ('IB/mlx4: Support PMA counters for IBoE')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Open-code U32_MAX]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 496fcc294d upstream.
As HT/VHT depend heavily on QoS/WMM, it's not a good idea to
let userspace add clients that have HT/VHT but not QoS/WMM.
Since it does so in certain cases we've observed (client is
using HT IEs but not QoS/WMM) just ignore the HT/VHT info at
this point and don't pass it down to the drivers which might
unconditionally use it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- VHT is not supported]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccfe8c3f7e upstream.
The kernel crypto API logic requires the caller to provide the
length of (ciphertext || authentication tag) as cryptlen for the
AEAD decryption operation. Thus, the cipher implementation must
calculate the size of the plaintext output itself and cannot simply use
cryptlen.
The RFC4106 GCM decryption operation tries to overwrite cryptlen memory
in req->dst. As the destination buffer for decryption only needs to hold
the plaintext memory but cryptlen references the input buffer holding
(ciphertext || authentication tag), the assumption of the destination
buffer length in RFC4106 GCM operation leads to a too large size. This
patch simply uses the already calculated plaintext size.
In addition, this patch fixes the offset calculation of the AAD buffer
pointer: as mentioned before, cryptlen already includes the size of the
tag. Thus, the tag does not need to be added. With the addition, the AAD
will be written beyond the already allocated buffer.
Note, this fixes a kernel crash that can be triggered from user space
via AF_ALG(aead) -- simply use the libkcapi test application
from [1] and update it to use rfc4106-gcm-aes.
Using [1], the changes were tested using CAVS vectors to demonstrate
that the crypto operation still delivers the right results.
[1] http://www.chronox.de/libkcapi.html
CC: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 283ee1482f upstream.
According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time. The code path that caused the deadlock was
as follows:
nilfs_fill_super()
load_nilfs()
nilfs_salvage_orphan_logs()
* Do roll-forwarding, attach segment constructor for recovery,
and kick it.
nilfs_segctor_thread()
nilfs_segctor_thread_construct()
* A lock is held with nilfs_transaction_lock()
nilfs_segctor_do_construct()
nilfs_segctor_drop_written_files()
iput()
iput_final()
write_inode_now()
writeback_single_inode()
__writeback_single_inode()
do_writepages()
nilfs_writepage()
nilfs_construct_dsync_segment()
nilfs_transaction_lock() --> deadlock
This can happen if commit 7ef3ff2fea ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time. The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that. For
instance, we can reproduce the issue with the following steps:
< nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
# dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
# dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
count=1 && reboot -nfh
< the system will immediately reboot >
# mount -t nilfs2 /dev/sdb1 /nilfs
The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.
This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Yuxuan Shui <yshuiv7@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fcdcd1dec6 upstream.
The device complies to the UAC1 standard but hides that fact with
proprietary descriptors. The autodetect quirk for Roland devices
catches the audio interface but misses the MIDI part, so a specific
quirk is needed.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reported-by: Rafa Lafuente <rafalafuente@gmail.com>
Tested-by: Raphaël Doursenaud <raphael@doursenaud.fr>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit be3bb8236d upstream.
There was no check about the id string of user control elements, so we
accepted even a control element with an empty string, which is
obviously bogus. This patch adds more sanity checks of id strings.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3458390b9f upstream.
To take down the MOB and GMR memory types, the driver may have to issue
fence objects and thus make sure that the fence manager is taken down
after those memory types.
Reorder device init accordingly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
[bwh: Backported to 3.2:
- Adjust context
- Only the GMR memory type is used]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af6fc858a3 upstream.
Otherwise the guest can abuse that control to cause e.g. PCIe
Unsupported Request responses by disabling memory and/or I/O decoding
and subsequently causing (CPU side) accesses to the respective address
ranges, which (depending on system configuration) may be fatal to the
host.
Note that to alter any of the bits collected together as
PCI_COMMAND_GUEST permissive mode is now required to be enabled
globally or on the specific device.
This is CVE-2015-2150 / XSA-120.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[bwh: Backported to 3.2: also change type of permissive from int to bool]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4a18c8b1a upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07892b1035 upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eaddf6fd95 upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 24cc883c1f upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd14016fbf upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00a14c2968 upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8371aa0fe upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Paul Handrigan <Paul.Handrigan@cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08641d9b7b upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bf4c1d483 upstream.
The correct values referred by a boolean control are
value.integer.value[], not value.enumerated.item[].
The former is long while the latter is int, so it's even incompatible
on 64bit architectures.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 524a386825 upstream.
Some archs (specifically PowerPC), are sensitive with the ordering of
the enabling of the calls to function tracing and setting of the
function to use to be traced.
That is, update_ftrace_function() sets what function the ftrace_caller
trampoline should call. Some archs require this to be set before
calling ftrace_run_update_code().
Another bug was discovered, that ftrace_startup_sysctl() called
ftrace_run_update_code() directly. If the function the ftrace_caller
trampoline changes, then it will not be updated. Instead a call
to ftrace_startup_enable() should be called because it tests to see
if the callback changed since the code was disabled, and will
tell the arch to update appropriately. Most archs do not need this
notification, but PowerPC does.
The problem could be seen by the following commands:
# echo 0 > /proc/sys/kernel/ftrace_enabled
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /proc/sys/kernel/ftrace_enabled
# cat /sys/kernel/debug/tracing/trace
The trace will show that function tracing was not active.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1619dc3f8f upstream.
When ftrace is enabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_START_FUNC_RET command to ftrace_run_update_code(). Similarly, when
ftrace is disabled globally through the proc interface, we must check if
ftrace_graph_active is set. If it is set, then we should also pass the
FTRACE_STOP_FUNC_RET command to ftrace_run_update_code().
Consider the following situation.
# echo 0 > /proc/sys/kernel/ftrace_enabled
After this ftrace_enabled = 0.
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
Since ftrace_enabled = 0, ftrace_enable_ftrace_graph_caller() is never
called.
# echo 1 > /proc/sys/kernel/ftrace_enabled
Now ftrace_enabled will be set to true, but still
ftrace_enable_ftrace_graph_caller() will not be called, which is not
desired.
Further if we execute the following after this:
# echo nop > /sys/kernel/debug/tracing/current_tracer
Now since ftrace_enabled is set it will call
ftrace_disable_ftrace_graph_caller(), which causes a kernel warning on
the ARM platform.
On the ARM platform, when ftrace_enable_ftrace_graph_caller() is called,
it checks whether the old instruction is a nop or not. If it's not a nop,
then it returns an error. If it is a nop then it replaces instruction at
that address with a branch to ftrace_graph_caller.
ftrace_disable_ftrace_graph_caller() behaves just the opposite. Therefore,
if generic ftrace code ever calls either ftrace_enable_ftrace_graph_caller()
or ftrace_disable_ftrace_graph_caller() consecutively two times in a row,
then it will return an error, which will cause the generic ftrace code to
raise a warning.
Note, x86 does not have an issue with this because the architecture
specific code for ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() does not check the previous state,
and calling either of these functions twice in a row has no ill effect.
Link: http://lkml.kernel.org/r/e4fbe64cdac0dd0e86a3bf914b0f83c0b419f146.1425666454.git.panand@redhat.com
Signed-off-by: Pratyush Anand <panand@redhat.com>
[
removed extra if (ftrace_start_up) and defined ftrace_graph_active as 0
if CONFIG_FUNCTION_GRAPH_TRACER is not set.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 40c8790bcb upstream.
When the driver sets this rate a power of zero value is set causing
data flow stoppage until another rate is tried.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 969439016d upstream.
When accessing CAN network interfaces with AF_PACKET sockets e.g. by dhclient
this can lead to a skb_under_panic due to missing skb initialisations.
Add the missing initialisations at the CAN skbuff creation times on driver
level (rx path) and in the network layer (tx path).
Reported-by: Austin Schuh <austin@peloton-tech.com>
Reported-by: Daniel Steer <daniel.steer@mclaren.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to alloc_canfd_skb()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebc80840b8 upstream.
The Fimware 8.1 has a bug in which the extra buttons are only sent when the
ExtBit is 1. This should be fixed in a future FW update which should have
a bump of the minor version.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dc5465dc8a upstream.
On the X1 Carbon 3rd gen (with a 2015 broadwell cpu), the physical middle
button of the trackstick (attached to the touchpad serio device, of course)
seems to get lost.
Actually, the touchpads reports 3 extra buttons, which falls in the switch
below to the '2' case. Let's handle the case of odd numbers also, so that
the middle button finds its way back.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: open-code GENMASK()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e893286918 upstream.
On gcc5 the kernel does not link:
ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670.
Because prior GCC versions always emitted NOPs on ALIGN directives, but
gcc5 started omitting them.
.LSTARTFDEDLSI1 says:
/* HACK: The dwarf2 unwind routines will subtract 1 from the
return address to get an address in the middle of the
presumed call instruction. Since we didn't get here via
a call, we need to include the nop before the real start
to make up for it. */
.long .LSTART_sigreturn-1-. /* PC-relative start address */
But commit 69d0627a7f ("x86 vDSO: reorder vdso32 code") from 2.6.25
replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before
__kernel_sigreturn.
Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses
vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN".
So fix this by adding to that point at least a single NOP and make the
function ALIGN possibly with more NOPs then.
Kudos for reporting and diagnosing should go to Richard.
Reported-by: Richard Biener <rguenther@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da29370056 upstream.
EEH recovery for bnx2x based adapters is not reliable on all Power
systems using the default hot reset, which can result in an
unrecoverable EEH error. Forcing the use of fundamental reset
during EEH recovery fixes this.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eeb8a7e8bb upstream.
when multiport is off, virtio console invokes config access from irq
context, config access is blocking on s390.
Fix this up by scheduling work from config irq - similar to what we do
for multiport configs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to virtcons_freeze()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aa75ebc275 upstream.
Some APs experience problems when working with
U-APSD. Decreasing the probability of that
happening by using legacy mode for all ACs but VO
isn't enough.
Cisco 4410N originally forced us to enable VO by
default only because it treated non-VO ACs as
legacy.
However some APs (notably Netgear R7000) silently
reclassify packets to different ACs. Since u-APSD
ACs require trigger frames for frame retrieval
clients would never see some frames (e.g. ARP
responses) or would fetch them accidentally after
a long time.
It makes little sense to enable u-APSD queues by
default because it needs userspace applications to
be aware of it to actually take advantage of the
possible additional powersavings. Implicitly
depending on driver autotrigger frame support
doesn't make much sense.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6a4ed6fe0 upstream.
Some APs experience problems when working with U-APSD. Decrease the
probability of that happening by using legacy mode for all ACs but VO.
The AP that caused us troubles was a Cisco 4410N. It ignores our
setting, and always treats non-VO ACs as legacy.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e5db29806b upstream.
Since it's possible for the discard and write same queue limits to
change while the upper level command is being sliced and diced, fix up
both of them (a) to reject IO if the special command is unsupported at
the start of the function and (b) read the limits once and let the
commands error out on their own if the status happens to change.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2: adjust context; drop the write_same handling]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ab7c7bb6f4 upstream.
__dm_destroy() must take the suspend_lock so that its presuspend and
postsuspend calls do not race with an internal suspend.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aa991b3b26 upstream.
Regular pipe buffers' ->steal method (generic_pipe_buf_steal()) doesn't set
PG_uptodate.
Don't warn on this condition, just set the uptodate flag.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9d239d353c upstream.
The commit d297933cc7 (spi: dw: Fix detecting FIFO depth) tries to fix the
logic of the FIFO detection based on the description on the comments. However,
there is a slight difference between numbers in TX Level and TX FIFO size.
So, by specification the FIFO size would be in a range 2-256 bytes. From TX
Level prospective it means we can set threshold in the range 0-(FIFO size - 1)
bytes. Hence there are currently two issues:
a) FIFO size 2 bytes is actually skipped since TX Level is 1 bit and could be
either 0 or 1 byte;
b) FIFO size is incorrectly decreased by 1 which already done by meaning of
TX Level register.
This patch fixes it eventually right.
Fixes: d297933cc7 (spi: dw: Fix detecting FIFO depth)
Reviewed-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 528c943f3b upstream.
ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).
Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.
Fixes: fe5e7a1efb ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
The previous fix, 'gadgetfs: use-after-free in ->aio_read()',
missed one error path where the iovec needs to be freed.
This fix is not needed upstream as that error path was removed
by commit 7fe3976e0f ('gadget: switch ep_io_operations to
->read_iter/->write_iter').
Fixes: f01d35a15f ('gadgetfs: use-after-free in ->aio_read()')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f01d35a15f upstream.
AIO_PREAD requests call ->aio_read() with iovec on caller's stack, so if
we are going to access it asynchronously, we'd better get ourselves
a copy - the one on kernel stack of aio_run_iocb() won't be there
anymore. function/f_fs.c take care of doing that, legacy/inode.c
doesn't...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- Adjust filename, context
- Add kfree(priv->iv) to one additional failure path]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79fbf4a550 upstream.
Fix overflow bug in tty_wait_until_sent on 64-bit machines, where an
infinite timeout (0) would be passed to the underlying tty-driver's
wait_until_sent-operation as a negative timeout (-1), causing it to
return immediately.
This manifests itself for example as tcdrain() returning immediately,
drivers not honouring the drain flags when setting terminal attributes,
or even dropped data on close as a requested infinite closing-wait
timeout would be ignored.
The first symptom was reported by Asier LLANO who noted that tcdrain()
returned prematurely when using the ftdi_sio usb-serial driver.
Fix this by passing 0 rather than MAX_SCHEDULE_TIMEOUT (LONG_MAX) to the
underlying tty driver.
Note that the serial-core wait_until_sent-implementation is not affected
by this bug due to a lucky chance (comparison to an unsigned maximum
timeout), and neither is the cyclades one that had an explicit check for
negative timeouts, but all other tty drivers appear to be affected.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: ZIV-Asier Llano Palacios <asier.llano@cgglobal.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2c3fbe3cf2 upstream.
In case an infinite timeout (0) is requested, the irda wait_until_sent
implementation would use a zero poll timeout rather than the default
200ms.
Note that wait_until_sent is currently never called with a 0-timeout
argument due to a bug in tty_wait_until_sent.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30a22c215a upstream.
commit 6ae9200f2c ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust filename
- Use console_cmdline[i] instead of *c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0bf0bd079 upstream.
This problem was taken care of three times already in
* b0de59b573 (TTY: do not update
atime/mtime on read/write),
* 37b7f3c765 (TTY: fix atime/mtime
regression), and
* b0b885657b (tty: fix up atime/mtime
mess, take three)
But it still misses one point. As John Paul correctly points out, we
do not care about setting date. If somebody ever changes wall
time backwards (by mistake for example), tty timestamps are never
updated until the original wall time passes.
So check the absolute difference of times and if it large than "8
seconds or so", always update the time. That means we will update
immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
check, but it was always that way.
Thanks John for serving me this so nicely debugged.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-by: John Paul Perry <john_paul.perry@alcatel-lucent.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2e0ea8611 upstream.
I'm still receiving reports to my email address, so let's point this
at the linux-serial mailing list instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8cb91e058 upstream.
The xhci in Intel Sunrisepoint and Cherryview platforms need a driver
workaround for a Stuck PME that might either block PME events in suspend,
or create spurious PME events preventing runtime suspend.
Workaround is to clear a internal PME flag, BIT(28) in a vendor specific
PMCTRL register at offset 0x80a4, in both suspend resume callbacks
Without this, xhci connected usb devices might never be able to wake up the
system from suspend, or prevent device from going to suspend (xhci d3)
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 45ba2154d1 upstream.
When a control transfer has a short data stage, the xHCI controller generates
two transfer events: a COMP_SHORT_TX event that specifies the untransferred
amount, and a COMP_SUCCESS event. But when the data stage is not short, only the
COMP_SUCCESS event occurs. Therefore, xhci-hcd must set urb->actual_length to
urb->transfer_buffer_length while processing the COMP_SUCCESS event, unless
urb->actual_length was set already by a previous COMP_SHORT_TX event.
The driver checks this by seeing whether urb->actual_length == 0, but this alone
is the wrong test, as it is entirely possible for a short transfer to have an
urb->actual_length = 0.
This patch changes the xhci driver to rely on a new td->urb_length_set flag,
which is set to true when a COMP_SHORT_TX event is received and the URB length
updated at that stage.
This fixes a bug which affected the HSO plugin, which relies on URBs with
urb->actual_length == 0 to halt re-submitting the RX URB in the control
endpoint.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d51199a83a upstream.
DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be
set to 32.
The 64 was introduced by commit (in 2009):
a152ff24b9 ASoC: OMAP: Make DMA 64 aligned
But the dma_mask and coherent_dma_mask can not be used to specify alignment.
Fixes: a152ff24b9 (ASoC: OMAP: Make DMA 64 aligned)
Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: not using dma_coerce_mask_and_coherent()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e17cb1288 upstream.
i915.ko depends upon the acpi/video.ko module and so refuses to load if
ACPI is disabled at runtime if for example the BIOS is broken beyond
repair. acpi/video provides an optional service for i915.ko and so we
should just allow the modules to load, but do no nothing in order to let
the machines boot correctly.
Reported-by: Bill Augur <bill-auger@programmer.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Acked-by: Aaron Lu <aaron.lu@intel.com>
[ rjw: Fixed up the new comment in acpi_video_init() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6d65261a09 upstream.
eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.
One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.
This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
https://bugzilla.kernel.org/show_bug.cgi?id=93691https://launchpad.net/bugs/1305335
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
[bwh: Backported to 3.2:
- Adjust context
- We don't have file_inode() so open-code the inode lookup]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c7d373c3f0 upstream.
This patch integrates Cyber Cortex AV boards with the existing
ftdi_jtag_quirk in order to use serial port 0 with JTAG which is
required by the manufacturers' software.
Steps: 2
[ftdi_sio_ids.h]
1. Defined the device PID
[ftdi_sio.c]
2. Added a macro declaration to the ids array, in order to enable the
jtag quirk for the device.
Signed-off-by: Max Mansfield <max.m.mansfield@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 957ed60b53 upstream.
Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
have a memory overrun issue:
Each b-tree node of nilfs2 stores a set of key-value pairs and the number
of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
a few other "bn_*" members.
Since the value of "bn_nchildren" is used for operations on the key-values
within the b-tree node, it can cause memory access overrun if a large
number is incorrectly set to "bn_nchildren".
For instance, nilfs_btree_node_lookup() function determines the range of
binary search with it, and too large "bn_nchildren" leads
nilfs_btree_node_get_key() in that function to overrun.
As for intermediate b-tree nodes, this is prevented by a sanity check
performed when each node is read from a drive, however, no sanity check
has been done for root nodes stored in inodes.
This patch fixes the issue by adding missing sanity check against b-tree
root nodes so that it's called when on-memory inodes are read from ifile,
inode metadata file.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 675af70856 upstream.
These device ID's are not associated with the cp210x module currently,
but should be. This patch allows the devices to operate upon connecting
them to the usb bus as intended.
Signed-off-by: Michiel van de Garde <mgparser@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c1c98a3bb upstream.
The current minstrel_ht rate control behavior is somewhat optimistic in
trying to find optimum TX rate. While this is usually fine for normal
Data frames, there are cases where a more conservative set of retry
parameters would be beneficial to make the connection more robust.
EAPOL frames are critical to the authentication and especially the
EAPOL-Key message 4/4 (the last message in the 4-way handshake) is
important to get through to the AP. If that message is lost, the only
recovery mechanism in many cases is to reassociate with the AP and start
from scratch. This can often be avoided by trying to send the frame with
more conservative rate and/or with more link layer retries.
In most cases, minstrel_ht is currently using the initial EAPOL-Key
frames for probing higher rates and this results in only five link layer
transmission attempts (one at high(ish) MCS and four at MCS0). While
this works with most APs, it looks like there are some deployed APs that
may have issues with the EAPOL frames using HT MCS immediately after
association. Similarly, there may be issues in cases where the signal
strength or radio environment is not good enough to be able to get
frames through even at couple of MCS 0 tries.
The best approach for this would likely to be to reduce the TX rate for
the last rate (3rd rate parameter in the set) to a low basic rate (say,
6 Mbps on 5 GHz and 2 or 5.5 Mbps on 2.4 GHz), but doing that cleanly
requires some more effort. For now, we can start with a simple one-liner
that forces the minimum rate to be used for EAPOL frames similarly how
the TX rate is selected for the IEEE 802.11 Management frames. This does
result in a small extra latency added to the cases where the AP would be
able to receive the higher rate, but taken into account how small number
of EAPOL frames are used, this is likely to be insignificant. A future
optimization in the minstrel_ht design can also allow this patch to be
reverted to get back to the more optimized initial TX rate.
It should also be noted that many drivers that do not use minstrel as
the rate control algorithm are already doing similar workarounds by
forcing the lowest TX rate to be used for EAPOL frames.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust the controlling if-statement to make
this work]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca4383a394 upstream.
Add missing error handling when registering the tty device at port
probe. This avoids trying to remove an uninitialised character device
when the port device is removed.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
[bwh: Backported to 3.2:
- Adjust context
- No need to clean up autopm]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07fdfc5e9f upstream.
Fix return value in probe error path, which could end up returning
success (0) on errors. This could in turn lead to use-after-free or
double free (e.g. in port_remove) when the port device is removed.
Fixes: c706ebdfc8 ("USB: usb-serial: call port_probe and port_remove
at the right times")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f6950344d3 upstream.
These product identifiers (PID) all deal with marine NMEA format data
used on motor boats and yachts. We supply the programmed devices to
Chetco, for use inside their equipment. The PIDs are a direct copy of
our Windows device drivers (FTDI drivers with altered PIDs).
Signed-off-by: Mark Glover <mark@actisense.com>
[johan: edit commit message slightly ]
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0c2b68198 upstream.
When a signal is delivered, the information in the siginfo structure
is copied to userspace. Good security practice dicatates that the
unused fields in this structure should be initialized to 0 so that
random kernel stack data isn't exposed to the user. This patch adds
such an initialization to the two places where usbfs raises signals.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Dave Mielke <dave@mielke.cc>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6596a926b0 upstream.
Include the high order bit fields for Max scratchpad buffers when
calculating how many scratchpad buffers are needed.
I'm suprised this hasn't caused more issues, we never allocated more than
32 buffers even if xhci needed more. Either we got lucky and xhci never
really used past that area, or then we got enough zeroed dma memory anyway.
Should be backported as far back as possible
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d720d8cec5 upstream.
With commit a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg), the
MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
changing the kernel compat behaviour from the one before the commit it
was trying to fix (1be374a051, net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg).
On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
the kernel). However, on a 64-bit kernel, the compat ABI is different
with commit a7526eb5d0.
This patch changes the compat_sys_{send,recv}msg behaviour to the one
prior to commit 1be374a051.
The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
but the general rule is not to break user ABI (even when the user
behaviour is not entirely sane).
Fixes: a7526eb5d0 (net: Unbreak compat_sys_{send,recv}msg)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ff6f8e61e upstream.
This has been broken for a long time: it broke first in 2.6.35, then was
almost fixed in 2.6.36 but this one-liner slipped through the cracks.
The bug shows up as an infinite loop in Windows 7 (and newer) boot on
32-bit hosts without EPT.
Windows uses CMPXCHG8B to write to page tables, which causes a
page fault if running without EPT; the emulator is then called from
kvm_mmu_page_fault. The loop then happens if the higher 4 bytes are
not 0; the common case for this is that the NX bit (bit 63) is 1.
Fixes: 6550e1f165
Fixes: 16518d5ada
Reported-by: Erik Rull <erik.rull@rdsoftware.de>
Tested-by: Erik Rull <erik.rull@rdsoftware.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70372a7566 upstream.
When a PCM draining is performed to an empty stream that has been
already in PREPARED state, the current code just ignores and leaves as
it is, although the drain is supposed to set all such streams to SETUP
state. This patch covers that overlooked case.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2f97c20e5f upstream.
The gpio_chip operations receive a pointer the gpio_chip struct which is
contained in the driver's private struct, yet the container_of call in those
functions point to the mfd struct defined in include/linux/mfd/tps65912.h.
Signed-off-by: Nicolas Saenz Julienne <nicolassaenzj@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5885ebda87 upstream.
A new fsync vs power fail test in xfstests indicated that XFS can
have unreliable data consistency when doing extending truncates that
require block zeroing. The blocks beyond EOF get zeroed in memory,
but we never force those changes to disk before we run the
transaction that extends the file size and exposes those blocks to
userspace. This can result in the blocks not being correctly zeroed
after a crash.
Because in-memory behaviour is correct, tools like fsx don't pick up
any coherency problems - it's not until the filesystem is shutdown
or the system crashes after writing the truncate transaction to the
journal but before the zeroed data in the page cache is flushed that
the issue is exposed.
Fix this by also flushing the dirty data in memory region between
the old size and new size when we've found blocks that need zeroing
in the truncate process.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0db59e5929 upstream.
As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.
And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- Plumb in debugfs_super_operations, which we didn't previously define
- Call truncate_inode_pages() instead of truncate_inode_pages_final()
- Call end_writeback() instead of clear_inode()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fba04a9e0c upstream.
skb_copy_bits() returns zero on success and negative value on error,
so it is needed to invert the condition in ip_check_defrag().
Fixes: 1bf3751ec9 ("ipv4: ip_check_defrag must not modify skb before unsharing")
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1467559232 upstream.
The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.
A define of K(x) as
is defined in the code, but not used.
This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7eb71e0351 upstream.
It turns out it's possible to get __remove_osd() called twice on the
same OSD. That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory. One scenario that I was able to reproduce is as follows:
<osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
osd_reset()
<osdmap - osd3 down>
ceph_osdc_handle_map()
<takes map_sem>
kick_requests()
<takes request_mutex>
reset_changed_osds()
__reset_osd()
__remove_osd()
<releases request_mutex>
<releases map_sem>
<takes map_sem>
<takes request_mutex>
__kick_osd_requests()
__reset_osd()
__remove_osd() <-- !!!
A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().
Fixes: http://tracker.ceph.com/issues/8087
Cc: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e7c22d447 upstream.
The issue is that the stack for processes is not properly randomized on
64 bit architectures due to an integer overflow.
The affected function is randomize_stack_top() in file
"fs/binfmt_elf.c":
static unsigned long randomize_stack_top(unsigned long stack_top)
{
unsigned int random_variable = 0;
if ((current->flags & PF_RANDOMIZE) &&
!(current->personality & ADDR_NO_RANDOMIZE)) {
random_variable = get_random_int() & STACK_RND_MASK;
random_variable <<= PAGE_SHIFT;
}
return PAGE_ALIGN(stack_top) + random_variable;
return PAGE_ALIGN(stack_top) - random_variable;
}
Note that, it declares the "random_variable" variable as "unsigned int".
Since the result of the shifting operation between STACK_RND_MASK (which
is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
random_variable <<= PAGE_SHIFT;
then the two leftmost bits are dropped when storing the result in the
"random_variable". This variable shall be at least 34 bits long to hold
the (22+12) result.
These two dropped bits have an impact on the entropy of process stack.
Concretely, the total stack entropy is reduced by four: from 2^28 to
2^30 (One fourth of expected entropy).
This patch restores back the entropy by correcting the types involved
in the operations in the functions randomize_stack_top() and
stack_maxrandom_size().
The successful fix can be tested with:
$ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack]
7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack]
7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack]
7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack]
...
Once corrected, the leading bytes should be between 7ffc and 7fff,
rather than always being 7fff.
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Ismael Ripoll <iripoll@upv.es>
[ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: CVE-2015-1593
Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1fe89e1b6d upstream.
Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.
In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.
Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.
Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.
Reported-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Stefan Bader <stefan.bader@canonical.com>
Fixes: 8323f26ce3 ("sched: Fix race in task_group()")
Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 22aa66a3ee upstream.
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero. Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.
pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences s->origin. These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.
This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bec1f4a88 upstream.
The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.
dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.
There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.
To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 18c0b82a3e upstream.
This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.
These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78296c97ca upstream.
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.
Lets add an additional parameter to make sure storage is valid long
enough.
While we are at it, adds some const qualifiers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b4711757d upstream.
ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec63917 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37527b8692 upstream.
I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:
# echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 4K 1G 0
`-moo (dm-0) 0 4K 1G 0
sdb 0 0B 0B 0
`-moo (dm-0) 0 4K 1G 0
Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.
If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.
To fix this: before entering the loop, check for the combination of
REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
the mirror device.
This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; upstream is also affected.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2ed51ac64 upstream.
It may be possible that a device claims discard support but it rejects
discards with -EOPNOTSUPP. It happens when using loopback on ext2/ext3
filesystem driven by the ext4 driver. It may also happen if the
underlying devices are moved from one disk on another.
If discard error happens, we reject the bio with -EOPNOTSUPP, but we do
not degrade the array.
This patch fixes failed test shell/lvconvert-repair-transient.sh in the
lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
filesystem and it is being driven by the ext4 driver.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0153c3d94 upstream.
RME RayDAT and AIO use a fixed buffer size of 16384 samples. With period
sizes of 32-4096, this translates to 4-512 periods.
The older RME cards have a variable buffer size but require exactly two
periods.
This patch enforces nperiods=2 on those cards.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9cb12d7b4c upstream.
For whatever reason, generic_access_phys() only remaps one page, but
actually allows to access arbitrary size. It's quite easy to trigger
large reads, like printing out large structure with gdb, which leads to a
crash. Fix it by remapping correct size.
Fixes: 28b2ee20c7 ("access_process_vm device memory infrastructure")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8138a67a55 upstream.
I noticed that "allowed" can easily overflow by falling below 0, because
(total_vm / 32) can be larger than "allowed". The problem occurs in
OVERCOMMIT_NONE mode.
In this case, a huge allocation can success and overcommit the system
(despite OVERCOMMIT_NONE mode). All subsequent allocations will fall
(system-wide), so system become unusable.
The problem was masked out by commit c9b1d0981f
("mm: limit growth of 3% hardcoded other user reserve"),
but it's easy to reproduce it on older kernels:
1) set overcommit_memory sysctl to 2
2) mmap() large file multiple times (with VM_SHARED flag)
3) try to malloc() large amount of memory
It also can be reproduced on newer kernels, but miss-configured
sysctl_user_reserve_kbytes is required.
Fix this issue by switching to signed arithmetic here.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: there is no 'reserved' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5703b087dc upstream.
I noticed, that "allowed" can easily overflow by falling below 0,
because (total_vm / 32) can be larger than "allowed". The problem
occurs in OVERCOMMIT_NONE mode.
In this case, a huge allocation can success and overcommit the system
(despite OVERCOMMIT_NONE mode). All subsequent allocations will fall
(system-wide), so system become unusable.
The problem was masked out by commit c9b1d0981f
("mm: limit growth of 3% hardcoded other user reserve"),
but it's easy to reproduce it on older kernels:
1) set overcommit_memory sysctl to 2
2) mmap() large file multiple times (with VM_SHARED flag)
3) try to malloc() large amount of memory
It also can be reproduced on newer kernels, but miss-configured
sysctl_user_reserve_kbytes is required.
Fix this issue by switching to signed arithmetic here.
[akpm@linux-foundation.org: use min_t]
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: there is no 'reserved' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0f792cf949 upstream.
When running the test which causes the race as shown in the previous patch,
we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
This race happens when pte turns into migration entry just after the first
check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false.
To fix this, we need to check pte_present() again after huge_ptep_get().
This patch also reorders taking ptl and doing pte_page(), because
pte_page() should be done in ptl. Due to this reordering, we need use
trylock_page() in page != pagecache_page case to respect locking order.
Fixes: 66aebce747 ("hugetlb: fix race condition in hugetlb_fault()")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Error label is named 'out_page_table_lock' not 'out_ptl']
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d4d4eda237 upstream.
On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
speedstep-smi driver sometimes loads and sometimes doesn't load with
"change to state X failed" message.
The hardware sometimes refuses to change frequency and in this case, we
need to retry later. I found out that we need to enable interrupts while
waiting. When we enable interrupts, the hardware blockage that prevents
frequency transition resolves and the transition is possible. With
disabled interrupts, the blockage doesn't resolve (no matter how long do
we wait). The exact reasons for this hardware behavior are unknown.
This patch enables interrupts in the function speedstep_set_state that can
be called with disabled interrupts. However, this function is called with
disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
any problem.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d8ba1f9714 upstream.
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointer that were allocated.
Reported-by: David Ramos <daramos@stanford.edu>
Fixes: 4aece6a19c ("nfs41: cb_sequence xdr implementation")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6ee8e25fc3 upstream.
Commit e9fd702a58 ("audit: convert audit watches to use fsnotify
instead of inotify") broke handling of renames in audit. Audit code
wants to update inode number of an inode corresponding to watched name
in a directory. When something gets renamed into a directory to a
watched name, inotify previously passed moved inode to audit code
however new fsnotify code passes directory inode where the change
happened. That confuses audit and it starts watching parent directory
instead of a file in a directory.
This can be observed for example by doing:
cd /tmp
touch foo bar
auditctl -w /tmp/foo
touch foo
mv bar foo
touch foo
In audit log we see events like:
type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
...
type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
...
and that's it - we see event for the first touch after creating the
audit rule, we see events for rename but we don't see any event for the
last touch. However we start seeing events for unrelated stuff
happening in /tmp.
Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
FS_MOVED_TO events instead of the directory where the change happens.
This doesn't introduce any new problems because noone besides
audit_watch.c cares about the passed value:
fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
kernel/audit_tree.c doesn't care about passed 'data' at all.
kernel/audit_watch.c expects moved inode as 'data'.
Fixes: e9fd702a58 ("audit: convert audit watches to use fsnotify instead of inotify")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4940626de upstream.
The problem here is that we check:
if (dev >= SNDRV_CARDS)
Then we increment "dev".
if (!joystick_port[dev++])
Then we use it as an offset into a array with SNDRV_CARDS elements.
if (!request_region(joystick_port[dev], 8, "Riptide gameport")) {
This has 3 effects:
1) If you use the module option to specify the joystick port then it has
to be shifted one space over.
2) The wrong error message will be printed on failure if you have over
32 cards.
3) Static checkers will correctly complain that are off by one.
Fixes: db1005ec6f ('ALSA: riptide - Fix joystick resource handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 364d5716a7 upstream.
ifla_vf_policy[] is wrong in advertising its individual member types as
NLA_BINARY since .type = NLA_BINARY in combination with .len declares the
len member as *max* attribute length [0, len].
The issue is that when do_setvfinfo() is being called to set up a VF
through ndo handler, we could set corrupted data if the attribute length
is less than the size of the related structure itself.
The intent is exactly the opposite, namely to make sure to pass at least
data of minimum size of len.
Fixes: ebc08a6f47 ("rtnetlink: Add VF config code to rtnetlink")
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: drop the unsupported attributes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15e1ce3318 upstream.
A quirk of some older firmwares that report endpoint pipe type as PIPE_BULK
but the endpoint otheriwse functions as interrupt.
Check if usb_endpoint_type is USB_ENDPOINT_XFER_BULK and set as usb_rcvbulkpipe.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2:
- Adjust filename, context
- Add definition of the local variable 'd']
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37480a0568 upstream.
Commit 26df6d1340 ("tty: Add EXTPROC support for LINEMODE")
allows a process which has opened a pty master to send _any_ signal
to the process group of the pty slave. Although potentially
exploitable by a malicious program running a setuid program on
a pty slave, it's unknown if this exploit currently exists.
Limit to signals actually used.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Howard Chu <hyc@symas.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 19e3ae6b4f upstream.
The vcs device's poll/fasync support relies on the vt notifier to signal
changes to the screen content. Notifier invocations were missing for
changes that comes through the selection interface though. Fix that.
Tested with BRLTTY 5.2.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Cc: Dave Mielke <dave@mielke.cc>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c99197902d upstream.
The usb_hcd_unlink_urb() routine in hcd.c contains two possible
use-after-free errors. The dev_dbg() statement at the end of the
routine dereferences urb and urb->dev even though both structures may
have been deallocated.
This patch fixes the problem by storing urb->dev in a local variable
(avoiding the dereference of urb) and moving the dev_dbg() up before
the usb_put_dev() call.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 074f9dd55f upstream.
Currently the USB stack assumes that all host controller drivers are
capable of receiving wakeup requests from downstream devices.
However, this isn't true for the isp1760-hcd driver, which means that
it isn't safe to do a runtime suspend of any device attached to a
root-hub port if the device requires wakeup.
This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure
and sets the flag in isp1760-hcd. The core is modified to prevent a
direct child of the root hub from being put into runtime suspend with
wakeup enabled if the flag is set.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e860a6e7a upstream.
Check the special CDC headers for a plausible minimum length.
Another big operating systems ignores such garbage.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Reviewed-by: Adam Lee <adam8157@gmail.com>
Tested-by: Adam Lee <adam8157@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6ffa30d3f7 upstream.
Bruce reported seeing this warning pop when mounting using v4.1:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 1121 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810ff58f>] prepare_to_wait+0x2f/0x90
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer ppdev joydev snd virtio_console virtio_balloon pcspkr serio_raw parport_pc parport pvpanic floppy soundcore i2c_piix4 virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring ata_generic virtio pata_acpi
CPU: 1 PID: 1121 Comm: nfsv4.1-svc Not tainted 3.19.0-rc4+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
0000000000000000 000000004e5e3f73 ffff8800b998fb48 ffffffff8186ac78
0000000000000000 ffff8800b998fba0 ffff8800b998fb88 ffffffff810ac9da
ffff8800b998fb68 ffffffff81c923e7 00000000000004d9 0000000000000000
Call Trace:
[<ffffffff8186ac78>] dump_stack+0x4c/0x65
[<ffffffff810ac9da>] warn_slowpath_common+0x8a/0xc0
[<ffffffff810aca65>] warn_slowpath_fmt+0x55/0x70
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810ff58f>] ? prepare_to_wait+0x2f/0x90
[<ffffffff810dd2ad>] __might_sleep+0xbd/0xd0
[<ffffffff8124c973>] kmem_cache_alloc_trace+0x243/0x430
[<ffffffff810d941e>] ? groups_alloc+0x3e/0x130
[<ffffffff810d941e>] groups_alloc+0x3e/0x130
[<ffffffffa0301b1e>] svcauth_unix_accept+0x16e/0x290 [sunrpc]
[<ffffffffa0300571>] svc_authenticate+0xe1/0xf0 [sunrpc]
[<ffffffffa02fc564>] svc_process_common+0x244/0x6a0 [sunrpc]
[<ffffffffa02fd044>] bc_svc_process+0x1c4/0x260 [sunrpc]
[<ffffffffa03d5478>] nfs41_callback_svc+0x128/0x1f0 [nfsv4]
[<ffffffff810ff970>] ? wait_woken+0xc0/0xc0
[<ffffffffa03d5350>] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
[<ffffffff810d45bf>] kthread+0x11f/0x140
[<ffffffff810ea815>] ? local_clock+0x15/0x30
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
[<ffffffff81874bfc>] ret_from_fork+0x7c/0xb0
[<ffffffff810d44a0>] ? kthread_create_on_node+0x250/0x250
---[ end trace 675220a11e30f4f2 ]---
nfs41_callback_svc does most of its work while in TASK_INTERRUPTIBLE,
which is just wrong. Fix that by finishing the wait immediately if we've
found that the list has something on it.
Also, we don't expect this kthread to accept signals, so we should be
using a TASK_UNINTERRUPTIBLE sleep instead. That however, opens us up
hung task warnings from the watchdog, so have the schedule_timeout
wake up every 60s if there's no callback activity.
Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5ae711a246 upstream.
If ib_query_qp() fails or the memory registration mode isn't
supported, don't leak the PD. An orphaned IB/core resource will
cause IB module removal to hang.
Fixes: bd7ed1d133 ("RPC/RDMA: check selected memory registration ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[bwh: Backported to 3.2:
- Adjust context
- There are only 2 goto's to be changed, not 3]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e461894dc2 upstream.
StrongARM core uses RCSR SMR bit to tell to bootloader that it was reset
by entering the sleep mode. After we have resumed, there is little point
in having that bit enabled. Moreover, if this bit is set before reboot,
the bootloader can become confused. Thus clear the SMR bit on resume
just before clearing the scratchpad (resume address) register.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 42b8ce6f55 upstream.
`do_cmd_ioctl()` in "comedi_fops.c" handles the `COMEDI_CMD` ioctl.
This returns `-EAGAIN` if it has copied a modified `struct comedi_cmd`
back to user-space. (This occurs when the low-level Comedi driver's
`do_cmdtest()` handler returns non-zero to indicate a problem with the
contents of the `struct comedi_cmd`, or when the `struct comedi_cmd` has
the `CMDF_BOGUS` flag set.)
`compat_cmd()` in "comedi_compat32.c" handles the 32-bit compatible
version of the `COMEDI_CMD` ioctl. Currently, it never copies a 32-bit
compatible version of `struct comedi_cmd` back to user-space, which is
at odds with the way the regular `COMEDI_CMD` ioctl is handled. To fix
it, change `compat_cmd()` to copy a 32-bit compatible version of the
`struct comedi_cmd` back to user-space when the main ioctl handler
returns `-EAGAIN`.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 19e353f2b3 upstream.
The intention is obviously to sign-extend a 12 bit quantity. But
because of C's promotion rules, the assignment is equivalent to "val16
&= 0xfff;". Use the proper API for this.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6f0331236 upstream.
Added the USB serial console device ID for Siemens Ruggedcom devices
which have a USB port for their serial console.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ac96caf0f upstream.
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.
This patch changes our hrtimer to be based on a monotonic clock.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6d1cff2a88 upstream.
We hit use after free on dereferncing pointer to task_smack struct in
smk_of_task() called from smack_task_to_inode().
task_security() macro uses task_cred_xxx() to get pointer to the task_smack.
task_cred_xxx() could be used only for non-pointer members of task's
credentials. It cannot be used for pointer members since what they point
to may disapper after dropping RCU read lock.
Mainly task_security() used this way:
smk_of_task(task_security(p))
Intead of this introduce function smk_of_task_struct() which
takes task_struct as argument and returns pointer to smk_known struct
and do this under RCU read lock.
Bogus task_security() macro is not used anymore, so remove it.
KASan's report for this:
AddressSanitizer: use after free in smack_task_to_inode+0x50/0x70 at addr c4635600
=============================================================================
BUG kmalloc-64 (Tainted: PO): kasan error
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in new_task_smack+0x44/0xd8 age=39 cpu=0 pid=1866
kmem_cache_alloc_trace+0x88/0x1bc
new_task_smack+0x44/0xd8
smack_cred_prepare+0x48/0x21c
security_prepare_creds+0x44/0x4c
prepare_creds+0xdc/0x110
smack_setprocattr+0x104/0x150
security_setprocattr+0x4c/0x54
proc_pid_attr_write+0x12c/0x194
vfs_write+0x1b0/0x370
SyS_write+0x5c/0x94
ret_fast_syscall+0x0/0x48
INFO: Freed in smack_cred_free+0xc4/0xd0 age=27 cpu=0 pid=1564
kfree+0x270/0x290
smack_cred_free+0xc4/0xd0
security_cred_free+0x34/0x3c
put_cred_rcu+0x58/0xcc
rcu_process_callbacks+0x738/0x998
__do_softirq+0x264/0x4cc
do_softirq+0x94/0xf4
irq_exit+0xbc/0x120
handle_IRQ+0x104/0x134
gic_handle_irq+0x70/0xac
__irq_svc+0x44/0x78
_raw_spin_unlock+0x18/0x48
sync_inodes_sb+0x17c/0x1d8
sync_filesystem+0xac/0xfc
vdfs_file_fsync+0x90/0xc0
vfs_fsync_range+0x74/0x7c
INFO: Slab 0xd3b23f50 objects=32 used=31 fp=0xc4635600 flags=0x4080
INFO: Object 0xc4635600 @offset=5632 fp=0x (null)
Bytes b4 c46355f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object c4635600: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635610: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635620: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object c4635630: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
Redzone c4635640: bb bb bb bb ....
Padding c46356e8: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding c46356f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 5 PID: 834 Comm: launchpad_prelo Tainted: PBO 3.10.30 #1
Backtrace:
[<c00233a4>] (dump_backtrace+0x0/0x158) from [<c0023dec>] (show_stack+0x20/0x24)
r7:c4634010 r6:d3b23f50 r5:c4635600 r4:d1002140
[<c0023dcc>] (show_stack+0x0/0x24) from [<c06d6d7c>] (dump_stack+0x20/0x28)
[<c06d6d5c>] (dump_stack+0x0/0x28) from [<c01c1d50>] (print_trailer+0x124/0x144)
[<c01c1c2c>] (print_trailer+0x0/0x144) from [<c01c1e88>] (object_err+0x3c/0x44)
r7:c4635600 r6:d1002140 r5:d3b23f50 r4:c4635600
[<c01c1e4c>] (object_err+0x0/0x44) from [<c01cac18>] (kasan_report_error+0x2b8/0x538)
r6:d1002140 r5:d3b23f50 r4:c6429cf8 r3:c09e1aa7
[<c01ca960>] (kasan_report_error+0x0/0x538) from [<c01c9430>] (__asan_load4+0xd4/0xf8)
[<c01c935c>] (__asan_load4+0x0/0xf8) from [<c031e168>] (smack_task_to_inode+0x50/0x70)
r5:c4635600 r4:ca9da000
[<c031e118>] (smack_task_to_inode+0x0/0x70) from [<c031af64>] (security_task_to_inode+0x3c/0x44)
r5:cca25e80 r4:c0ba9780
[<c031af28>] (security_task_to_inode+0x0/0x44) from [<c023d614>] (pid_revalidate+0x124/0x178)
r6:00000000 r5:cca25e80 r4:cbabe3c0 r3:00008124
[<c023d4f0>] (pid_revalidate+0x0/0x178) from [<c01db98c>] (lookup_fast+0x35c/0x43y4)
r9:c6429efc r8:00000101 r7:c079d940 r6:c6429e90 r5:c6429ed8 r4:c83c4148
[<c01db630>] (lookup_fast+0x0/0x434) from [<c01deec8>] (do_last.isra.24+0x1c0/0x1108)
[<c01ded08>] (do_last.isra.24+0x0/0x1108) from [<c01dff04>] (path_openat.isra.25+0xf4/0x648)
[<c01dfe10>] (path_openat.isra.25+0x0/0x648) from [<c01e1458>] (do_filp_open+0x3c/0x88)
[<c01e141c>] (do_filp_open+0x0/0x88) from [<c01ccb28>] (do_sys_open+0xf0/0x198)
r7:00000001 r6:c0ea2180 r5:0000000b r4:00000000
[<c01cca38>] (do_sys_open+0x0/0x198) from [<c01ccc00>] (SyS_open+0x30/0x34)
[<c01ccbd0>] (SyS_open+0x0/0x34) from [<c001db80>] (ret_fast_syscall+0x0/0x48)
Read of size 4 by thread T834:
Memory state around the buggy address:
c4635380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635400: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
c4635480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635500: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
c4635580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>c4635600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
c4635680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
c4635700: 00 00 00 00 04 fc fc fc fc fc fc fc fc fc fc fc
c4635780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
c4635800: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
c4635880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
[bwh: Backported to 3.2:
- smk_of_task() and similar functions return char * not struct smack_known *
- The callers of task_security() are quite different, but most can be changed
to use smk_of_task_struct() just as in the upstream version
- Use open-coded RCU locking in the one place using smk_of_forked() instead
of smk_of_task()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 398a1e71dc upstream.
Add newly registered TPMs to the tail of the list, not the beginning, so that
things that are specifying TPM_ANY_NUM don't find that the device they're
using has inadvertently changed. Adding a second device would break IMA, for
instance.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5efd2ea8c9 upstream.
the following error pops up during "testusb -a -t 10"
| musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma)
hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of
size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in
hcd_buffer_alloc() returning memory which is 32 bytes aligned and it
might by identified by buffer_offset() as another buffer. This means the
buffer which is on a 32 byte boundary will not get freed, instead it
tries to free another buffer with the error message.
This patch fixes the issue by creating the smallest DMA buffer with the
size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is
smaller). This might be 32, 64 or even 128 bytes. The next three pools
will have the size 128, 512 and 2048.
In case the smallest pool is 128 bytes then we have only three pools
instead of four (and zero the first entry in the array).
The last pool size is always 2048 bytes which is the assumed PAGE_SIZE /
2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where
we would end up with 8KiB buffer in case we have 16KiB pages.
Instead I think it makes sense to have a common size(s) and extend them
if there is need to.
There is a BUILD_BUG_ON() now in case someone has a minalign of more than
128 bytes.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1399ff86f2 upstream.
We can place this in definitions that we expect the compiler to remove by
dead code elimination. If this assertion fails, we get a nice error
message at build time.
The GCC function attribute error("message") was added in version 4.3, so
we define a new macro __linktime_error(message) to expand to this for
GCC-4.3 and later. This will give us an error diagnostic from the
compiler on the line that fails. For other compilers
__linktime_error(message) expands to nothing, and we have to be content
with a link time error, but at least we will still get a build error.
BUILD_BUG() expands to the undefined function __build_bug_failed() and
will fail at link time if the compiler ever emits code for it. On GCC-4.3
and later, attribute((error())) is used so that the failure will be noted
at compile time instead.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: DM <dm.n9107@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 145b3fe579 upstream.
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
Commit 89ec3dcf17 ("PCI: Generate uppercase hex for modalias interface
class") fixed only half of the problem. Some udev implementations rely on
the uevent file and not the modalias file.
Fixes: d1ded203ad ("PCI: add MODALIAS to hotplug event for pci devices")
Fixes: 89ec3dcf17 ("PCI: Generate uppercase hex for modalias interface class")
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23b133bdc4 upstream.
Check length of extended attributes and allocation descriptors when
loading inodes from disk. Otherwise corrupted filesystems could confuse
the code and make the kernel oops.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.16: use make_bad_inode() instead of returning error]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7914495427 upstream.
Store blocksize in a local variable in udf_fill_inode() since it is used
a lot of times.
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Needed for the following fix. Backported to 3.16: adjust context.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a52d209336 upstream.
Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped
working. This patch enables the "replacement" for REGULATOR_DUMMY and
allows the touchscreen to work even though there is no regulator for "vcc".
Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit baad2dc49c upstream.
Add regulator_has_full_constraints() call to spitz board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.
This fixes the following warnings that can be seen on spitz if
regulators are enabled:
ads7846 spi2.0: unable to get regulator: -517
spi spi2.0: Driver ads7846 requests probe deferral
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9bc78f32c2 upstream.
Add regulator_has_full_constraints() call to poodle board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.
This fixes the following warnings that can be seen on poodle if
regulators are enabled:
ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 271e80176a upstream.
Add regulator_has_full_constraints() call to corgi board file to let
regulator core know that we do not have any additional regulators left.
This lets it substitute unprovided regulators with dummy ones.
This fixes the following warnings that can be seen on corgi if
regulators are enabled:
ads7846 spi1.0: unable to get regulator: -517
spi spi1.0: Driver ads7846 requests probe deferral
wm8731 0-001b: Failed to get supply 'AVDD': -517
wm8731 0-001b: Failed to request supplies: -517
wm8731 0-001b: ASoC: failed to probe component -517
corgi-audio corgi-audio: ASoC: failed to instantiate card -517
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c561a5753d upstream.
BugLink: https://bugs.launchpad.net/bugs/1400215
ath3k devices fail to load firmwares on xHCI buses, but work well on
EHCI, this might be a compatibility issue between xHCI and ath3k chips.
As my testing result, those chips will work on xHCI buses again with
this patch.
This workaround is from Qualcomm, they also did some workarounds in
Windows driver.
Signed-off-by: Adam Lee <adam.lee@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c26585458 upstream.
When the ipv6 fib changes during a table dump, the walk is
restarted and the number of nodes dumped are skipped. But the existing
code doesn't advance to the next node after a node is skipped. This can
cause the dump to loop or produce lots of duplicates when the fib
is modified during the dump.
This change advances the walk to the next node if the current node is
skipped after a restart.
Signed-off-by: Kumar Sundararajan <kumar@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa809e2fd6 upstream.
Commit 2bec5a369e (ipv6: fib: fix crash when changing large fib
while dumping it) introduced ability to restart the dump at tree root,
but failed to skip correctly a count of already dumped entries. Code
didn't match Patrick intent.
We must skip exactly the number of already dumped entries.
Note that like other /proc/net files or netlink producers, we could
still dump some duplicates entries.
Reported-by: Debabrata Banerjee <dbavatar@gmail.com>
Reported-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5e5aeb4367 upstream.
Verify that the frequency value from userspace is valid and makes sense.
Unverified values can cause overflows later on.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redunent cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a8f29e89f2 upstream.
Userspace expects to see a long space before the first pulse is sent on
the lirc device. Currently, if a long time has passed and a new packet
is started, the lirc codec just returns and doesn't send anything. This
makes lircd ignore many perfectly valid signals unless they are sent in
quick sucession. When a reset event is delivered, we cannot know
anything about the duration of the space. But it should be safe to
assume it has been a long time and we just set the duration to maximum.
Signed-off-by: Austin Lund <austin.lund@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit be8e89087e upstream.
The hardware range code values and list of valid ranges for the AI
subdevice is incorrect for several supported boards. The hardware range
code values for all boards except PCI-DAS4020/12 is determined by
calling `ai_range_bits_6xxx()` based on the maximum voltage of the range
and whether it is bipolar or unipolar, however it only returns the
correct hardware range code for the PCI-DAS60xx boards. For
PCI-DAS6402/16 (and /12) it returns the wrong code for the unipolar
ranges. For PCI-DAS64/Mx/16 it returns the wrong code for all the
ranges and the comedi range table is incorrect.
Change `ai_range_bits_6xxx()` to use a look-up table pointed to by new
member `ai_range_codes` of `struct pcidas64_board` to map the comedi
range table indices to the hardware range codes. Use a new comedi range
table for the PCI-DAS64/Mx/16 boards (and the commented out variants).
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 84672369ff upstream.
Whenever a device is unregistered in vmbus_device_unregister (drivers/hv/vmbus_drv.c), the device name in the log message may contain garbage as the memory has already been freed by the time pr_info is called. Log example:
[ 3149.170475] hv_vmbus: child device àõsèè0_5 unregistered
By logging the message just before calling device_unregister, the correct device name is printed:
[ 3145.034652] hv_vmbus: child device vmbus_0_5 unregistered
Also changing register & unregister messages to debug to avoid unnecessarily cluttering the kernel log.
Signed-off-by: Fernando M Soto <fsoto@bluecatnetworks.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7ef3ff2fea upstream.
Nilfs2 eventually hangs in a stress test with fsstress program. This
issue was caused by the following deadlock over I_SYNC flag between
nilfs_segctor_thread() and writeback_sb_inodes():
nilfs_segctor_thread()
nilfs_segctor_thread_construct()
nilfs_segctor_unlock()
nilfs_dispose_list()
iput()
iput_final()
evict()
inode_wait_for_writeback() * wait for I_SYNC flag
writeback_sb_inodes()
* set I_SYNC flag on inode->i_state
__writeback_single_inode()
do_writepages()
nilfs_writepages()
nilfs_construct_dsync_segment()
nilfs_segctor_sync()
* wait for completion of segment constructor
inode_sync_complete()
* clear I_SYNC flag after __writeback_single_inode() completed
writeback_sb_inodes() calls do_writepages() for dirty inodes after
setting I_SYNC flag on inode->i_state. do_writepages() in turn calls
nilfs_writepages(), which can run segment constructor and wait for its
completion. On the other hand, segment constructor calls iput(), which
can call evict() and wait for the I_SYNC flag on
inode_wait_for_writeback().
Since segment constructor doesn't know when I_SYNC will be set, it
cannot know whether iput() will block or not unless inode->i_nlink has a
non-zero count. We can prevent evict() from being called in iput() by
implementing sop->drop_inode(), but it's not preferable to leave inodes
with i_nlink == 0 for long periods because it even defers file
truncation and inode deallocation. So, this instead resolves the
deadlock by calling iput() asynchronously with a workqueue for inodes
with i_nlink == 0.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23aaed6659 upstream.
walk_page_range() silently skips vma having VM_PFNMAP set, which leads
to undesirable behaviour at client end (who called walk_page_range).
Userspace applications get the wrong data, so the effect is like just
confusing users (if the applications just display the data) or sometimes
killing the processes (if the applications do something with
misunderstanding virtual addresses due to the wrong data.)
For example for pagemap_read, when no callbacks are called against
VM_PFNMAP vma, pagemap_read may prepare pagemap data for next virtual
address range at wrong index.
Eventually userspace may get wrong pagemap data for a task.
Corresponding to a VM_PFNMAP marked vma region, kernel may report
mappings from subsequent vma regions. User space in turn may account
more pages (than really are) to the task.
In my case I was using procmem, procrack (Android utility) which uses
pagemap interface to account RSS pages of a task. Due to this bug it
was giving a wrong picture for vmas (with VM_PFNMAP set).
Fixes: a9ff785e44 ("mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas")
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cfbf654efc upstream.
When making use of RFC5061, section 4.2.4. for setting the primary IP
address, we're passing a wrong parameter header to param_type2af(),
resulting always in NULL being returned.
At this point, param.p points to a sctp_addip_param struct, containing
a sctp_paramhdr (type = 0xc004, length = var), and crr_id as a correlation
id. Followed by that, as also presented in RFC5061 section 4.2.4., comes
the actual sctp_addr_param, which also contains a sctp_paramhdr, but
this time with the correct type SCTP_PARAM_IPV{4,6}_ADDRESS that
param_type2af() can make use of. Since we already hold a pointer to
addr_param from previous line, just reuse it for param_type2af().
Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Saran Maruti Ramanara <saran.neti@telus.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49d2ca84e4 upstream.
Fix memory leak in the gpio sysfs interface due to failure to drop
reference to device returned by class_find_device when setting the
gpio-line polarity.
Fixes: 0769746183 ("gpiolib: add support for changing value polarity
in sysfs")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0f303db08d upstream.
Fix memory leak in the gpio sysfs interface due to failure to drop
reference to device returned by class_find_device when creating a link.
Fixes: a4177ee7f1 ("gpiolib: allow exported GPIO nodes to be named
using sysfs links")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c7754e7510 upstream.
As printk() invocation can cause e.g. a TLB miss, printk() cannot be
called before the exception handlers have been properly initialized.
This can happen e.g. when netconsole has been loaded as a kernel module
and the TLB table has been cleared when a CPU was offline.
Call cpu_report() in start_secondary() only after the exception handlers
have been initialized to fix this.
Without the patch the kernel will randomly either lockup or crash
after a CPU is onlined and the console driver is a module.
Signed-off-by: Hemmo Nieminen <hemmo.nieminen@iki.fi>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/8953/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8997c27ec4 upstream.
src_net points to the netns where the netlink message has been received. This
netns may be different from the netns where the interface is created (because
the user may add IFLA_NET_NS_[PID|FD]). In this case, src_net is the link netns.
It seems wrong to override the netns in the newlink() handler because if it
was not already src_net, it means that the user explicitly asks to create the
netdevice in another netns.
CC: Sjur Brændeland <sjur.brandeland@stericsson.com>
CC: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Fixes: 8391c4aab1 ("caif: Bugfixes in CAIF netdevice for close and flow control")
Fixes: c412540063 ("caif-hsi: Add rtnl support")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: drop the change to caif_hsi change]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ce357795e upstream.
Fixed commit added from64to32 under _#ifndef do_csum_ but used it
under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
robot reported TILEGX's). Move from64to32 under the latter.
Fixes: 150ae0e946 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4161b4505f upstream.
When ak4114 work calls its callback and the callback invokes
ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding
this, control the reentrance by introducing a refcount. Also
flush_delayed_work() is replaced with cancel_delayed_work_sync().
The exactly same bug is present in ak4113.c and fixed as well.
Reported-by: Pavel Hofman <pavel.hofman@ivitera.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Tested-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: snd_ak411{3,4}_reinit were previously using
flush_delayed_work_sync() rather than flush_delayed_work()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a43bd7e125 upstream.
According to the I2S specification information as following:
- WS = 0, channel 1 (left)
- WS = 1, channel 2 (right)
So, the start event should be TF/RF falling edge.
Reported-by: Songjun Wu <songjun.wu@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3e6c1eff5 upstream.
If the irq_chip does not define .irq_disable, any call to disable_irq
will defer disabling the IRQ until it fires while marked as disabled.
This assumes that the handler function checks for this condition, which
handle_percpu_irq does not. In this case, calling disable_irq leads to
an IRQ storm, if the interrupt fires while disabled.
This optimization is only useful when disabling the IRQ is slow, which
is not true for the MIPS CPU IRQ.
Disable this optimization by implementing .irq_disable and .irq_enable
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8949/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When backporting commit 33692f2759 ('vm: add VM_FAULT_SIGSEGV
handling support') I didn't notice that it depended on a recent change
to the locking context of mm_fault_error() (commit 7fb08eca45,
'x86: mm: move mmap_sem unlock from mm_fault_error() to caller').
That isn't easily applicable to 3.2, so instead make sure we drop
mm->mmap_sem on the new branch of mm_fault_error().
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Commit 06cf35f903 ('PCI: Handle read-only BARs on AMD CS553x
devices') added the function quirk_io() which calls
pcibios_bus_to_resource().
Prior to Linux 3.14, pcibios_bus_to_resource() takes a pointer to
struct pci_dev and looks up the device's bus itself, so we need
to pass dev not dev->bus.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06cf35f903 upstream.
Some AMD CS553x devices have read-only BARs because of a firmware or
hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no
longer works after 36e8164882 ("PCI: Restore detection of read-only
BARs"). Prior to 36e8164882, we filled in res->start; afterwards we
leave it zeroed out. The quirk only updated the size, so the driver tried
to use a region starting at zero, which didn't work.
Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
hard-code the sizes.
On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882,
we interpret that as a 512-byte BAR based on the lowest-order bit set. Per
datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
avoid clearing any address bits if a platform uses only 64-byte alignment.
[bhelgaas: changelog, reduce BAR 2 size to 64]
Fixes: 36e8164882 ("PCI: Restore detection of read-only BARs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
Reported-and-tested-by: Nix <nix@esperi.org.uk>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3747379ac upstream.
SYSENTER emulation is broken in several ways:
1. It misses the case of 16-bit code segments completely (CVE-2015-0239).
2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can
still be set without causing #GP).
3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in
legacy-mode.
4. There is some unneeded code.
Fix it.
Cc: stable@vger.linux.org
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a18a69b76 upstream.
If the guest thinks it's an AMD, it will not have prepared the SYSENTER MSRs,
and if the guest executes SYSENTER in compatibility mode, it will fails.
Detect this condition and #UD instead, like the spec says.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db29a9508a upstream.
Given following iptables ruleset:
-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.
This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).
All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.
Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.
[1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
Joint work with Daniel Borkmann.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
We need to check the position and size of file writes against various
limits, using generic_write_check(). This was not being done for
the splice write path. It was fixed upstream by commit 8d0207652c
("->splice_write() via ->write_iter()") but we can't apply that.
CVE-2014-7822
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
When backporting commit 4023bfc9f3 ("be careful with nd->inode in
path_init() and follow_dotdot_rcu()"), I failed to account for the
vfsmount_lock that is used in 3.2 but not upstream. path_init() takes
the lock if performing RCU lookup, but must drop it if (and only if)
it subsequently fails.
Reported-by: nuxi@vault24.org
References: https://bugzilla.kernel.org/show_bug.cgi?id=92531
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: nuxi@vault24.org
[ Upstream commit 2c26d34bbc ]
When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:
[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
These are reliably reproduced in a network topology of:
container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet. Note that the warning is
associated with the container eth0.
The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.
The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb. __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.
This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 05b0aa5793 ]
During driver load in tg3_init_one, if the driver detects DMA activity before
intializing the chip tg3_halt is called. As part of tg3_halt interrupts are
disabled using routine tg3_disable_ints. This routine was using mailbox value
which was not initialized (default value is 0). As a result driver was writing
0x00000001 to pci config space register 0, which is the vendor id / device id.
This driver bug was exposed because of the commit a7877b17a667 (PCI: Check only
the Vendor ID to identify Configuration Request Retry). Also this issue is only
seen in older generation chipsets like 5722 because config space write to offset
0 from driver is possible. The newer generation chips ignore writes to offset 0.
Also without commit a7877b17a667, for these older chips when a GRC reset is
issued the Bootcode would reprogram the vendor id/device id, which is the reason
this bug was masked earlier.
Fixed by initializing the interrupt mailbox registers before calling tg3_halt.
Please queue for -stable.
Reported-by: Nils Holland <nholland@tisys.org>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Steven Rostedt reported:
> Porting -rt to the latest 3.2 stable tree I triggered this bug:
>
> =====================================
> [ BUG: bad unlock balance detected! ]
> -------------------------------------
> rm/1638 is trying to release lock (rcu_read_lock) at:
> [<c04fde6c>] rcu_read_unlock+0x0/0x23
> but there are no more locks to release!
>
> other info that might help us debug this:
> 2 locks held by rm/1638:
> #0: (&sb->s_type->i_mutex_key#9/1){+.+.+.}, at: [<c04f93eb>] do_rmdir+0x5f/0xd2
> #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c04f9329>] vfs_rmdir+0x49/0xac
>
> stack backtrace:
> Pid: 1638, comm: rm Not tainted 3.2.66-test-rt96+ #2
> Call Trace:
> [<c083f390>] ? printk+0x1d/0x1f
> [<c0463cdf>] print_unlock_inbalance_bug+0xc3/0xcd
> [<c04653a8>] lock_release_non_nested+0x98/0x1ec
> [<c046228d>] ? trace_hardirqs_off_caller+0x18/0x90
> [<c0456f1c>] ? local_clock+0x2d/0x50
> [<c04fde6c>] ? d_hash+0x2f/0x2f
> [<c04fde6c>] ? d_hash+0x2f/0x2f
> [<c046568e>] lock_release+0x192/0x1ad
> [<c04fde83>] rcu_read_unlock+0x17/0x23
> [<c04ff344>] shrink_dcache_parent+0x227/0x270
> [<c04f9348>] vfs_rmdir+0x68/0xac
> [<c04f9424>] do_rmdir+0x98/0xd2
> [<c04f03ad>] ? fput+0x1a3/0x1ab
> [<c084dd42>] ? sysenter_exit+0xf/0x1a
> [<c0465b58>] ? trace_hardirqs_on_caller+0x118/0x149
> [<c04fa3e0>] sys_unlinkat+0x2b/0x35
> [<c084dd13>] sysenter_do_call+0x12/0x12
>
>
>
>
> There's a path to calling rcu_read_unlock() without calling
> rcu_read_lock() in have_submounts().
>
> goto positive;
>
> positive:
> if (!locked && read_seqretry(&rename_lock, seq))
> goto rename_retry;
>
> rename_retry:
> rcu_read_unlock();
>
> in the above path, rcu_read_lock() is never done before calling
> rcu_read_unlock();
I reviewed locking contexts in all three functions that I changed when
backporting "deal with deadlock in d_walk()". It's actually worse
than this:
- We don't hold this_parent->d_lock at the 'positive' label in
have_submounts(), but it is unlocked after 'rename_retry'.
- There is an rcu_read_unlock() after the 'out' label in
select_parent(), but it's not held at the 'goto out'.
Fix all three lock imbalances.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
commit 2196937e12 upstream.
We could be reading 8 bytes into a 4 byte buffer here. It seems
harmless but adding a check is the right thing to do and it silences a
static checker warning.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3a8784454 upstream.
When a key is being garbage collected, it's key->user would get put before
the ->destroy() callback is called, where the key is removed from it's
respective tracking structures.
This leaves a key hanging in a semi-invalid state which leaves a window open
for a different task to try an access key->user. An example is
find_keyring_by_name() which would dereference key->user for a key that is
in the process of being garbage collected (where key->user was freed but
->destroy() wasn't called yet - so it's still present in the linked list).
This would cause either a panic, or corrupt memory.
Fixes CVE-2014-9529.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6424babfd6 upstream.
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.
Multiple crash dumps showed:
The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself. As this is not the value of list upon entry to the
function, the kernel never exits the loop.
To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del. This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.
Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.
We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.
The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget. However, the code doesn't do the
__iget call on next_i
if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)
The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list. This makes the handling of next_i more closely match the
handling of the variable "inode."
The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.
During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate. Advance next_i in those cases where
__iget is not done.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ken Helias <kenhelias@firemail.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Jan Kara <jack@suse.cz>
commit 3b56496865 upstream.
This adds the workaround for erratum 793 as a precaution in case not
every BIOS implements it. This addresses CVE-2013-6885.
Erratum text:
[Revision Guide for AMD Family 16h Models 00h-0Fh Processors,
document 51810 Rev. 3.04 November 2013]
793 Specific Combination of Writes to Write Combined Memory Types and
Locked Instructions May Cause Core Hang
Description
Under a highly specific and detailed set of internal timing
conditions, a locked instruction may trigger a timing sequence whereby
the write to a write combined memory type is not flushed, causing the
locked instruction to stall indefinitely.
Potential Effect on System
Processor core hang.
Suggested Workaround
BIOS should set MSR
C001_1020[15] = 1b.
Fix Planned
No fix planned
[ hpa: updated description, fixed typo in MSR name ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust filename
- Venkatesh Srinivas pointed out we should use {rd,wr}msrl_safe() to
avoid crashing on KVM. This was fixed upstream by commit 8f86a7373a
("x86, AMD: Convert to the new bit access MSR accessors") but that's too
much trouble to backport. Here we must use {rd,wr}msrl_amd_safe().]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Moritz Muehlenhoff <jmm@debian.org>
Cc: Venkatesh Srinivas <venkateshs@google.com>
commit e512d56c79 upstream.
git commit 37f81fa1f6
"n_tty: do O_ONLCR translation as a single write"
surfaced a bug in the 3215 device driver. In combination this
broke tab expansion for tty ouput.
The cause is an asymmetry in the behaviour of tty3215_ops->write
vs tty3215_ops->put_char. The put_char function scans for '\t'
but the write function does not.
As the driver has logic for the '\t' expansion remove XTABS
from c_oflag of the initial termios as well.
Reported-by: Stephen Powell <zlinuxman@wowway.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7914900110 upstream.
It is reported that Samsung laptops that need to poll events are broken by
the following commit:
Commit 3afcf2ece4
Subject: ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set
The behaviors of the 2 vendor firmwares are conflict:
1. Acer: OSPM shouldn't issue QR_EC unless SCI_EVT is set, firmware
automatically sets SCI_EVT as long as there is event queued up.
2. Samsung: OSPM should issue QR_EC whatever SCI_EVT is set, firmware
returns 0 when there is no event queued up.
This patch is a quick fix to distinguish the behaviors to make Acer
behavior only effective for Acer EC firmware so that the breakages on
Samsung EC firmware can be avoided.
Fixes: 3afcf2ece4 (ACPI / EC: Add support to disallow QR_EC to be issued ...)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161
Reported-and-tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw : Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Kamal Mostafa <kamal@canonical.com>
This reverts commit e105c8187b which
was commit 72212675d1 upstream.
This caused suspend/resume to stop working on at least some systems -
specifically, the system would reboot when woken.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
This reverts commit a5c187d92d which
was commit 45e2a9d470 upstream.
The previous commit caused suspend/resume to stop working on at least
some systems - specifically, the system would reboot when woken.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
commit 9c145c56d0 upstream.
The stack guard page error case has long incorrectly caused a SIGBUS
rather than a SIGSEGV, but nobody actually noticed until commit
fee7e49d45 ("mm: propagate error from stack expansion even for guard
page") because that error case was never actually triggered in any
normal situations.
Now that we actually report the error, people noticed the wrong signal
that resulted. So far, only the test suite of libsigsegv seems to have
actually cared, but there are real applications that use libsigsegv, so
let's not wait for any of those to break.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 33692f2759 upstream.
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust filenames, context
- Drop arc, metag, nios2 and lustre changes
- For sh, patch both 32-bit and 64-bit implementations to use goto bad_area
- For s390, pass int_code and trans_exc_code as arguments to do_no_context()
and do_sigsegv()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 600ddd6825 upstream.
When hitting an INIT collision case during the 4WHS with AUTH enabled, as
already described in detail in commit 1be9a950c6 ("net: sctp: inherit
auth_capable on INIT collisions"), it can happen that we occasionally
still remotely trigger the following panic on server side which seems to
have been uncovered after the fix from commit 1be9a950c6 ...
[ 533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
[ 533.940559] PGD 5030f2067 PUD 0
[ 533.957104] Oops: 0000 [#1] SMP
[ 533.974283] Modules linked in: sctp mlx4_en [...]
[ 534.939704] Call Trace:
[ 534.951833] [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
[ 534.984213] [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
[ 535.015025] [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
[ 535.045661] [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
[ 535.074593] [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
[ 535.105239] [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
[ 535.138606] [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
[ 535.166848] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
... or depending on the the application, for example this one:
[ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
[ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
[ 1370.054568] PGD 633c94067 PUD 0
[ 1370.070446] Oops: 0000 [#1] SMP
[ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
[ 1370.963431] Call Trace:
[ 1370.974632] [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
[ 1371.000863] [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
[ 1371.027154] [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
[ 1371.054679] [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
[ 1371.080183] [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
With slab debugging enabled, we can see that the poison has been overwritten:
[ 669.826368] BUG kmalloc-128 (Tainted: G W ): Poison overwritten
[ 669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
[ 669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
[ 669.826424] __slab_alloc+0x4bf/0x566
[ 669.826433] __kmalloc+0x280/0x310
[ 669.826453] sctp_auth_create_key+0x23/0x50 [sctp]
[ 669.826471] sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
[ 669.826488] sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
[ 669.826505] sctp_do_sm+0x29d/0x17c0 [sctp] [...]
[ 669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
[ 669.826635] __slab_free+0x39/0x2a8
[ 669.826643] kfree+0x1d6/0x230
[ 669.826650] kzfree+0x31/0x40
[ 669.826666] sctp_auth_key_put+0x19/0x20 [sctp]
[ 669.826681] sctp_assoc_update+0x1ee/0x2d0 [sctp]
[ 669.826695] sctp_do_sm+0x674/0x17c0 [sctp]
Since this only triggers in some collision-cases with AUTH, the problem at
heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
when having refcnt 1, once directly in sctp_assoc_update() and yet again
from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
the already kzfree'd memory, which is also consistent with the observation
of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
at a later point in time when poison is checked on new allocation).
Reference counting of auth keys revisited:
Shared keys for AUTH chunks are being stored in endpoints and associations
in endpoint_shared_keys list. On endpoint creation, a null key is being
added; on association creation, all endpoint shared keys are being cached
and thus cloned over to the association. struct sctp_shared_key only holds
a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
keeps track of users internally through refcounting. Naturally, on assoc
or enpoint destruction, sctp_shared_key are being destroyed directly and
the reference on sctp_auth_bytes dropped.
User space can add keys to either list via setsockopt(2) through struct
sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
with refcount 1 and in case of replacement drops the reference on the old
sctp_auth_bytes. A key can be set active from user space through setsockopt()
on the id via sctp_auth_set_active_key(), which iterates through either
endpoint_shared_keys and in case of an assoc, invokes (one of various places)
sctp_auth_asoc_init_active_key().
sctp_auth_asoc_init_active_key() computes the actual secret from local's
and peer's random, hmac and shared key parameters and returns a new key
directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
the reference if there was a previous one. The secret, which where we
eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
intitial refcount of 1, which also stays unchanged eventually in
sctp_assoc_update(). This key is later being used for crypto layer to
set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
To close the loop: asoc->asoc_shared_key is freshly allocated secret
material and independant of the sctp_shared_key management keeping track
of only shared keys in endpoints and assocs. Hence, also commit 4184b2a79a
("net: sctp: fix memory leak in auth key management") is independant of
this bug here since it concerns a different layer (though same structures
being used eventually). asoc->asoc_shared_key is reference dropped correctly
on assoc destruction in sctp_association_free() and when active keys are
being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
to remove that sctp_auth_key_put() from there which fixes these panics.
Fixes: 730fc3d05c ("[SCTP]: Implete SCTP-AUTH parameter processing")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0767e95bb9 upstream.
When the last subscriber to a "Through" port has been removed, the
subscribed destination ports might still be active, so it would be
wrong to send "all sounds off" and "reset controller" events to them.
The proper place for such a shutdown would be the closing of the actual
MIDI port (and close_substream() in rawmidi.c already can do this).
This also fixes a deadlock when dummy_unuse() tries to send events to
its own port that is already locked because it is being freed.
Reported-by: Peter Billam <peter@www.pjb.com.au>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af1a7301c7 upstream.
When creating a fence for a tiled object, only fence the area that
makes up the actual tiles. The object may be larger than the tiled
area and if we allow those extra addresses to be fenced, they'll
get converted to addresses beyond where the object is mapped. This
opens up the possiblity of writes beyond the end of object.
To prevent this, we adjust the size of the fence to only encompass
the area that makes up the actual tiles. The extra space is considered
un-tiled and now behaves as if it was a linear object.
Testcase: igt/gem_tiled_fence_overflow
Reported-by: Dan Hettena <danh@ghs.com>
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[bwh: Backported to 3.2:
- Adjust context, indentation
- Apply to both i965_write_fence_reg() and sandybridge_write_fence_reg(),
which have been combined into one function upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd83ce9e61 upstream.
This patch adds a usb quirk to support devices with interupt endpoints
and bInterval values expressed as microframes. The quirk causes the
parse endpoint function to modify the reported bInterval to a standards
conforming value.
There is currently code in the endpoint parser that checks for
bIntervals that are outside of the valid range (1-16 for USB 2+ high
speed and super speed interupt endpoints). In this case, the code assumes
the bInterval is being reported in 1ms frames. As well, the correction
is only applied if the original bInterval value is out of the 1-16 range.
With this quirk applied to the device, the bInterval will be
accurately adjusted from microframes to an exponent.
Signed-off-by: James P Michels III <james.p.michels@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0fa7b39131 upstream.
In case userspace attempts to obtain key information for or delete a
unicast key, this is currently erroneously rejected unless the driver
sets the WIPHY_FLAG_IBSS_RSN flag. Apparently enough drivers do so it
was never noticed.
Fix that, and while at it fix a potential memory leak: the error path
in the get_key() function was placed after allocating a message but
didn't free it - move it to a better place. Luckily admin permissions
are needed to call this operation.
Fixes: e31b82136d ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3669ef9fa7 upstream.
The Witcher 2 did something like this to allocate a TLS segment index:
struct user_desc u_info;
bzero(&u_info, sizeof(u_info));
u_info.entry_number = (uint32_t)-1;
syscall(SYS_set_thread_area, &u_info);
Strictly speaking, this code was never correct. It should have set
read_exec_only and seg_not_present to 1 to indicate that it wanted
to find a free slot without putting anything there, or it should
have put something sensible in the TLS slot if it wanted to allocate
a TLS entry for real. The actual effect of this code was to
allocate a bogus segment that could be used to exploit espfix.
The set_thread_area hardening patches changed the behavior, causing
set_thread_area to return -EINVAL and crashing the game.
This changes set_thread_area to interpret this as a request to find
a free slot and to leave it empty, which isn't *quite* what the game
expects but should be close enough to keep it working. In
particular, using the code above to allocate two segments will
allocate the same segment both times.
According to FrostbittenKing on Github, this fixes The Witcher 2.
If this somehow still causes problems, we could instead allocate
a limit==0 32-bit data segment, but that seems rather ugly to me.
Fixes: 41bdc78544 x86/tls: Validate TLS entries to protect espfix
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d90d6d552 upstream.
Without this the aux port does not get detected, and consequently the touchpad
will not work.
With this patch the touchpad is detected:
$ dmesg | grep -E "(SYN|i8042|serio)"
pnp 00:03: Plug and Play ACPI device, IDs SYN1d22 PNP0f13 (active)
i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input4
psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1, id: 0x1e2b1, caps: 0xd00123/0x840300/0x126800, board id: 2863, fw id: 1473085
input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input6
dmidecode excerpt for this laptop is:
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Medion
Product Name: Akoya E7225
Version: 1.0
Signed-off-by: Jochen Hein <jochen@jochen.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ce75145267 upstream.
It is possible for ata_sff_flush_pio_task() to set ap->hsm_task_state to
HSM_ST_IDLE in between the time __ata_sff_port_intr() checks for HSM_ST_IDLE
and before it calls ata_sff_hsm_move() causing ata_sff_hsm_move() to BUG().
This problem is hard to reproduce making this patch hard to verify, but this
fix will prevent the race.
I have not been able to reproduce the problem, but here is a crash dump from
a 2.6.32 kernel.
On examining the ata port's state, its hsm_task_state field has a value of HSM_ST_IDLE:
crash> struct ata_port.hsm_task_state ffff881c1121c000
hsm_task_state = 0
Normally, this should not be possible as ata_sff_hsm_move() was called from ata_sff_host_intr(),
which checks hsm_task_state and won't call ata_sff_hsm_move() if it has a HSM_ST_IDLE value.
PID: 11053 TASK: ffff8816e846cae0 CPU: 0 COMMAND: "sshd"
#0 [ffff88008ba03960] machine_kexec at ffffffff81038f3b
#1 [ffff88008ba039c0] crash_kexec at ffffffff810c5d92
#2 [ffff88008ba03a90] oops_end at ffffffff8152b510
#3 [ffff88008ba03ac0] die at ffffffff81010e0b
#4 [ffff88008ba03af0] do_trap at ffffffff8152ad74
#5 [ffff88008ba03b50] do_invalid_op at ffffffff8100cf95
#6 [ffff88008ba03bf0] invalid_op at ffffffff8100bf9b
[exception RIP: ata_sff_hsm_move+317]
RIP: ffffffff813a77ad RSP: ffff88008ba03ca0 RFLAGS: 00010097
RAX: 0000000000000000 RBX: ffff881c1121dc60 RCX: 0000000000000000
RDX: ffff881c1121dd10 RSI: ffff881c1121dc60 RDI: ffff881c1121c000
RBP: ffff88008ba03d00 R8: 0000000000000000 R9: 000000000000002e
R10: 000000000001003f R11: 000000000000009b R12: ffff881c1121c000
R13: 0000000000000000 R14: 0000000000000050 R15: ffff881c1121dd78
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88008ba03d08] ata_sff_host_intr at ffffffff813a7fbd
#8 [ffff88008ba03d38] ata_sff_interrupt at ffffffff813a821e
#9 [ffff88008ba03d78] handle_IRQ_event at ffffffff810e6ec0
>--- <IRQ stack> ---
[exception RIP: pipe_poll+48]
RIP: ffffffff81192780 RSP: ffff880f26d459b8 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff880f26d459c8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff881a0539fa80
RBP: ffffffff8100bb8e R8: ffff8803b23324a0 R9: 0000000000000000
R10: ffff880f26d45dd0 R11: 0000000000000008 R12: ffffffff8109b646
R13: ffff880f26d45948 R14: 0000000000000246 R15: 0000000000000246
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
RIP: 00007f26017435c3 RSP: 00007fffe020c420 RFLAGS: 00000206
RAX: 0000000000000017 RBX: ffffffff8100b072 RCX: 00007fffe020c45c
RDX: 00007f2604a3f120 RSI: 00007f2604a3f140 RDI: 000000000000000d
RBP: 0000000000000000 R8: 00007fffe020e570 R9: 0101010101010101
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffe020e5f0
R13: 00007fffe020e5f4 R14: 00007f26045f373c R15: 00007fffe020e5e0
ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
Somewhere between the ata_sff_hsm_move() check and the ata_sff_host_intr() check, the value changed.
On examining the other cpus to see what else was running, another cpu was running the error handler
routines:
PID: 326 TASK: ffff881c11014aa0 CPU: 1 COMMAND: "scsi_eh_1"
#0 [ffff88008ba27e90] crash_nmi_callback at ffffffff8102fee6
#1 [ffff88008ba27ea0] notifier_call_chain at ffffffff8152d515
#2 [ffff88008ba27ee0] atomic_notifier_call_chain at ffffffff8152d57a
#3 [ffff88008ba27ef0] notify_die at ffffffff810a154e
#4 [ffff88008ba27f20] do_nmi at ffffffff8152b1db
#5 [ffff88008ba27f50] nmi at ffffffff8152aaa0
[exception RIP: _spin_lock_irqsave+47]
RIP: ffffffff8152a1ff RSP: ffff881c11a73aa0 RFLAGS: 00000006
RAX: 0000000000000001 RBX: ffff881c1121deb8 RCX: 0000000000000000
RDX: 0000000000000246 RSI: 0000000000000020 RDI: ffff881c122612d8
RBP: ffff881c11a73aa0 R8: ffff881c17083800 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881c1121c000
R13: 000000000000001f R14: ffff881c1121dd50 R15: ffff881c1121dc60
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
>--- <NMI exception stack> ---
#6 [ffff881c11a73aa0] _spin_lock_irqsave at ffffffff8152a1ff
#7 [ffff881c11a73aa8] ata_exec_internal_sg at ffffffff81396fb5
#8 [ffff881c11a73b58] ata_exec_internal at ffffffff81397109
#9 [ffff881c11a73bd8] atapi_eh_request_sense at ffffffff813a34eb
Before it tried to acquire a spinlock, ata_exec_internal_sg() called ata_sff_flush_pio_task().
This function will set ap->hsm_task_state to HSM_ST_IDLE, and has no locking around setting this
value. ata_sff_flush_pio_task() can then race with the interrupt handler and potentially set
HSM_ST_IDLE at a fatal moment, which will trigger a kernel BUG.
v2: Fixup comment in ata_sff_flush_pio_task()
tj: Further updated comment. Use ap->lock instead of shost lock and
use the [un]lock_irq variant instead of the irqsave/restore one.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72dd299d50 upstream.
Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
"Since commit 8a4aeec8d "libata/ahci: accommodate tag ordered
controllers" the access to the harddisk on the first SATA-port is
failing on its first access. The access to the harddisk on the
second port is working normal.
When reverting the above commit, access to both harddisks is working
fine again."
Maintain tag ordered submission as the default, but allow sata_sil24 to
continue with the old behavior.
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Ronny Hegewald <Ronny.Hegewald@online.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2061dcd6bf upstream.
I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.
Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.
The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.
strace from example application (shortened):
socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0
tcpdump before patch (fooling the application):
22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]
tcpdump after patch:
14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]
Looks like this bug is from the pre-git history museum. ;)
Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ebbeba120a upstream.
Fix attribute-creation race with userspace by using the default group
to create also the contingent gpio device attributes.
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2:
- Adjust filenames, context
- Use gpio_to_desc(), not gpiod_to_desc(), in gpio_is_visible()
- gpio_is_visible() must return mode_t]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0915e6feb3 upstream.
The gpio device attributes were never destroyed when the gpio was
unexported (or on export failures).
Use device_create_with_groups() to create the default device attributes
of the gpio class device. Note that this also fixes the
attribute-creation race with userspace for these attributes.
Remove contingent attributes in export error path and on unexport.
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc4e251499 upstream.
The gpio_export function uses nested if statements and the status
variable to handle the failure cases. This makes the function logic
difficult to follow. Refactor the code to abort immediately on failure
using goto. This makes the code slightly longer, but significantly
reduces the nesting and number of split lines and makes the code easier
to read.
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 121b6a7995 upstream.
The gpio-chip device attributes were never destroyed when the device was
removed.
Fix by using device_create_with_groups() to create the device attributes
of the chip class device.
Note that this also fixes the attribute-creation race with userspace.
Fixes: d8f388d8dc ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 39ef311204 upstream.
device_create_groups lets callers create devices as well as associated
sysfs attributes with a single call. This avoids race conditions seen
if sysfs attributes on new devices are created later.
[fixed up comment block placement and add checks for printk buffer
formats - gregkh]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2f37f58b1 upstream.
To make it easier for driver subsystems to work with attribute groups,
create the ATTRIBUTE_GROUPS macro to remove some of the repetitive
typing for the most common use for attribute groups.
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9b1087aa5e upstream.
When changing flags in the CAN drivers ctrlmode the provided new content has to
be checked whether the bits are allowed to be changed. The bits that are to be
changed are given as a bitfield in cm->mask. Therefore checking against
cm->flags is wrong as the content can hold any kind of values.
The iproute2 tool sets the bits in cm->mask and cm->flags depending on the
detected command line options. To be robust against bogus user space
applications additionally sanitize the provided flags with the provided mask.
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 237d28db03 upstream.
If the function graph tracer traces a jprobe callback, the system will
crash. This can easily be demonstrated by compiling the jprobe
sample module that is in the kernel tree, loading it and running the
function graph tracer.
# modprobe jprobe_example.ko
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# ls
The first two commands end up in a nice crash after the first fork.
(do_fork has a jprobe attached to it, so "ls" just triggers that fork)
The problem is caused by the jprobe_return() that all jprobe callbacks
must end with. The way jprobes works is that the function a jprobe
is attached to has a breakpoint placed at the start of it (or it uses
ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
will copy the stack frame and change the ip address to return to the
jprobe handler instead of the function. The jprobe handler must end
with jprobe_return() which swaps the stack and does an int3 (breakpoint).
This breakpoint handler will then put back the saved stack frame,
simulate the instruction at the beginning of the function it added
a breakpoint to, and then continue on.
For function tracing to work, it hijakes the return address from the
stack frame, and replaces it with a hook function that will trace
the end of the call. This hook function will restore the return
address of the function call.
If the function tracer traces the jprobe handler, the hook function
for that handler will not be called, and its saved return address
will be used for the next function. This will result in a kernel crash.
To solve this, pause function tracing before the jprobe handler is called
and unpause it before it returns back to the function it probed.
Some other updates:
Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
code look a bit cleaner and easier to understand (various tries to fix
this bug required this change).
Note, if fentry is being used, jprobes will change the ip address before
the function graph tracer runs and it will not be able to trace the
function that the jprobe is probing.
Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5539b3c938 upstream.
Memory allocated and references taken by of_gpiochip_add and
acpi_gpiochip_add were never released on errors in gpiochip_add (e.g.
failure to find free gpio range).
Fixes: 391c970c0d ("of/gpio: add default of_xlate function if device
has a node pointer")
Fixes: 664e3e5ac6 ("gpio / ACPI: register to ACPI events
automatically")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[bwh: Backported to 3.2:
- Move call to of_gpiochip_add() into conditional section rather
than rearranging gotos and labels which are in different places
here
- There's no ACPI support]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e14dcf7cb upstream.
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.
Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.
Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.
Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2:
- Adjust filenames
- Drop changes to algorithms and drivers we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4943ba16bb upstream.
This adds the module loading prefix "crypto-" to the template lookup
as well.
For example, attempting to load 'vfat(blowfish)' via AF_ALG now correctly
includes the "crypto-" prefix at every level, correctly rejecting "vfat":
net-pf-38
algif-hash
crypto-vfat(blowfish)
crypto-vfat(blowfish)-all
crypto-vfat
Reported-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: drop changes to cmac and mcryptd which we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d26a105b5 upstream.
This prefixes all crypto module loading with "crypto-" so we never run
the risk of exposing module auto-loading to userspace via a crypto API,
as demonstrated by Mathias Krause:
https://lkml.org/lkml/2013/3/4/70
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2:
- Adjust filenames
- Drop changes to algorithms and drivers we don't have
- Add aliases to generic C implementations that didn't need them before]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b800c91a05 upstream.
Fix for BUG_ON(anon_vma->degree) splashes in unlink_anon_vmas() ("kernel
BUG at mm/rmap.c:399!") caused by commit 7a3ef208e6 ("mm: prevent
endless growth of anon_vma hierarchy")
Anon_vma_clone() is usually called for a copy of source vma in
destination argument. If source vma has anon_vma it should be already
in dst->anon_vma. NULL in dst->anon_vma is used as a sign that it's
called from anon_vma_fork(). In this case anon_vma_clone() finds
anon_vma for reusing.
Vma_adjust() calls it differently and this breaks anon_vma reusing
logic: anon_vma_clone() links vma to old anon_vma and updates degree
counters but vma_adjust() overrides vma->anon_vma right after that. As
a result final unlink_anon_vmas() decrements degree for wrong anon_vma.
This patch assigns ->anon_vma before calling anon_vma_clone().
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com>
Reported-and-tested-by: Oded Gabbay <oded.gabbay@amd.com>
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: vma_adjust() didn't use a variable to propagate
the error code from anon_vma_clone(); change that at the same time]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 690eac53da upstream.
Commit fee7e49d45 ("mm: propagate error from stack expansion even for
guard page") made sure that we return the error properly for stack
growth conditions. It also theorized that counting the guard page
towards the stack limit might break something, but also said "Let's see
if anybody notices".
Somebody did notice. Apparently android-x86 sets the stack limit very
close to the limit indeed, and including the guard page in the rlimit
check causes the android 'zygote' process problems.
So this adds the (fairly trivial) code to make the stack rlimit check be
against the actual real stack size, rather than the size of the vma that
includes the guard page.
Reported-and-tested-by: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Jay Foad <jay.foad@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 32a4bf2e81 upstream.
Use tty kref to release the fake tty in usb_console_setup to avoid use
after free if the underlying serial driver has acquired a reference.
Note that using the tty destructor release_one_tty requires some more
state to be initialised.
Fixes: 4a90f09b20 ("tty: usb-serial krefs")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5fb694f96e upstream.
When unloading the module 'g_hid.ko', the urb request will be dequeued and the
completion routine will be excuted. If there is no urb packet, the urb request
will not be added to the endpoint queue and the completion routine pointer in
urb request is NULL.
Accessing to this NULL function pointer will cause the Oops issue reported
below.
Add the code to check if the urb request is in the endpoint queue
or not. If the urb request is not in the endpoint queue, a negative
error code will be returned.
Here is the Oops log:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = dedf0000
[00000000] *pgd=3ede5831, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] ARM
Modules linked in: g_hid(-) usb_f_hid libcomposite
CPU: 0 PID: 923 Comm: rmmod Not tainted 3.18.0+ #2
Hardware name: Atmel SAMA5 (Device Tree)
task: df6b1100 ti: dedf6000 task.ti: dedf6000
PC is at 0x0
LR is at usb_gadget_giveback_request+0xc/0x10
pc : [<00000000>] lr : [<c02ace88>] psr: 60000093
sp : dedf7eb0 ip : df572634 fp : 00000000
r10: 00000000 r9 : df52e210 r8 : 60000013
r7 : df6a9858 r6 : df52e210 r5 : df6a9858 r4 : df572600
r3 : 00000000 r2 : ffffff98 r1 : df572600 r0 : df6a9868
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c53c7d Table: 3edf0059 DAC: 00000015
Process rmmod (pid: 923, stack limit = 0xdedf6230)
Stack: (0xdedf7eb0 to 0xdedf8000)
7ea0: 00000000 c02adbbc df572580 deced608
7ec0: df572600 df6a9868 df572634 c02aed3c df577c00 c01b8608 00000000 df6be27c
7ee0: 00200200 00100100 bf0162f4 c000e544 dedf6000 00000000 00000000 bf010c00
7f00: bf0162cc bf00159c 00000000 df572980 df52e218 00000001 df5729b8 bf0031d0
[..]
[<c02ace88>] (usb_gadget_giveback_request) from [<c02adbbc>] (request_complete+0x64/0x88)
[<c02adbbc>] (request_complete) from [<c02aed3c>] (usba_ep_dequeue+0x70/0x128)
[<c02aed3c>] (usba_ep_dequeue) from [<bf010c00>] (hidg_unbind+0x50/0x7c [usb_f_hid])
[<bf010c00>] (hidg_unbind [usb_f_hid]) from [<bf00159c>] (remove_config.isra.6+0x98/0x9c [libcomposite])
[<bf00159c>] (remove_config.isra.6 [libcomposite]) from [<bf0031d0>] (__composite_unbind+0x34/0x98 [libcomposite])
[<bf0031d0>] (__composite_unbind [libcomposite]) from [<c02acee0>] (usb_gadget_remove_driver+0x50/0x78)
[<c02acee0>] (usb_gadget_remove_driver) from [<c02ad570>] (usb_gadget_unregister_driver+0x64/0x94)
[<c02ad570>] (usb_gadget_unregister_driver) from [<bf0160c0>] (hidg_cleanup+0x10/0x34 [g_hid])
[<bf0160c0>] (hidg_cleanup [g_hid]) from [<c0056748>] (SyS_delete_module+0x118/0x19c)
[<c0056748>] (SyS_delete_module) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
Code: bad PC value
Signed-off-by: Songjun Wu <songjun.wu@atmel.com>
[nicolas.ferre@atmel.com: reworked the commit message]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Fixes: 914a3f3b37 ("USB: add atmel_usba_udc driver")
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6785a10344 upstream.
When receive data, the RXRDY in status register set by hardware
after a new packet has been stored in the endpoint FIFO. When it
is copied from FIFO, this bit is cleared which make the FIFO can
be accessed again.
In the receive_data() function, this bit RXRDY has been cleared.
So, after the receive_data() function return, this bit should
not be cleared again, or else it may cause the accessing FIFO
corrupt, which will make the data loss.
Fixes: 914a3f3b37 (USB: add atmel_usba_udc driver)
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f40afdddeb upstream.
According to the datasheet, when transfer using DMA, the control
setting for IN packet only need END_BUF_EN, END_BUF_IE, CH_EN,
while for OUT packet, need more two bits END_TR_EN and END_TR_IE
to be configured.
Fixes: 914a3f3b37 (USB: add atmel_usba_udc driver)
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 56abcab833 upstream.
Commit 8dccddbc23 ("OHCI: final fix for NVIDIA problems (I hope)")
introduced into 3.1.9 broke boot on e.g. Freescale P2020DS development
board. The code path that was previously specific to NVIDIA controllers
had then become taken for all chips.
However, the M5237 installed on the board wedges solid when accessing
its base+OHCI_FMINTERVAL register, making it impossible to boot any
kernel newer than 3.1.8 on this particular and apparently other similar
machines.
Don't readl() and writel() base+OHCI_FMINTERVAL on PCI ID 10b9:5237.
The patch is suitable for the -next tree as well as all maintained
kernels up to 3.2 inclusive.
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 606185b20c upstream.
This is a static checker fix. We write some binary settings to the
sysfs file. One of the settings is the "->startup_profile". There
isn't any checking to make sure it fits into the
pyra->profile_settings[] array in the profile_activated() function.
I added a check to pyra_sysfs_write_settings() in both places because
I wasn't positive that the other callers were correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2: pyra_sysfs_write_settings() doesn't define a
settings variable, so write the cast-expression inline]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d6d7f9828 upstream.
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:
__set_page_dirty_nobuffers() __delete_from_page_cache()
if (TestSetPageDirty(page))
page->mapping = NULL
if (PageDirty())
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
if (page->mapping)
account_page_dirtied(page)
__inc_zone_page_state(page, NR_FILE_DIRTY);
__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.
Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held. The notable exception to this rule, though,
is do_wp_page(), for which this race exists. However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes. Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.
Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed. Remove it.
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Use VM_BUG_ON() rather than VM_BUG_ON_PAGE()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed6d7c8e57 upstream.
There's only one caller of set_page_dirty_balance() and that will call it
with page_mkwrite == 0.
The page_mkwrite argument was unused since commit b827e496c8 "mm: close
page_mkwrite races".
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7a3ef208e6 upstream.
Constantly forking task causes unlimited grow of anon_vma chain. Each
next child allocates new level of anon_vmas and links vma to all
previous levels because pages might be inherited from any level.
This patch adds heuristic which decides to reuse existing anon_vma
instead of forking new one. It adds counter anon_vma->degree which
counts linked vmas and directly descending anon_vmas and reuses anon_vma
if counter is lower than two. As a result each anon_vma has either vma
or at least two descending anon_vmas. In such trees half of nodes are
leafs with alive vmas, thus count of anon_vmas is no more than two times
bigger than count of vmas.
This heuristic reuses anon_vmas as few as possible because each reuse
adds false aliasing among vmas and rmap walker ought to scan more ptes
when it searches where page is might be mapped.
Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
Fixes: 5beb493052 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@linux-foundation.org: fix typo, per Rik]
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Tested-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 83b0302d34 upstream.
The regulator framework maintains a list of consumer regulators
for a regulator device and protects it from concurrent access using
the regulator device's mutex lock.
In the case of regulator_put() the consumer is removed and regulator
device's parameters are updated without holding the regulator device's
mutex. This would lead to a race condition between the regulator_put()
and any function which traverses the consumer list or modifies regulator
device's parameters.
Fix this race condition by holding the regulator device's mutex in case
of regulator_put.
Signed-off-by: Ashay Jaiswal <ashayj@codeaurora.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2:
- Adjust context
- Don't touch the comment; __regulator_put() has not been split out of
regulator_put() here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 148e9a711e upstream.
On some laptops, keyboard needs to be reset in order to successfully detect
touchpad (e.g., some Gigabyte laptop models with Elantech touchpads).
Without resettin keyboard touchpad pretends to be completely dead.
Based on the original patch by Mateusz Jończyk this version has been
expanded to include DMI based detection & application of the fix
automatically on the affected models of laptops. This has been confirmed to
fix problem by three users already on three different models of laptops.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81331
Signed-off-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com>
Acked-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Tested-by: Srihari Vijayaraghavan <linux.bug.reporting@gmail.com>
Tested by: Zakariya Dehlawi <zdehlawi@gmail.com>
Tested-by: Guillaum Bouchard <guillaum.bouchard@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6ada1fc0e1 upstream.
An unvalidated user input is multiplied by a constant, which can result in
an undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: include trivial milisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz <john.stultz@linaro.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4aaa71873d upstream.
DMA mapped IO should be unmapped on the error path in probe() and
unconditionally on remove().
Fixes: 62936009f3 ([libata] Add 460EX on-chip SATA driver, sata_dwc_460ex)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fee7e49d45 upstream.
Jay Foad reports that the address sanitizer test (asan) sometimes gets
confused by a stack pointer that ends up being outside the stack vma
that is reported by /proc/maps.
This happens due to an interaction between RLIMIT_STACK and the guard
page: when we do the guard page check, we ignore the potential error
from the stack expansion, which effectively results in a missing guard
page, since the expected stack expansion won't have been done.
And since /proc/maps explicitly ignores the guard page (commit
d7824370e2: "mm: fix up some user-visible effects of the stack guard
page"), the stack pointer ends up being outside the reported stack area.
This is the minimal patch: it just propagates the error. It also
effectively makes the guard page part of the stack limit, which in turn
measn that the actual real stack is one page less than the stack limit.
Let's see if anybody notices. We could teach acct_stack_growth() to
allow an extra page for a grow-up/grow-down stack in the rlimit test,
but I don't want to add more complexity if it isn't needed.
Reported-and-tested-by: Jay Foad <jay.foad@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1eb03f546 upstream.
The reason we defer kfree until release function is because it's a
general rule for kobjects: kfree of the reference counter itself is only
legal in the release function.
Previous patch didn't make this clear, document this in code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 63bd62a08c upstream.
A struct device which has just been unregistered can live on past the
point at which a driver decides to drop it's initial reference to the
kobject gained on allocation.
This implies that when releasing a virtio device, we can't free a struct
virtio_device until the underlying struct device has been released,
which might not happen immediately on device_unregister().
Unfortunately, this is exactly what virtio pci does:
it has an empty release callback, and frees memory immediately
after unregistering the device.
This causes an easy to reproduce crash if CONFIG_DEBUG_KOBJECT_RELEASE
it enabled.
To fix, free the memory only once we know the device is gone in the release
callback.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 67bf9cda4b upstream.
The FIFO size is 40 accordingly to the specifications, but this means 0x40,
i.e. 64 bytes. This patch fixes the typo and enables FIFO size autodetection
for Intel MID devices.
Fixes: 7063c0d942 (spi/dw_spi: add DMA support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d297933cc7 upstream.
Current code tries to find the highest valid fifo depth by checking the value
it wrote to DW_SPI_TXFLTR. There are a few problems in current code:
1) There is an off-by-one in dws->fifo_len setting because it assumes the latest
register write fails so the latest valid value should be fifo - 1.
2) We know the depth could be from 2 to 256 from HW spec, so it is not necessary
to test fifo == 257. In the case fifo is 257, it means the latest valid
setting is fifo = 256. So after the for loop iteration, we should check
fifo == 2 case instead of fifo == 257 if detecting the FIFO depth fails.
This patch fixes above issues.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Reviewed-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c507de88f6 upstream.
stac_store_hints() does utterly wrong for masking the values for
gpio_dir and gpio_data, likely due to copy&paste errors. Fortunately,
this feature is used very rarely, so the impact must be really small.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 9f871e8832, which
was commit 1485348d24 upstream.
It can cause connections to stall when a PMTU event occurs. This was
fixed by commit 843925f33f ("tcp: Do not apply TSO segment limit to
non-TSO packets") upstream, but that depends on other changes to TSO.
The original issue this fixed was a performance regression for the sfc
driver in extreme cases of TSO (skb with > 100 segments). This is not
really very important and it seems best to revert it rather than try
to fix it up.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: linux-net-drivers@solarflare.com
commit 90441b4dbe upstream.
Fixing typo for MeshConnect IDs. The original PID (0x8875) is not in
production and is not needed. Instead it has been changed to the
official production PID (0x8857).
Signed-off-by: Preston Fick <pffick@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30ea9c5218 upstream.
fb_deferred_io_fsync() returns the value of schedule_delayed_work() as
an error code, but schedule_delayed_work() does not return an error. It
returns true/false depending on whether the work was already queued.
Fix this by ignoring the return value of schedule_delayed_work().
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 92b004d1aa upstream.
If the probe of an fb driver has been deferred due to missing
dependencies, and the probe is later ran when a module is loaded, the
fbdev framework will try to find a logo to use.
However, the logos are __initdata, and have already been freed. This
causes sometimes page faults, if the logo memory is not mapped,
sometimes other random crashes as the logo data is invalid, and
sometimes nothing, if the fbdev decides to reject the logo (e.g. the
random value depicting the logo's height is too big).
This patch adds a late_initcall function to mark the logos as freed. In
reality the logos are freed later, and fbdev probe may be ran between
this late_initcall and the freeing of the logos. In that case we will
miss drawing the logo, even if it would be possible.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 796f2da81b upstream.
When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.
Fixes: 58e998c6d2 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- We don't support 802.1ad tag offload
- Keep passing protocol to harmonize_features()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e77bdebff upstream.
If a request is backlogged, it's complete() handler will get called
twice: once with -EINPROGRESS, and once with the final error code.
af_alg's complete handler, unlike other users, does not handle the
-EINPROGRESS but instead always completes the completion that recvmsg()
is waiting on. This can lead to a return to user space while the
request is still pending in the driver. If userspace closes the sockets
before the requests are handled by the driver, this will lead to
use-after-frees (and potential crashes) in the kernel due to the tfm
having been freed.
The crashes can be easily reproduced (for example) by reducing the max
queue length in cryptod.c and running the following (from
http://www.chronox.de/libkcapi.html) on AES-NI capable hardware:
$ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \
-k 00000000000000000000000000000000 \
-p 00000000000000000000000000000000 >/dev/null & done
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e237ec37ec upstream.
Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 394f56fe48 upstream.
The theory behind vdso randomization is that it's mapped at a random
offset above the top of the stack. To avoid wasting a page of
memory for an extra page table, the vdso isn't supposed to extend
past the lowest PMD into which it can fit. Other than that, the
address should be a uniformly distributed address that meets all of
the alignment requirements.
The current algorithm is buggy: the vdso has about a 50% probability
of being at the very end of a PMD. The current algorithm also has a
decent chance of failing outright due to incorrect handling of the
case where the top of the stack is near the top of its PMD.
This fixes the implementation. The paxtest estimate of vdso
"randomisation" improves from 11 bits to 18 bits. (Disclaimer: I
don't know what the paxtest code is actually calculating.)
It's worth noting that this algorithm is inherently biased: the vdso
is more likely to end up near the end of its PMD than near the
beginning. Ideally we would either nix the PMD sharing requirement
or jointly randomize the vdso and the stack to reduce the bias.
In the mean time, this is a considerable improvement with basically
no risk of compatibility issues, since the allowed outputs of the
algorithm are unchanged.
As an easy test, doing this:
for i in `seq 10000`
do grep -P vdso /proc/self/maps |cut -d- -f1
done |sort |uniq -d
used to produce lots of output (1445 lines on my most recent run).
A tiny subset looks like this:
7fffdfffe000
7fffe01fe000
7fffe05fe000
7fffe07fe000
7fffe09fe000
7fffe0bfe000
7fffe0dfe000
Note the suspicious fe000 endings. With the fix, I get a much more
palatable 76 repeated addresses.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
[bwh: Backported to 3.2:
- Adjust context
- The whole file is only built for x86_64; adjust comment for this]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0e5cc9a40a upstream.
Symlink reading code does not check whether the resulting path fits into
the page provided by the generic code. This isn't as easy as just
checking the symlink size because of various encoding conversions we
perform on path. So we have to check whether there is still enough space
in the buffer on the fly.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fef2e9f330 upstream.
Currently, we ignore symlink component of type 2. But mkisofs and other OS'
seem to treat it as / so do the same for compatibility.
Reported-by: "Gábor S." <otnaccess@hotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1d47b2629 upstream.
UDF specification allows arbitrarily large symlinks. However we support
only symlinks at most one block large. Check the length of the symlink
so that we don't access memory beyond end of the symlink block.
Reported-by: Carl Henrik Lunde <chlunde@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e159332b9a upstream.
Verify that inode size is sane when loading inode with data stored in
ICB. Otherwise we may get confused later when working with the inode and
inode size is too big.
Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: on error, call make_bad_inode() then return]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e2024624e upstream.
We didn't check length of rock ridge ER records before printing them.
Thus corrupted isofs image can cause us to access and print some memory
behind the buffer with obvious consequences.
Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 136f49b917 upstream.
For buffer write, page lock will be got in write_begin and released in
write_end, in ocfs2_write_end_nolock(), before it unlock the page in
ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
for the read lock of journal->j_trans_barrier. Holding page lock and
ask for journal->j_trans_barrier breaks the locking order.
This will cause a deadlock with journal commit threads, ocfs2cmt will
get write lock of journal->j_trans_barrier first, then it wakes up
kjournald2 to do the commit work, at last it waits until done. To
commit journal, kjournald2 needs flushing data first, it needs get the
cache page lock.
Since some ocfs2 cluster locks are holding by write process, this
deadlock may hung the whole cluster.
unlock pages before ocfs2_run_deallocs() can fix the locking order, also
put unlock before ocfs2_commit_trans() to make page lock is unlocked
before j_trans_barrier to preserve unlocking order.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d70a1b9893 upstream.
The Arcam rPAC seems to have the same problem - whenever anything
(alsamixer, udevd, 3.9+ kernel from 60af3d037e, ..) attempts to
access mixer / control interface of the card, the firmware "locks up"
the entire device, resulting in
SNDRV_PCM_IOCTL_HW_PARAMS failed (-5): Input/output error
from alsa-lib.
Other operating systems can somehow read the mixer (there seems to be
playback volume/mute), but any manipulation is ignored by the device
(which has hardware volume controls).
Signed-off-by: Jiri Jaburek <jjaburek@redhat.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3fb2f4237b upstream.
It turns out that there's a lurking ABI issue. GCC, when
compiling this in a 32-bit program:
struct user_desc desc = {
.entry_number = idx,
.base_addr = base,
.limit = 0xfffff,
.seg_32bit = 1,
.contents = 0, /* Data, grow-up */
.read_exec_only = 0,
.limit_in_pages = 1,
.seg_not_present = 0,
.useable = 0,
};
will leave .lm uninitialized. This means that anything in the
kernel that reads user_desc.lm for 32-bit tasks is unreliable.
Revert the .lm check in set_thread_area(). The value never did
anything in the first place.
Fixes: 0e58af4e1d ("x86/tls: Disallow unusual TLS segments")
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97c85a828f upstream.
Current snaphost code does not properly handle moving inode from one
empty snap realm to another empty snap realm. After changing inode's
snap realm, some dirty pages' snap context can be not equal to inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs()
The fix is introduce a global empty snap context for all empty snap
realm. This avoids triggering the BUG() for filesystem with no snapshot.
Fixes: http://tracker.ceph.com/issues/9928
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- As we don't have ceph_create_snap_context(), open-code it in
ceph_snap_init()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6bf6ca7515 upstream.
This patch changes iscsit_do_tx_data() to fail on short writes
when kernel_sendmsg() returns a value different than requested
transfer length, returning -EPIPE and thus causing a connection
reset to occur.
This avoids a potential bug in the original code where a short
write would result in kernel_sendmsg() being called again with
the original iovec base + length.
In practice this has not been an issue because iscsit_do_tx_data()
is only used for transferring 48 byte headers + 4 byte digests,
along with seldom used control payloads from NOPIN + TEXT_RSP +
REJECT with less than 32k of data.
So following Al's audit of iovec consumers, go ahead and fail
the connection on short writes for now, and remove the bogus
logic ahead of his proper upstream fix.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f54e18f1b8 upstream.
Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.
Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0e58af4e1d upstream.
Users have no business installing custom code segments into the
GDT, and segments that are not present but are otherwise valid
are a historical source of interesting attacks.
For completeness, block attempts to set the L bit. (Prior to
this patch, the L bit would have been silently dropped.)
This is an ABI break. I've checked glibc, musl, and Wine, and
none of them look like they'll have any trouble.
Note to stable maintainers: this is a hardening patch that fixes
no known bugs. Given the possibility of ABI issues, this
probably shouldn't be backported quickly.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: security@kernel.org <security@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b5c8afe5be upstream.
"origPtr" is used as an offset into the bd->dbuf[] array. That array is
allocated in start_bunzip() and has "bd->dbufSize" number of elements so
the test here should be >= instead of >.
Later we check "origPtr" again before using it as an offset so I don't
know if this bug can be triggered in real life.
Fixes: bc22c17e12 ('bzip2/lzma: library support for gzip, bzip2 and lzma decompression')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c291ee6221 upstream.
Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.
CPU0 CPU1
show_interrupts()
desc = irq_to_desc(X);
free_desc(desc)
remove_from_radix_tree();
kfree(desc);
raw_spinlock_irq(&desc->lock);
/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.
The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.
For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.
Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.
Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.
Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.
Fixes: 1f5a5b87f7 "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2:
- Adjust context
- Handle the CONFIG_GENERIC_HARDIRQS=n case]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d025933e29 upstream.
As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount"
stopped being incremented after the use-after-free fix. Furthermore, the
RX-LED will be triggered by every multicast frame (which wouldn't happen
before) which wouldn't allow the LED to rest at all.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the
patch.
Fixes: b8fff407a1 ("mac80211: fix use-after-free in defragmentation")
Signed-off-by: Andreas Müller <goo@stapelspeicher.org>
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a682e9c28c upstream.
If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error
return value is then (in most cases) just overwritten before we return.
This can result in reporting success to userspace although error happened.
This bug was introduced by commit 2e54eb96e2 ("BKL: Remove BKL from
ncpfs"). Propagate the errors correctly.
Coverity id: 1226925.
Fixes: 2e54eb96e2 ("BKL: Remove BKL from ncpfs")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 678886bdc6 upstream.
When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.
This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
referred by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.
I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:
[PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
The corruption always happened when a transaction aborted and then fsck complained
like this:
_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
*** fsck.btrfs output ***
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
Check tree block failed, want=94945280, have=0
read block failed check_tree_block
Couldn't open file system
In this case 94945280 corresponded to the root of a tree.
Using frace what I observed was the following sequence of steps happened:
1) transaction N started, fs_info->pinned_extents pointed to
fs_info->freed_extents[0];
2) node/eb 94945280 is created;
3) eb is persisted to disk;
4) transaction N commit starts, fs_info->pinned_extents now points to
fs_info->freed_extents[1], and transaction N completes;
5) transaction N + 1 starts;
6) eb is COWed, and btrfs_free_tree_block() called for this eb;
7) eb range (94945280 to 94945280 + 16Kb) is added to
fs_info->pinned_extents (fs_info->freed_extents[1]);
8) Something goes wrong in transaction N + 1, like hitting ENOSPC
for example, and the transaction is aborted, turning the fs into
readonly mode. The stack trace I got for example:
[112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
[112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
[112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
[112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
[112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
[112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
[112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
[112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
(...)
[112065.264493] ---[ end trace dd7903a975a31a08 ]---
[112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
[112065.264997] BTRFS info (device sdc): forced readonly
9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
turn calls btrfs_destroy_pinned_extent();
10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
marked as dirty in fs_info->freed_extents[], and for each one
it calls discard, if the fs was mounted with "-o discard", and
adds the range to the free space cache of the respective block
group;
11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
sees the free space entries and performs a discard;
12) After an umount and mount (or fsck), our eb's location on disk was full
of zeroes, and it should have been untouched, because it was marked as
dirty in the fs_info->pinned_extents tree, and therefore used by the
trees that the last committed superblock points to.
Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point since the fs is now in readonly mode and
we won't write free space caches to disk anymore (otherwise we would leak space)
nor any new superblock. By not adding the ranges to the free space caches, it
prevents other code paths from allocating that space and write to it as well,
therefore being safer and simpler.
This isn't a new problem, as it's been present since 2011 (git commit
acce952b02).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a5a519b271 upstream.
In recent testing I had disabled CONFIG_IP_MULTIPLE_TABLES and as a result
when I ran "cat /proc/net/fib_trie" the main trie was displayed multiple
times. I found that the problem line of code was in the function
fib_trie_seq_next. Specifically the line below caused the indexes to go in
the opposite direction of our traversal:
h = tb->tb_id & (FIB_TABLE_HASHSZ - 1);
This issue was that the RT tables are defined such that RT_TABLE_LOCAL is ID
255, while it is located at TABLE_LOCAL_INDEX of 0, and RT_TABLE_MAIN is 254
with a TABLE_MAIN_INDEX of 1. This means that the above line will return 1
for the local table and 0 for main. The result is that fib_trie_seq_next
will return NULL at the end of the local table, fib_trie_seq_start will
return the start of the main table, and then fib_trie_seq_next will loop on
main forever as h will always return 0.
The fix for this is to reverse the ordering of the two tables. It has the
advantage of making it so that the tables now print in the same order
regardless of if multiple tables are enabled or not. In order to make the
definition consistent with the multiple tables case I simply masked the to
RT_TABLE_XXX values by (FIB_TABLE_HASHSZ - 1). This way the two table
layouts should always stay consistent.
Fixes: 93456b6 ("[IPV4]: Unify access to the routing tables")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b26bdde5bb upstream.
When loading encrypted-keys module, if the last check of
aes_get_sizes() in init_encrypted() fails, the driver just returns an
error without unregistering its key type. This results in the stale
entry in the list. In addition to memory leaks, this leads to a kernel
crash when registering a new key type later.
This patch fixes the problem by swapping the calls of aes_get_sizes()
and register_key_type(), and releasing resources properly at the error
paths.
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f563a6a46 upstream.
Kernel side fence objects are used when unbinding resources and may thus be
created as part of a memory reclaim operation. This might trigger recursive
memory reclaims and result in the kernel running out of stack space.
So a simple way out is to avoid accounting of these fence objects.
In principle this is OK since while user-space can trigger the creation of
such objects, it can't really hold on to them. However, their lifetime is
quite long, so some form of accounting should perhaps be implemented in the
future.
Fixes kernel crashes when running, for example viewperf11 ensight-04 test 3
with low system memory settings.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cc4f14aa17 upstream.
There's an off-by-one bug in function __domain_mapping(), which may
trigger the BUG_ON(nr_pages < lvl_pages) when
(nr_pages + 1) & superpage_mask == 0
The issue was introduced by commit 9051aa0268 "intel-iommu: Combine
domain_pfn_mapping() and domain_sg_mapping()", which sets sg_res to
"nr_pages + 1" to avoid some of the 'sg_res==0' code paths.
It's safe to remove extra "+1" because sg_res is only used to calculate
page size now.
Reported-And-Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-By: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9e4982f6a5 upstream.
Like with ath9k, ath5k queues also need to be ordered by priority.
queue_info->tqi_subtype already contains the correct index, so use it
instead of relying on the order of ath5k_hw_setup_tx_queue calls.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78063d81d3 upstream.
Hardware queues are ordered by priority. Use queue index 0 for BK, which
has lower priority than BE.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad8fdccf9c upstream.
The driver passes the desired hardware queue index for a WMM data queue
in qinfo->tqi_subtype. This was ignored in ath9k_hw_setuptxqueue, which
instead relied on the order in which the function is called.
Reported-by: Hubert Feurstein <h.feurstein@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1c6156fe4 upstream.
This function isn't right and it causes a static checker warning:
drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
error: potentially using uninitialized 'sb_data_size'.
It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.
Fixes: 3241b1d3e0 ('dm: add persistent data library')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 69eba10e60 upstream.
In olden times the snd_hda_param_read() function always set "*start_id"
but in 2007 we introduced a new return and it causes uninitialized data
bugs in a couple of the callers: print_codec_info() and
hdmi_parse_codec().
Fixes: e8a7f136f5 ('[ALSA] hda-intel - Improve HD-audio codec probing robustness')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc625960ed upstream.
Both "dev->udev" and "interface->dev" are NULL. These printks are not
very interesting so I just deleted them.
Fixes: 03270634e2 ('USB: Add ADU support for Ontrak ADU devices')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 942080643b upstream.
Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Reported-by: Dmitry Chernenkov <dmitryc@google.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d92f2df056 upstream.
The isochronous endpoints are not valid when the Intel Bluetooth
controller boots up in bootloader mode. So just mark these endpoints
as broken and then they will not be configured.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0b8800623d upstream.
This will help to manage table of supported IDs.
There is no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.2: sort 04ca:3007 which was added after this upstream
but already added here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1ff383a4c3 upstream.
This patch adds waiting until transmit buffer and shifter will be empty
before clock disabling.
Without this fix it's possible to have clock disabled while data was
not transmited yet, which causes unproper state of TX line and problems
in following data transfers.
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1a5fb99de4 upstream.
Some boards with TC6393XB chip require full state restore during system
resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000
tosa is one of them). Failing to do so would result in ohci Oops on
resume due to internal memory contentes being changed. Fail ohci suspend
on tc6393xb is full state restore is required.
Recommended workaround is to unbind tmio-ohci driver before suspend and
rebind it after resume.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5fabcb4c33 upstream.
We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition()
with a user passed in partno value. If we pass in 0x7fffffff, the
new target in disk_expand_part_tbl() overflows the 'int' and we
access beyond the end of ptbl->part[] and even write to it when we
do the rcu_assign_pointer() to assign the new partition.
Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c4cf0935a2 upstream.
Correct returning IRQ_HANDLED unconditionally in the irq handler.
Return IRQ_NONE for some interrupt which we do not expect to be
handled in this handler. This prevents kernel stalling with back
to back spurious interrupts.
Fixes: 2722e56de6 ("OMAP4: l3: Introduce l3-interconnect error handling driver")
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6c92b7e0a upstream.
The .eh_abort_handler needs to return SUCCESS, FAILED, or
FAST_IO_FAIL. So fixup all callers to adhere to this requirement.
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[bwh: Backported to 3.2: drop changes to esas2r]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36e8164882 upstream.
Commit 6ac665c63d ("PCI: rewrite PCI BAR reading code") masked off
low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(),
which compares 'base == maxbase' to check for read-only BARs. The masking
of 'l' means that comparison will never be 'true', so the check for
read-only BARs no longer works.
Resolve this by also masking off the low-order bits of 'sz' before passing
it into pci_size() as 'maxbase'. With this change, pci_size() will once
again catch the problems that have been encountered to date:
- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is
disabled, this BAR is read-only and read as 0x00000008 [1]
- BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]
Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git))
Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b9d35d744 upstream.
This was not noticed for many years. Affects operation if
md raid is used a backing device for DRBD.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2: s/device/mdev/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f38aed975c upstream.
The logic of vfree()'ing vol->upd_buf is tied to vol->updating.
In ubi_start_update() vol->updating is set long before vmalloc()'ing
vol->upd_buf. If we encounter a write failure in ubi_start_update()
before vmalloc() the UBI device release function will try to vfree()
vol->upd_buf because vol->updating is set.
Fix this by allocating vol->upd_buf directly after setting vol->updating.
Fixes:
[ 31.559338] UBI warning: vol_cdev_release: update of volume 2 not finished, volume is damaged
[ 31.559340] ------------[ cut here ]------------
[ 31.559343] WARNING: CPU: 1 PID: 2747 at mm/vmalloc.c:1446 __vunmap+0xe3/0x110()
[ 31.559344] Trying to vfree() nonexistent vm area (ffffc90001f2b000)
[ 31.559345] Modules linked in:
[ 31.565620] 0000000000000bba ffff88002a0cbdb0 ffffffff818f0497 ffff88003b9ba148
[ 31.566347] ffff88002a0cbde0 ffffffff8156f515 ffff88003b9ba148 0000000000000bba
[ 31.567073] 0000000000000000 0000000000000000 ffff88002a0cbe88 ffffffff8156c10a
[ 31.567793] Call Trace:
[ 31.568034] [<ffffffff818f0497>] dump_stack+0x4e/0x7a
[ 31.568510] [<ffffffff8156f515>] ubi_io_write_vid_hdr+0x155/0x160
[ 31.569084] [<ffffffff8156c10a>] ubi_eba_write_leb+0x23a/0x870
[ 31.569628] [<ffffffff81569b36>] vol_cdev_write+0x226/0x380
[ 31.570155] [<ffffffff81179265>] vfs_write+0xb5/0x1f0
[ 31.570627] [<ffffffff81179f8a>] SyS_pwrite64+0x6a/0xa0
[ 31.571123] [<ffffffff818fde12>] system_call_fastpath+0x16/0x1b
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2dca485f87 upstream.
some control register changes will flush some aspects of the CPU, e.g.
POP explicitely mentions that for CR9-CR11 "TLBs may be cleared".
Instead of trying to be clever and only flush on specific CRs, let
play safe and flush on all lctl(g) as future machines might define
new bits in CRs. Load control intercept should not happen that often.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4c672e4b42 upstream.
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
dev:port1
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:100!
invalid opcode: 0000 [#1] SMP
Modules linked in: ixgbe(O)
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
[...]
Call Trace:
<IRQ>
[<ffffffff80578226>] ? skb_put+0x3a/0x3b
[<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
[<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
[<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
[<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
[<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
[<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
[<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
[<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
[<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
[<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6ead
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com>
Fixes: 72e09ad107 ("ipv6: avoid high order allocations")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a7ae199224 upstream.
ipv6: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6608824329 upstream.
ipv4: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11432050f0 upstream.
This patch fixes an issue that the NULL pointer dereference happens
when we uses g_audio driver. Since the g_audio driver will call
usb_ep_disable() in afunc_set_alt() before it calls usb_ep_enable(),
the uep->pipe of renesas usbhs driver will be NULL. So, this patch
adds a condition to avoid the oops.
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
Signed-off-by: Takeshi Kihara <takeshi.kihara.df@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes: 2f98382dc (usb: renesas_usbhs: Add Renesas USBHS Gadget)
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9c6ac78eb3 upstream.
After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
tests inode->i_state locklessly to see whether it already has all the
necessary I_DIRTY bits set. The comment above the barrier doesn't
contain any useful information - memory barriers can't ensure "changes
are seen by all cpus" by itself.
And it sure enough was broken. Please consider the following
scenario.
CPU 0 CPU 1
-------------------------------------------------------------------------------
enters __writeback_single_inode()
grabs inode->i_lock
tests PAGECACHE_TAG_DIRTY which is clear
enters __set_page_dirty()
grabs mapping->tree_lock
sets PAGECACHE_TAG_DIRTY
releases mapping->tree_lock
leaves __set_page_dirty()
enters __mark_inode_dirty()
smp_mb()
sees I_DIRTY_PAGES set
leaves __mark_inode_dirty()
clears I_DIRTY_PAGES
releases inode->i_lock
Now @inode has dirty pages w/ I_DIRTY_PAGES clear. This doesn't seem
to lead to an immediately critical problem because requeue_inode()
later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
deciding whether the inode needs to be requeued for IO and there are
enough unintentional memory barriers inbetween, so while the inode
ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
IO list.
The lack of explicit barrier may also theoretically affect the other
I_DIRTY bits which deal with metadata dirtiness. There is no
guarantee that a strong enough barrier exists between
I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
inode. Filesystem inode writeout path likely has enough stuff which
can behave as full barrier but it's theoretically possible that the
writeout may not see all the updates from ->dirty_inode().
Fix it by adding an explicit smp_mb() after I_DIRTY clearing. Note
that I_DIRTY_PAGES needs a special treatment as it always needs to be
cleared to be interlocked with the lockless test on
__mark_inode_dirty() side. It's cleared unconditionally and
reinstated after smp_mb() if the mapping still has dirty pages.
Also add comments explaining how and why the barriers are paired.
Lightly tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6290be1c1d upstream.
Instead of clearing I_DIRTY_PAGES and resetting it when we didn't succeed in
writing them all, just clear the bit only when we succeeded writing all the
pages. We also move the clearing of the bit close to other i_state handling to
separate it from writeback list handling. This is desirable because list
handling will differ for flusher thread and other writeback_single_inode()
callers in future. No filesystem plays any tricks with I_DIRTY_PAGES (like
checking it in ->writepages or ->write_inode implementation) so this movement
is safe.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2279948735 upstream.
This patches fixes an ancient bug in the dvb_usb_af9005 driver, which
has been reported at least in the following threads:
https://lkml.org/lkml/2009/2/4/350https://lkml.org/lkml/2014/9/18/558
If the driver is compiled in without any IR support (neither
DVB_USB_AF9005_REMOTE nor custom symbols), the symbol_request calls in
af9005_usb_module_init() return pointers != NULL although the IR
symbols are not available.
This leads to the following oops:
...
[ 8.529751] usbcore: registered new interface driver dvb_usb_af9005
[ 8.531584] BUG: unable to handle kernel paging request at 02e00000
[ 8.533385] IP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d
[ 8.535613] *pde = 00000000
[ 8.536416] Oops: 0000 [#1] PREEMPT PREEMPT DEBUG_PAGEALLOCDEBUG_PAGEALLOC
[ 8.537863] CPU: 0 PID: 1 Comm: swapper Not tainted 3.15.0-rc6-00151-ga5c075c #1
[ 8.539827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 8.541519] task: 89c9a670 ti: 89c9c000 task.ti: 89c9c000
[ 8.541519] EIP: 0060:[<7d9d67c6>] EFLAGS: 00010206 CPU: 0
[ 8.541519] EIP is at af9005_usb_module_init+0x6b/0x9d
[ 8.541519] EAX: 02e00000 EBX: 00000000 ECX: 00000006 EDX: 00000000
[ 8.541519] ESI: 00000000 EDI: 7da33ec8 EBP: 89c9df30 ESP: 89c9df2c
[ 8.541519] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
[ 8.541519] CR0: 8005003b CR2: 02e00000 CR3: 05a54000 CR4: 00000690
[ 8.541519] Stack:
[ 8.541519] 7d9d675b 89c9df90 7d992a49 7d7d5914 89c9df4c 7be3a800 7d08c58c 8a4c3968
[ 8.541519] 89c9df80 7be3a966 00000192 00000006 00000006 7d7d3ff4 8a4c397a 00000200
[ 8.541519] 7d6b1280 8a4c3979 00000006 000009a6 7da32db8 b13eec81 00000006 000009a6
[ 8.541519] Call Trace:
[ 8.541519] [<7d9d675b>] ? ttusb2_driver_init+0x16/0x16
[ 8.541519] [<7d992a49>] do_one_initcall+0x77/0x106
[ 8.541519] [<7be3a800>] ? parameqn+0x2/0x35
[ 8.541519] [<7be3a966>] ? parse_args+0x113/0x25c
[ 8.541519] [<7d992bc2>] kernel_init_freeable+0xea/0x167
[ 8.541519] [<7cf01070>] kernel_init+0x8/0xb8
[ 8.541519] [<7cf27ec0>] ret_from_kernel_thread+0x20/0x30
[ 8.541519] [<7cf01068>] ? rest_init+0x10c/0x10c
[ 8.541519] Code: 08 c2 c7 05 44 ed f9 7d 00 00 e0 02 c7 05 40 ed f9 7d 00 00 e0 02 c7 05 3c ed f9 7d 00 00 e0 02 75 1f b8 00 00 e0 02 85 c0 74 16 <a1> 00 00 e0 02 c7 05 54 84 8e 7d 00 00 e0 02 a3 58 84 8e 7d eb
[ 8.541519] EIP: [<7d9d67c6>] af9005_usb_module_init+0x6b/0x9d SS:ESP 0068:89c9df2c
[ 8.541519] CR2: 0000000002e00000
[ 8.541519] ---[ end trace 768b6faf51370fc7 ]---
The prefered fix would be to convert the whole IR code to use the kernel IR
infrastructure (which wasn't available at the time this driver had been created).
Until anyone who still has this old hardware steps up an does the conversion,
fix it by not calling the symbol_request calls if the driver is compiled in
without the default IR symbols (CONFIG_DVB_USB_AF9005_REMOTE).
Due to the IR related pointers beeing NULL by default, IR support will then be disabled.
The downside of this solution is, that it will no longer be possible to
compile custom IR symbols (not using CONFIG_DVB_USB_AF9005_REMOTE) in.
Please note that this patch has NOT been tested with all possible cases.
I don't have the hardware and could only verify that it fixes the reported
bug.
Reported-by: Fengguag Wu <fengguang.wu@intel.com>
Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Acked-by: Luca Olivetti <luca@ventoso.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 678fa12fb8 upstream.
The au0828 quirks table is currently not in sync with the au0828
media driver.
Syncronize it and put them on the same order as found at au0828
driver, as all the au0828 devices with analog TV need the
same quirks.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d1f00a20d upstream.
Add a macro to simplify au0828 quirk table. That makes easier
to check it against the USB IDs at drivers/media/usb/au0828/au0828-cards.c.
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2:
- Adjust filename
- Quirks were in a different order]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 332b122d39 upstream.
The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:
e77a56d [PATCH] eCryptfs: Encrypted passthrough
This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Priya Bansal <p.bansal@samsung.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1118b3602 upstream.
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.
The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed. However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.
Reported-by: Chris Webb <chris@arachsys.com>
Tested-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9772b54c55 upstream.
To accomodate for enough headroom for tunnels, use MAX_HEADER instead
of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs
of trinity an skb_under_panic() via SCTP output path (see reference).
I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere
in other protocols might be one possible cause for this.
In any case, it looks like accounting on chunks themself seems to look
good as the skb already passed the SCTP output path and did not hit
any skb_over_panic(). Given tunneling was enabled in his .config, the
headroom would have been expanded by MAX_HEADER in this case.
Reported-by: Robert Święcki <robert@swiecki.net>
Reference: https://lkml.org/lkml/2014/12/1/507
Fixes: 594ccc14df ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit de11b0e8c5 upstream.
These drivers now call ipv6_proxy_select_ident(), which is defined
only if CONFIG_INET is enabled. However, they have really depended
on CONFIG_INET for as long as they have allowed sending GSO packets
from userland.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Fixes: b9fb9ee07e ("macvtap: add GSO/csum offload support")
Fixes: 5188cd44c5 ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets")
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 4062090e3e upstream.
ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
and in case errors in __ip_append_data() nobody frees stolen dst entry
Fixes: 2e77d89b2f ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
Signed-off-by: Vasily Averin <vvs@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 349ce993ac upstream.
percpu tcp_md5sig_pool contains memory blobs that ultimately
go through sg_set_buf().
-> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
This requires that whole area is in a physically contiguous portion
of memory. And that @buf is not backed by vmalloc().
Given that alloc_percpu() can use vmalloc() areas, this does not
fit the requirements.
Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool
is small anyway, there is no gain to dynamically allocate it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: the deleted code differs slightly due to API changes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71cea17ed3 upstream.
TCP md5 code uses per cpu variables but protects access to them with
a shared spinlock, which is a contention point.
[ tcp_md5sig_pool_lock is locked twice per incoming packet ]
Makes things much simpler, by allocating crypto structures once, first
time a socket needs md5 keys, and not deallocating them as they are
really small.
Next step would be to allow crypto allocations being done in a NUMA
aware way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
- Adjust context
- Conditions for alloc/free are quite different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f76936d07c upstream.
fib_nh_match does not match nexthops correctly. Example:
ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
nexthop via 192.168.122.15 dev eth0
Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process
Please consider this for stable trees as well.
Fixes: 4e902c5741 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4184b2a79a upstream.
A very minimal and simple user space application allocating an SCTP
socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
the socket again will leak the memory containing the authentication
key from user space:
unreferenced object 0xffff8800837047c0 (size 16):
comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
hex dump (first 16 bytes):
01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811c88d8>] __kmalloc+0xe8/0x270
[<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
[<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
[<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
[<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
[<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
[<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff
This is bad because of two things, we can bring down a machine from
user space when auth_enable=1, but also we would leave security sensitive
keying material in memory without clearing it after use. The issue is
that sctp_auth_create_key() already sets the refcount to 1, but after
allocation sctp_auth_set_key() does an additional refcount on it, and
thus leaving it around when we free the socket.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5188cd44c5 upstream.
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: For 3.2, net/ipv6/output_core.c is a completely new file]
commit 8ceee72808 upstream.
The GHASH setkey() function uses SSE registers but fails to call
kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
then having to deal with the restriction that they cannot be called from
interrupt context, move the setkey() implementation to the C domain.
Note that setkey() does not use any particular SSE features and is not
expected to become a performance bottleneck.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Fixes: 0e1227d356 (crypto: ghash - Add PCLMULQDQ accelerated implementation)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 101b96f329 upstream.
DRM_IOCTL_MODE_GETFB is used to retrieve information about a given
framebuffer ID. It is a read-only helper and was thus declassified for
unprivileged access in:
commit a14b1b4247
Author: Mandeep Singh Baines <mandeep.baines@gmail.com>
Date: Fri Jan 20 12:11:16 2012 -0800
drm: remove master fd restriction on mode setting getters
However, alongside width, height and stride information,
DRM_IOCTL_MODE_GETFB also passes back a handle to the underlying buffer of
the framebuffer. This handle allows users to mmap() it and read or write
into it. Obviously, this should be restricted to DRM-Master.
With the current setup, *any* process with access to /dev/dri/card0 (which
means any process with access to hardware-accelerated rendering) can
access the current screen framebuffer and modify it ad libitum.
For backwards-compatibility reasons we want to keep the
DRM_IOCTL_MODE_GETFB call unprivileged. Besides, it provides quite useful
information regarding screen setup. So we simply test whether the caller
is the current DRM-Master and if not, we return 0 as handle, which is
always invalid. A following DRM_IOCTL_GEM_CLOSE on this handle will fail
with EINVAL, but we accept this. Users shouldn't test for errors during
GEM_CLOSE, anyway. And it is still better as a failing MODE_GETFB call.
v2: add capable(CAP_SYS_ADMIN) check for compatibility with i-g-t
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2:
- drm_framebuffer_funcs::create_handle must be non-null
- Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8adbf78ec4 upstream.
Git commit 4f37a68cda
"s390: Use direct ktime path for s390 clockevent device" makes use
of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta
calculation with ktime_get() in clockevents_program_event and the
get_tod_clock() in s390_next_event. This is based on the assumption
that the difference between the internal ktime and the hardware
clock is reflected in the wall_to_monotonic delta. But this is not
true, the ntp corrections are applied via changes to the tk->mult
multiplier and this is not reflected in wall_to_monotonic.
In theory this could be solved by using the raw monotonic clock
but it is simpler to switch back to the standard clock delta
calculation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: s/get_tod_clock()/get_clock()/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c9b92530a7 upstream.
Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Adjust context to apply after commit 0e9a9a1ad6
('ext4: avoid hang when mounting non-journal filesystems with orphan list')
and commit e2bfb088fa
('ext4: don't orphan or truncate the boot loader inode')]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca5358ef75 upstream.
... by not hitting rename_retry for reasons other than rename having
happened. In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill(). Skip the killed siblings instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- As we only have try_to_ascend() and not d_walk(), apply this
change to all callers of try_to_ascend()
- Adjust context to make __dentry_kill() apply to d_kill()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 946e51f2bf upstream.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2:
- Apply name changes in all the different places we use d_alias and d_child
- Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 29fa682546 upstream.
paravirt_enabled has the following effects:
- Disables the F00F bug workaround warning. There is no F00F bug
workaround any more because Linux's standard IDT handling already
works around the F00F bug, but the warning still exists. This
is only cosmetic, and, in any event, there is no such thing as
KVM on a CPU with the F00F bug.
- Disables 32-bit APM BIOS detection. On a KVM paravirt system,
there should be no APM BIOS anyway.
- Disables tboot. I think that the tboot code should check the
CPUID hypervisor bit directly if it matters.
- paravirt_enabled disables espfix32. espfix32 should *not* be
disabled under KVM paravirt.
The last point is the purpose of this patch. It fixes a leak of the
high 16 bits of the kernel stack address on 32-bit KVM paravirt
guests. Fixes CVE-2014-8134.
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2e323ec96 upstream.
We need to add a limit check here so we don't overflow the buffer.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a2b9e6c1a3 upstream.
Commit fc3a9157d3 ("KVM: X86: Don't report L2 emulation failures to
user-space") disabled the reporting of L2 (nested guest) emulation failures to
userspace due to race-condition between a vmexit and the instruction emulator.
The same rational applies also to userspace applications that are permitted by
the guest OS to access MMIO area or perform PIO.
This patch extends the current behavior - of injecting a #UD instead of
reporting it to userspace - also for guest userspace code.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e40607cbe2 upstream.
An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
in the form of:
------------ INIT[PARAM: SET_PRIMARY_IP] ------------>
While the INIT chunk parameter verification dissects through many things
in order to detect malformed input, it misses to actually check parameters
inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
IP address' parameter in ASCONF, which has as a subparameter an address
parameter.
So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
and thus sctp_get_af_specific() returns NULL, too, which we then happily
dereference unconditionally through af->from_addr_param().
The trace for the log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
PGD 0
Oops: 0000 [#1] SMP
[...]
Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffffa01e9c62>] [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
[...]
Call Trace:
<IRQ>
[<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
[<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
[<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
[<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
[<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
[<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
[<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
[<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
[<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
[<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
[...]
A minimal way to address this is to check for NULL as we do on all
other such occasions where we know sctp_get_af_specific() could
possibly return with NULL.
Fixes: d6de309759 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c03aa9f6e1 upstream.
We did not implement any bound on number of indirect ICBs we follow when
loading inode. Thus corrupted medium could cause kernel to go into an
infinite loop, possibly causing a stack overflow.
Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limit number of indirect ICBs we follow to avoid
infinite loops.
Signed-off-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ea359f731 upstream.
According to I2C specification the NACK should be handled as follows:
"When SDA remains HIGH during this ninth clock pulse, this is defined as the Not
Acknowledge signal. The master can then generate either a STOP condition to
abort the transfer, or a repeated START condition to start a new transfer."
[I2C spec Rev. 6, 3.1.6: http://www.nxp.com/documents/user_manual/UM10204.pdf]
Currently the Davinci i2c driver interrupts the transfer on receipt of a
NACK but fails to send a STOP in some situations and so makes the bus
stuck until next I2C IP reset (idle/enable).
For example, the issue will happen during SMBus read transfer which
consists from two i2c messages write command/address and read data:
S Slave Address Wr A Command Code A Sr Slave Address Rd A D1..Dn A P
<--- write -----------------------> <--- read --------------------->
The I2C client device will send NACK if it can't recognize "Command Code"
and it's expected from I2C master to generate STP in this case.
But now, Davinci i2C driver will just exit with -EREMOTEIO and STP will
not be generated.
Hence, fix it by generating Stop condition (STP) always when NACK is received.
This patch fixes Davinci I2C in the same way it was done for OMAP I2C
commit cda2109a26 ("i2c: omap: query STP always when NACK is received").
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Hein Tibosch <hein_tibosch@yahoo.es>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2022b4d18a upstream.
I've been seeing swapoff hangs in recent testing: it's cycling around
trying unsuccessfully to find an mm for some remaining pages of swap.
I have been exercising swap and page migration more heavily recently,
and now notice a long-standing error in copy_one_pte(): it's trying to
add dst_mm to swapoff's mmlist when it finds a swap entry, but is doing
so even when it's a migration entry or an hwpoison entry.
Which wouldn't matter much, except it adds dst_mm next to src_mm,
assuming src_mm is already on the mmlist: which may not be so. Then if
pages are later swapped out from dst_mm, swapoff won't be able to find
where to replace them.
There's already a !non_swap_entry() test for stats: move that up before
the swap_duplicate() and the addition to mmlist.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kelley Nielsen <kelleynnn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aad0b62412 upstream.
irq_of_parse_and_map() returns 0 on error (the result is unsigned int),
so testing for negative result never works.
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f55fefd1a5 upstream.
The WARN_ON checking whether i_mutex is held in
pagecache_isize_extended() was wrong because some filesystems (e.g.
XFS) use different locks for serialization of truncates / writes. So
just remove the check.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b3f207855f upstream.
When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.
For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.
This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.
Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
[lizf: Backported to 3.4 by David Ahern]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e2cffb5f49 upstream.
On archs with PAGE_SIZE >= 64 KiB the function skcipher_alloc_sgl()
fails with -ENOMEM no matter what user space actually requested.
This is caused by the fact sock_kmalloc call inside the function tried
to allocate more memory than allowed by the default kernel socket buffer
size (kernel param net.core.optmem_max).
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
With commits 73f156a6e8 ("inetpeer: get rid of ip_id_count") and
04ca6973f7 ("ip: make IP identifiers less predictable"), IP
identifiers are generated from a counter chosen from an array of
counters indexed by the hash of the outgoing packet header's source
address, destination address, and protocol number. Thus, in
__ip_make_skb(), we must now call ip_select_ident() only after setting
these fields in the IP header to prevent IP identifiers from being
generated from bogus counters.
IP id sequence before fix: 18174, 5789, 5953, 59420, 59637, ...
After fix: 5967, 6185, 6374, 6600, 6795, 6892, 7051, 7288, ...
Signed-off-by: Jeffrey Knockel <jeffk@cs.unm.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
commit 2cc5bfaf85 upstream.
When the driver calls scsi_done and after that frees it's internal
preallocated memory it can happen that a new job is enqueud before
the memory is freed. The allocation fails and the message
"cmd_alloc returned NULL" is shown.
Patch below fixes it by moving cmd->scsi_done after cmd_free.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Masoud Sharbiani <msharbiani@twopensource.com>
commit 9a123f1983 upstream.
The main purpose of this function is to exclude ME devices
without support for MEI/HECI interface from binding
Currently affected systems are C600/X79 based servers
that expose PCI device even though it doesn't supported ME Interface.
MEI driver accessing such nonfunctional device can corrupt
the system.
[Backported to 3.2: files were moved]
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1f9a40726 upstream.
The xpad wireless endpoint is not a bulk endpoint on my devices, but
rather an interrupt one, so the USB core complains when it is submitted.
I'm guessing that the author really did mean that this should be an
interrupt urb, but as there are a zillion different xpad devices out
there, let's cover out bases and handle both bulk and interrupt
endpoints just as easily.
Signed-off-by: "Pierre-Loup A. Griffais" <pgriffais@valvesoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 413cbf469a upstream.
AMD/ATI HDMI controller chip models, we already have a filter to lower
to 32bit DMA, but the rest are supposed to be working with 64bit
although the hardware doesn't really work with 63bit but only with 40
or 48bit DMA. In this patch, we take 40bit DMA for safety for the
AMD/ATI controllers as the graphics drivers does.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2:
- Adjust context
- s/AZX_GCAP_64OK/ICH6_GCAP_64OK/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b645af2d59 upstream.
It's possible for iretq to userspace to fail. This can happen because
of a bad CS, SS, or RIP.
Historically, we've handled it by fixing up an exception from iretq to
land at bad_iret, which pretends that the failed iret frame was really
the hardware part of #GP(0) from userspace. To make this work, there's
an extra fixup to fudge the gs base into a usable state.
This is suboptimal because it loses the original exception. It's also
buggy because there's no guarantee that we were on the kernel stack to
begin with. For example, if the failing iret happened on return from an
NMI, then we'll end up executing general_protection on the NMI stack.
This is bad for several reasons, the most immediate of which is that
general_protection, as a non-paranoid idtentry, will try to deliver
signals and/or schedule from the wrong stack.
This patch throws out bad_iret entirely. As a replacement, it augments
the existing swapgs fudge into a full-blown iret fixup, mostly written
in C. It's should be clearer and more correct.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- We didn't use the _ASM_EXTABLE macro
- Don't use __visible]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af726f21ed upstream.
There's nothing special enough about the espfix64 double fault fixup to
justify writing it in assembly. Move it to C.
This also fixes a bug: if the double fault came from an IST stack, the
old asm code would return to a partially uninitialized stack frame.
Fixes: 3891a04aaf
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Keep using the paranoiderrorentry macro to generate the asm code
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f442be2fb upstream.
On a 32-bit kernel, this has no effect, since there are no IST stacks.
On a 64-bit kernel, #SS can only happen in user code, on a failed iret
to user space, a canonical violation on access via RSP or RBP, or a
genuine stack segment violation in 32-bit kernel code. The first two
cases don't need IST, and the latter two cases are unlikely fatal bugs,
and promoting them to double faults would be fine.
This fixes a bug in which the espfix64 code mishandles a stack segment
violation.
This saves 4k of memory per CPU and a tiny bit of code.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- No need to define trace_stack_segment
- Use the errorentry macro to generate #SS asm code
- Adjust context
- Checked that this matches Luis's backport for Ubuntu]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1377e5397 upstream.
When system is being suspended, if host device is not allowed to do wakeup,
xhci_suspend() needs to clear all root port wake on bits. Otherwise, some
platforms may generate spurious wakeup, even if PCI PME# is disabled.
The initial commit ff8cbf250b ("xhci: clear root port wake on bits"),
which also got into stable, turned out to not work correctly and had to
be reverted, and is now rewritten.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
[Mathias Nyman: reword commit message]
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context; drop xhci-plat changes]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8e71a322fd upstream.
If a device is halted and reuturns a STALL, then the halted endpoint
needs to be cleared both on the host and device side. The host
side halt is cleared by issueing a xhci reset endpoint command. The device side
is cleared with a ClearFeature(ENDPOINT_HALT) request, which should
be issued by the device driver if a URB reruen -EPIPE.
Previously we cleared the host side halt after the device side was cleared.
To make sure the host side halt is cleared in time we want to issue the
reset endpoint command immedialtely when a STALL status is encountered.
Otherwise we end up not following the specs and not returning -EPIPE
several times in a row when trying to transfer data to a halted endpoint.
Fixes: bcef3fd (USB: xhci: Handle errors that cause endpoint halts.)
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: xhci_endpoint_reset() looked a little different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c3492dbfa1 upstream.
A halted endpoint ring must first be reset, then move the ring
dequeue pointer past the problematic TRB. If we start the ring too
early after reset, but before moving the dequeue pointer we
will end up executing the same problematic TRB again.
As we always issue a set transfer dequeue command after a reset
endpoint command we can skip starting endpoint rings at reset endpoint
command completion.
Without this fix we end up trying to handle the same faulty TD for
contol endpoints. causing timeout, and failing testusb ctrl_out write
tests.
Fixes: e9df17e (USB: xhci: Correct assumptions about number of rings per endpoint.)
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef59a20ba3 upstream.
According to the manuals I have, XScale auxiliary register should be
reached with opc_2 = 1 instead of crn = 1. cpu_xscale_proc_init
correctly uses c1, c0, 1 arguments, but cpu_xscale_do_suspend and
cpu_xscale_do_resume use c1, c1, 0. Correct suspend/resume functions to
also use c1, c0, 1.
The issue was primarily noticed thanks to qemu reporing "unsupported
instruction" on the pxa suspend path. Confirmed in PXA210/250 and PXA255
XScale Core manuals and in PXA270 and PXA320 Developers Guides.
Harware tested by me on tosa (pxa255). Robert confirmed on pxa270 board.
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c6c15e1ed3 upstream.
The currect code for nfsd41_cb_get_slot() and nfsd4_cb_done() has no
locking in order to guarantee atomicity, and so allows for races of
the form.
Task 1 Task 2
====== ======
if (test_and_set_bit(0) != 0) {
clear_bit(0)
rpc_wake_up_next(queue)
rpc_sleep_on(queue)
return false;
}
This patch breaks the race condition by adding a retest of the bit
after the call to rpc_sleep_on().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 093a1468b6 upstream.
Both xprt_lookup_rqst() and xprt_complete_rqst() require that you
take the transport lock in order to avoid races with xprt_transmit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71efecb3f5 upstream.
xprt_lookup_rqst() and bc_send_request() display a byte-swapped XID,
but receive_cb_reply() does not.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 75bcbf29c2 upstream.
Fix reporting of overrun errors, which should only be reported once
using the inserted null character.
Fixes: 6b8f1ca558 ("USB: ssu100: set tty_flags in ssu100_process_packet")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- Use tty_port_tty_get() to look up tty_struct
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 855515a6d3 upstream.
Fix reporting of overrun errors, which are not associated with a
character. Instead insert a null character and report only once.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- s/\&port->port/tty/
- Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d1678a33c upstream.
Fix handling of TTY error flags, which are not bitmasks and must
specifically not be ORed together as this prevents the line discipline
from recognising them.
Also insert null characters when reporting overrun errors as these are
not associated with the received character.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- s/\&port->port/tty/
- Adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 746c9e9f92 upstream.
We have a historical hack that treats missing ranges properties as the
equivalent of an empty one. This is needed for ancient PowerMac "bad"
device-trees, and shouldn't be enabled for any other PowerPC platform,
otherwise we get some nasty layout of devices in sysfs or even
duplication when a set of otherwise identically named devices is
created multiple times under a different parent node with no ranges
property.
This fix is needed for the PowerNV i2c busses to be exposed properly
and will fix a number of other embedded cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
[bwh: Backported to 3.2: use #ifdef because IS_ENABLED() only works for
config symbols that are defined on the current architecture]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 45e2a9d470 upstream.
When setting up permissions on kernel memory at boot, the end of the
PMD that was split from bss remained executable. It should be NX like
the rest. This performs a PMD alignment instead of a PAGE alignment to
get the correct span of memory.
Before:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd
0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte
0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte
0xffffffff82e00000-0xffffffffc0000000 978M pmd
After:
---[ High Kernel Mapping ]---
...
0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte
0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd
0xffffffff82e00000-0xffffffffc0000000 978M pmd
[ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment.
We really should unmap the reminder along with the holes
caused by init,initdata etc. but thats a different issue ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: BAckported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit efbd50d2f6 upstream.
It seems struct esd_usb2 dev is not deallocated on disconnect. The patch adds
the missing deallocation.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Matthias Fuchs <matthias.fuchs@esd.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2cd3949f70 upstream.
We have some very similarly named command-line options:
arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);
__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:
__setup("foo", x86_foo_func...);
The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar". If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.
This makes the "noxsave" handler only return success when it finds
an *exact* match.
[ tglx: We really need to make __setup() more robust. ]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ccf54555da upstream.
The direction field is set on 7 bits, thus we need to AND it with 0111 111 mask
in order to retrieve it, that is 0x7F, not 0xCF as it is now.
Fixes: ade7ef7ba (staging:iio: Differential channel handling)
Signed-off-by: Cristina Ciocan <cristina.ciocan@intel.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eaca2d8e75 upstream.
Found by the UC-KLEE tool: A user could supply less input to
firewire-cdev ioctls than write- or write/read-type ioctl handlers
expect. The handlers used data from uninitialized kernel stack then.
This could partially leak back to the user if the kernel subsequently
generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
which notably would contain the _u64 closure field which many of the
ioctl argument structures contain.
The fact that the handlers would act on random garbage input is a
lesser issue since all handlers must check their input anyway.
The fix simply always null-initializes the entire ioctl argument buffer
regardless of the actual length of expected user input. That is, a
runtime overhead of memset(..., 40) is added to each firewirew-cdev
ioctl() call. [Comment from Clemens Ladisch: This part of the stack is
most likely to be already in the cache.]
Remarks:
- There was never any leak from kernel stack to the ioctl output
buffer itself. IOW, it was not possible to read kernel stack by a
read-type or write/read-type ioctl alone; the leak could at most
happen in combination with read()ing subsequent event data.
- The actual expected minimum user input of each ioctl from
include/uapi/linux/firewire-cdev.h is, in bytes:
[0x00] = 32, [0x05] = 4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
[0x01] = 36, [0x06] = 20, [0x0b] = 4, [0x10] = 20, [0x15] = 20,
[0x02] = 20, [0x07] = 4, [0x0c] = 0, [0x11] = 0, [0x16] = 8,
[0x03] = 4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
[0x04] = 20, [0x09] = 24, [0x0e] = 4, [0x13] = 40, [0x18] = 4.
Reported-by: David Ramos <daramos@stanford.edu>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c251ea7bd7 upstream.
On a mx28evk with a sgtl5000 codec we notice a loud 'click' sound to happen
5 seconds after the end of a playback.
The SMALL_POP bit should fix this, but its definition is incorrect:
according to the sgtl5000 manual it is bit 0 of CHIP_REF_CTRL register, not
bit 1.
Fix the definition accordingly and enable the bit as intended per the code
comment.
After applying this change, no loud 'click' sound is heard after playback
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cfd9167af1 upstream.
RT2800 and newer hardware require padding between header and payload if
header length is not multiple of 4.
For historical reasons we also align payload to to 4 bytes boundary, but
such alignment is not needed on modern H/W.
Patch fixes skb_under_panic problems reported from time to time:
https://bugzilla.kernel.org/show_bug.cgi?id=84911https://bugzilla.kernel.org/show_bug.cgi?id=72471http://marc.info/?l=linux-wireless&m=139108549530402&w=2https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1087591
Panic happened because we eat 4 bytes of skb headroom on each
(re)transmission when sending frame without the payload and the header
length not being multiple of 4 (i.e. QoS header has 26 bytes). On such
case because paylad_aling=2 is bigger than header_align=0 we increase
header_align by 4 bytes. To prevent that we could change the check to:
if (payload_length && payload_align > header_align)
header_align += 4;
but not aligning payload at all is more effective and alignment is not
really needed by H/W (that has been tested on OpenWrt project for few
years now).
Reported-and-tested-by: Antti S. Lankila <alankila@bel.fi>
Debugged-by: Antti S. Lankila <alankila@bel.fi>
Reported-by: Henrik Asp <solenskiner@gmail.com>
Originally-From: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 799b601451 upstream.
Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.
The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.
Adding any mask should fix this.
Fixes: 90b1e7a578 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 48379270fe upstream.
Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH. Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state. Otherwise the EH code might be get blocked
on blk_get_request as all requests for non-reset devices might be in use.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Meelis Roos <meelis.roos@ut.ee>
Tested-by: Meelis Roos <meelis.roos@ut.ee>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9d720b34c0 upstream.
On some Dell Latitude laptops ALPS device or Dell EC send one invalid byte
in 6 bytes ALPS packet. In this case psmouse driver enter out of sync
state. It looks like that all other bytes in packets are valid and also
device working properly. So there is no need to do full device reset, just
need to wait for byte which match condition for first byte (start of
packet). Because ALPS packets are bigger (6 or 8 bytes) default limit is
small.
This patch increase number of invalid bytes to size of 2 ALPS packets which
psmouse driver can drop before do full reset.
Resetting ALPS devices take some time and when doing reset on some Dell
laptops touchpad, trackstick and also keyboard do not respond. So it is
better to do it only if really necessary.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ab8f7f320 upstream.
5th and 6th byte of ALPS trackstick V3 protocol match condition for first
byte of PS/2 3 bytes packet. When driver enters out of sync state and ALPS
trackstick is sending data then driver match 5th, 6th and next 1st bytes as
PS/2.
It basically means if user is using trackstick when driver is in out of
sync state driver will never resync. Processing these bytes as 3 bytes PS/2
data cause total mess (random cursor movements, random clicks) and make
trackstick unusable until psmouse driver decide to do full device reset.
Lot of users reported problems with ALPS devices on Dell Latitude E6440,
E6540 and E7440 laptops. ALPS device or Dell EC for unknown reason send
some invalid ALPS PS/2 bytes which cause driver out of sync. It looks like
that i8042 and psmouse/alps driver always receive group of 6 bytes packets
so there are no missing bytes and no bytes were inserted between valid
ones.
This patch does not fix root of problem with ALPS devices found in Dell
Latitude laptops but it does not allow to process some (invalid)
subsequence of 6 bytes ALPS packets as 3 bytes PS/2 when driver is out of
sync.
So with this patch trackstick input device does not report bogus data when
also driver is out of sync, so trackstick should be usable on those
machines.
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0a8727e697 upstream.
An IOCTL call that calls spi_setup() and then dw_spi_setup() will
overwrite the persisted last transfer speed. On each transfer, the
SPI speed is compared to the last transfer speed to determine if the
clock divider registers need to be updated (did the speed change?).
This bug was observed with the spidev driver using spi-config to
update the max transfer speed.
This fix: Don't overwrite the persisted last transaction clock speed
when updating the SPI parameters in dw_spi_setup(). On the next
transaction, the new speed won't match the persisted last speed
and the hardware registers will be updated.
On initialization, the persisted last transaction clock
speed will be 0 but will be updated after the first SPI
transaction.
Move zeroed clock divider check into clock change test because
chip->clk_div is zero on startup and would cause a divide-by-zero
error. The calculation was wrong as well (can't support odd #).
Reported-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9e326f7871 upstream.
We can call this function for a dummy console that doesn't support
setting the font mapping, which will result in a null ptr BUG. So check
for this case and return error for consoles w/o font mapping support.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=59321
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: this function doesn't take a lock, so doesn't
need to unlock on error]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37b1645788 upstream.
Kernel oops can cause the tty to be unreleaseable (for example, if
n_tty_read() crashes while on the read_wait queue). This will cause
tty_release() to endlessly loop without sleeping.
Use a killable sleep timeout which grows by 2n+1 jiffies over the interval
[0, 120 secs.) and then jumps to forever (but still killable).
NB: killable just allows for the task to be rewoken manually, not
to be terminated.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 547039ec50 upstream.
uart_get_baud_rate() will return baud == 0 if the max rate is set
to the "magic" 38400 rate and the SPD_* flags are also specified.
On the first iteration, if the current baud rate is higher than the
max, the baud rate is clamped at the max (which in the degenerate
case is 38400). On the second iteration, the now-"magic" 38400 baud
rate selects the possibly higher alternate baud rate indicated by
the SPD_* flag. Since only two loop iterations are performed, the
loop is exited, a kernel WARNING is generated and a baud rate of
0 is returned.
Reproducible with:
setserial /dev/ttyS0 spd_hi base_baud 38400
Only perform the "magic" 38400 -> SPD_* baud transform on the first
loop iteration, which prevents the degenerate case from recognizing
the clamped baud rate as the "magic" 38400 value.
Reported-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4473d054ce upstream.
Make sure to only raise DTR on transitions from B0 in set_termios.
Also allow set_termios to be called from open with a termios_old of
NULL. Note that DTR will not be raised prematurely in this case.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b598aacc29 upstream.
"raw" is a property of a channel, but should not be part of the name of
channel.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: using IIO_CHAN() macro to initialise the structures]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0725dda207 upstream.
Some USB-audio devices show weird sysfs warnings at disconnecting the
devices, e.g.
usb 1-3: USB disconnect, device number 3
------------[ cut here ]------------
WARNING: CPU: 0 PID: 973 at fs/sysfs/group.c:216 device_del+0x39/0x180()
sysfs group ffffffff8183df40 not found for kobject 'midiC1D0'
Call Trace:
[<ffffffff814a3e38>] ? dump_stack+0x49/0x71
[<ffffffff8103cb72>] ? warn_slowpath_common+0x82/0xb0
[<ffffffff8103cc55>] ? warn_slowpath_fmt+0x45/0x50
[<ffffffff813521e9>] ? device_del+0x39/0x180
[<ffffffff81352339>] ? device_unregister+0x9/0x20
[<ffffffff81352384>] ? device_destroy+0x34/0x40
[<ffffffffa00ba29f>] ? snd_unregister_device+0x7f/0xd0 [snd]
[<ffffffffa025124e>] ? snd_rawmidi_dev_disconnect+0xce/0x100 [snd_rawmidi]
[<ffffffffa00c0192>] ? snd_device_disconnect+0x62/0x90 [snd]
[<ffffffffa00c025c>] ? snd_device_disconnect_all+0x3c/0x60 [snd]
[<ffffffffa00bb574>] ? snd_card_disconnect+0x124/0x1a0 [snd]
[<ffffffffa02e54e8>] ? usb_audio_disconnect+0x88/0x1c0 [snd_usb_audio]
[<ffffffffa015260e>] ? usb_unbind_interface+0x5e/0x1b0 [usbcore]
[<ffffffff813553e9>] ? __device_release_driver+0x79/0xf0
[<ffffffff81355485>] ? device_release_driver+0x25/0x40
[<ffffffff81354e11>] ? bus_remove_device+0xf1/0x130
[<ffffffff813522b9>] ? device_del+0x109/0x180
[<ffffffffa01501d5>] ? usb_disable_device+0x95/0x1f0 [usbcore]
[<ffffffffa014634f>] ? usb_disconnect+0x8f/0x190 [usbcore]
[<ffffffffa0149179>] ? hub_thread+0x539/0x13a0 [usbcore]
[<ffffffff810669f5>] ? sched_clock_local+0x15/0x80
[<ffffffff81066c98>] ? sched_clock_cpu+0xb8/0xd0
[<ffffffff81070730>] ? bit_waitqueue+0xb0/0xb0
[<ffffffffa0148c40>] ? usb_port_resume+0x430/0x430 [usbcore]
[<ffffffffa0148c40>] ? usb_port_resume+0x430/0x430 [usbcore]
[<ffffffff8105973e>] ? kthread+0xce/0xf0
[<ffffffff81059670>] ? kthread_create_on_node+0x1c0/0x1c0
[<ffffffff814a8b7c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81059670>] ? kthread_create_on_node+0x1c0/0x1c0
---[ end trace 40b1928d1136b91e ]---
This comes from the fact that usb-audio driver may receive the
disconnect callback multiple times, per each usb interface. When a
device has both audio and midi interfaces, it gets called twice, and
currently the driver tries to release resources at the last call.
At this point, the first parent interface has been already deleted,
thus deleting a child of the first parent hits such a warning.
For fixing this problem, we need to call snd_card_disconnect() and
cancel pending operations at the very first disconnect while the
release of the whole objects waits until the last disconnect call.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80931
Reported-and-tested-by: Tomas Gayoso <tgayoso@gmail.com>
Reported-and-tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b45abacde3 upstream.
The switch back is limited to ULT even on HP. The contrary
finding arose by bad luck in BIOS versions for testing.
This fixes spontaneous resume from S3 on some HP laptops.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 93c9bf4d18 upstream.
Sometimes mass-storage devices using the Bulk-only transport will
mistakenly skip the data phase of a command. Rather than sending the
data expected by the host or sending a zero-length packet, they go
directly to the status phase and send the CSW.
This causes problems for usb-storage, for obvious reasons. The driver
will interpret the CSW as a short data transfer and will wait to
receive a CSW. The device won't have anything left to send, so the
command eventually times out.
The SCSI layer doesn't retry commands after they time out (this is a
relatively recent change). Therefore we should do our best to detect
a skipped data phase and handle it promptly.
This patch adds code to do that. If usb-storage receives a short
13-byte data transfer from the device, and if the first four bytes of
the data match the CSW signature, the driver will set the residue to
the full transfer length and interpret the data as a CSW.
This fixes Bugzilla #86611.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Tested-by: Paul Osmialowski <newchief@king.net.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: use US_DEBUGP() not usb_stor_dbg()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90a646c770 upstream.
This commit fixes the following oops:
[10238.622067] scsi host3: uas_eh_bus_reset_handler start
[10240.766164] usb 3-4: reset SuperSpeed USB device number 3 using xhci_hcd
[10245.779365] usb 3-4: device descriptor read/8, error -110
[10245.883331] usb 3-4: reset SuperSpeed USB device number 3 using xhci_hcd
[10250.897603] usb 3-4: device descriptor read/8, error -110
[10251.058200] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[10251.058244] IP: [<ffffffff815ac6e1>] xhci_check_streams_endpoint+0x91/0x140
<snip>
[10251.059473] Call Trace:
[10251.059487] [<ffffffff815aca6c>] xhci_calculate_streams_and_bitmask+0xbc/0x130
[10251.059520] [<ffffffff815aeb5f>] xhci_alloc_streams+0x10f/0x5a0
[10251.059548] [<ffffffff810a4685>] ? check_preempt_curr+0x75/0xa0
[10251.059575] [<ffffffff810a46dc>] ? ttwu_do_wakeup+0x2c/0x100
[10251.059601] [<ffffffff810a49e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[10251.059635] [<ffffffff815779ab>] usb_alloc_streams+0xab/0xf0
[10251.059662] [<ffffffffc0616b48>] uas_configure_endpoints+0x128/0x150 [uas]
[10251.059694] [<ffffffffc0616bac>] uas_post_reset+0x3c/0xb0 [uas]
[10251.059722] [<ffffffff815727d9>] usb_reset_device+0x1b9/0x2a0
[10251.059749] [<ffffffffc0616f42>] uas_eh_bus_reset_handler+0xb2/0x190 [uas]
[10251.059781] [<ffffffff81514293>] scsi_try_bus_reset+0x53/0x110
[10251.059808] [<ffffffff815163b7>] scsi_eh_bus_reset+0xf7/0x270
<snip>
The problem is the following call sequence (simplified):
1) usb_reset_device
2) usb_reset_and_verify_device
2) hub_port_init
3) hub_port_finish_reset
3) xhci_discover_or_reset_device
This frees xhci->devs[slot_id]->eps[ep_index].ring for all eps but 0
4) usb_get_device_descriptor
This fails
5) hub_port_init fails
6) usb_reset_and_verify_device fails, does not restore device config
7) uas_post_reset
8) xhci_alloc_streams
NULL deref on the free-ed ring
This commit fixes this by not allowing usb_alloc_streams to continue if
the device is not configured.
Note that we do allow usb_free_streams to continue after a (logical)
disconnect, as it is necessary to explicitly free the streams at the xhci
controller level.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8fff407a1 upstream.
Upon receiving the last fragment, all but the first fragment
are freed, but the multicast check for statistics at the end
of the function refers to the current skb (the last fragment)
causing a use-after-free bug.
Since multicast frames cannot be fragmented and we check for
this early in the function, just modify that check to also
do the accounting to fix the issue.
Reported-by: Yosef Khyal <yosefx.khyal@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e681286de2 upstream.
Write may be called from interrupt context so make sure to use
GFP_ATOMIC for all allocations in write.
Fixes: 0d930e51cf ("USB: opticon: Add Opticon OPN2001 write support")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1912528376 upstream.
Write may be called from interrupt context so make sure to use
GFP_ATOMIC for all allocations in write.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Johan Hovold <johan@kernel.org>
[bwh: Backported to 3.2:
- s/interrupt_out_urb/write_urb/
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ece9c72acc upstream.
Priority of a merged request is computed by ioprio_best(). If one of the
requests has undefined priority (IOPRIO_CLASS_NONE) and another request
has priority from IOPRIO_CLASS_BE, the function will return the
undefined priority which is wrong. Fix the function to properly return
priority of a request with the defined priority.
Fixes: d58cdfb89c
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9a72384d86 upstream.
When screen objects are enabled, the bpp is assumed to be 32, otherwise
it is set to 16.
v2:
* Use u32 instead of u64 for assumed_bpp.
* Fixed mechanism to check for screen objects
* Limit the back buffer size to VRAM.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
[bwh: Backported to 3.2: drop changes for dev_priv->prim_bb_mem]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 086ba77a6d upstream.
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls. If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.
# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264
true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
true-653 [000] 384.675988: sys_exit: NR 983045 = 0
...
# trace-cmd record -e syscalls:* true
[ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace
[ 17.289590] pgd = 9e71c000
[ 17.289696] [aaaaaace] *pgd=00000000
[ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 17.290169] Modules linked in:
[ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
[ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[ 17.290866] LR is at syscall_trace_enter+0x124/0x184
Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
Commit cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.
Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
Fixes: cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c5bcded11 upstream.
The Tevii S480 outputs 18V on startup for the LNB supply voltage and does not
automatically power down. This blocks other receivers connected
to a satellite channel router (EN50494), since the receivers can not send the
required DiSEqC sequences when the Tevii card is connected to a the same SCR.
This patch switches off the LNB supply voltage on initialization of the frontend.
[mchehab@osg.samsung.com: add a comment about why we're explicitly
turning off voltage at device init]
Signed-off-by: Ulrich Eckhardt <uli@uli-eckhardt.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6050d47adc upstream.
When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fail, there's really something wrong with the fs and there's no point in
continuing further. Just return error from make_indexed_dir() in that
case. Also initialize frames array so that if we return early due to
error, dx_release() doesn't try to dereference uninitialized memory
(which could happen also due to error in do_split()).
Coverity-id: 741300
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- We have ext4_handle_dirty_metadata() not
ext4_handle_dirty_{dx,dirent}_node()]
- do_split() returns errors by reference not with ERR_PTR()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 599a9b77ab upstream.
When we fail to load block bitmap in __ext4_new_inode() we will
dereference NULL pointer in ext4_journal_get_write_access(). So check
for error from ext4_read_block_bitmap().
Coverity-id: 989065
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9378c6768e upstream.
When there are no meta block groups update_backups() will compute the
backup block in 32-bit arithmetics thus possibly overflowing the block
number and corrupting the filesystem. OTOH filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the counting in 64-bit arithmetics.
Coverity-id: 741252
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46238845bd upstream.
When an interface is deleted, an ongoing hardware scan is canceled and
the driver must abort the scan, at the very least reporting completion
while the interface is removed.
However, if it scheduled the work that might only run after everything
is said and done, which leads to cfg80211 warning that the scan isn't
reported as finished yet; this is no fault of the driver, it already
did, but mac80211 hasn't processed it.
To fix this situation, flush the delayed work when the interface being
removed is the one that was executing the scan.
Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2:
- No rcu_access_pointer() needed
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ea5d05b34a upstream.
If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits which is undefined. Change the functions to avoid
the undefined shift.
Coverity id: 1192175
Coverity id: 1192174
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6d50e60cd2 upstream.
If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.
This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior(). This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.
Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.
It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 401507d67d upstream.
Commit ff7ee93f47 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining. Log is pasted at the end of
this commit message.
This patch add kmemleak_free() into free_page_cgroup(). During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.
bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak
Offlined Pages 32768
kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x46/0x58
create_object+0x266/0x2c0
kmemleak_alloc+0x26/0x50
kmem_cache_alloc+0xd3/0x160
__sigqueue_alloc+0x49/0xd0
__send_signal+0xcb/0x410
send_signal+0x45/0x90
__group_send_sig_info+0x13/0x20
do_notify_parent+0x1bb/0x260
do_exit+0x767/0xa40
do_group_exit+0x44/0xa0
SyS_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff880016900000 (size 524288):
kmemleak: comm "swapper/0", pid 0, jiffies 4294667296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
log_early+0x63/0x77
kmemleak_alloc+0x4b/0x50
init_section_page_cgroup+0x7f/0xf5
page_cgroup_init+0xc5/0xd0
start_kernel+0x333/0x408
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf5/0xfc
Fixes: ff7ee93f47 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1b9b9b1ad upstream.
FSI doesn't support PAUSE.
Remove SNDRV_PCM_INFO_PAUSE flags from snd_pcm_hardware info
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ce9ec37bdd upstream.
When unmapping a range of pages in zap_pte_range, the page being
unmapped is added to an mmu_gather_batch structure for asynchronous
freeing. If we run out of space in the batch structure before the range
has been completely unmapped, then we break out of the loop, force a
TLB flush and free the pages that we have batched so far. If there are
further pages to unmap, then we resume the loop where we left off.
Unfortunately, we forget to update addr when we break out of the loop,
which causes us to truncate the range being invalidated as the end
address is exclusive. When we re-enter the loop at the same address, the
page has already been freed and the pte_present test will fail, meaning
that we do not reconsider the address for invalidation.
This patch fixes the problem by incrementing addr by the PAGE_SIZE
before breaking out of the loop on batch failure.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context; add braces]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 317168d0c7 upstream.
In compat mode, we copy each field of snd_pcm_status struct but don't
touch the reserved fields, and this leaves uninitialized values
there. Meanwhile the native ioctl does zero-clear the whole
structure, so we should follow the same rule in compat mode, too.
Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 94fb823fcb upstream.
If a device's dev_pm_ops::freeze callback fails during the QUIESCE
phase, we don't rollback things correctly calling the thaw and complete
callbacks. This could leave some devices in a suspended state in case of
an error during resuming from hibernation.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66a7cbc303 upstream.
Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so
67809f85d3 ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
disabled NCQ on them. It turns out that NCQ is fine as long as MSI is
not used, so let's turn off MSI and leave NCQ on.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731
Tested-by: <dorin@i51.org>
Tested-by: Imre Kaloz <kaloz@openwrt.org>
Fixes: 67809f85d3 ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30a6b8031f upstream.
free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:
task A | task B
exit_pi_state_list |
pi_state = |
curr->pi_state_list->next |
| futex_requeue(requeue_pi=1)
| // pi_state is the same as
| // the one in task A
| free_pi_state(pi_state)
| list_del_init(&pi_state->list)
| kfree(pi_state)
list_del_init(&pi_state->list) |
Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.
[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
debugging. The latter comes via a seperate patch ]
Signed-off-by: Brian Silverman <bsilver16384@gmail.com>
Cc: austin.linux@gmail.com
Cc: darren@dvhart.com
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1414282837-23092-1-git-send-email-bsilver16384@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6891c4509c upstream.
If userland creates a timer without specifying a sigevent info, we'll
create one ourself, using a stack local variable. Particularly will we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.
Initialize sigev_value with 0 to plug the stack info leak.
Found in the PaX patch, written by the PaX Team.
Fixes: 5a9fa73072 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d32e4dbe7 upstream.
The third parameter of kvm_unpin_pages() when called from
kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
and not the page size.
This error was facilitated with an inconsistent API: kvm_pin_pages() takes
a size, but kvn_unpin_pages() takes a number of pages, so fix the problem
by matching the two.
This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
un-pinning for pages intended to be un-pinned (i.e. memory leak) but
unfortunately potentially aggravated the number of pages we un-pin that
should have stayed pinned. As far as I understand though, the same
practical mitigations apply.
This issue was found during review of Red Hat 6.6 patches to prepare
Ksplice rebootless updates.
Thanks to Vegard for his time on a late Friday evening to help me in
understanding this code.
Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
Cc: stable@vger.kernel.org
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: kvm_pin_pages() also takes a struct kvm *kvm param]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bc19dc375 upstream.
KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application. Let's not kill the guest: WARN
and inject #UD instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aedd153f5b upstream.
Code before the .fixup section needs to have the .insn directive.
This has no side effects on MIPS32/64 but it affects the way microMIPS
loads the address for the return label.
Fixes the following build problem:
mips-linux-gnu-ld: arch/mips/built-in.o: .fixup+0x4a0: Unsupported jump between
ISA modes; consider recompiling with interlinking enabled.
mips-linux-gnu-ld: final link failed: Bad value
Makefile:819: recipe for target 'vmlinux' failed
The fix is similar to 1658f914ff ("MIPS: microMIPS:
Disable LL/SC and fix linker bug.")
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8117/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 51904b0807 upstream.
Unknown operation numbers are caught in nfsd4_decode_compound() which
sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
error causes the main loop in nfsd4_proc_compound() to skip most
processing. But nfsd4_proc_compound also peeks ahead at the next
operation in one case and doesn't take similar precautions there.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 84ce0f0e94 upstream.
When sg_scsi_ioctl() fails to prepare request to submit in
blk_rq_map_kern() we jump to a label where we just end up copying
(luckily zeroed-out) kernel buffer to userspace instead of reporting
error. Fix the problem by jumping to the right label.
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-scsi@vger.kernel.org
Coverity-id: 1226871
Signed-off-by: Jan Kara <jack@suse.cz>
Fixed up the, now unused, out label.
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b47dcbdc51 upstream.
If the TSC is unusable or disabled, then this patch fixes:
- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.
This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline
This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 42fa425043 upstream.
On virtual environments, apic_read could take a long time. As a
result, under certain conditions the ack pending loop may exit
without any queued irqs left, but after more than one second. A
warning will be printed needlessly in this case.
If the loop is about to exit regardless of max_loops, don't
update it.
Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ rebased and reworded the commit message]
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d0eb862dd upstream.
Add VID/PID for Telit LE910 modem. Interfaces description is almost the
same than LE920, except that the qmi interface is number 2 (instead than
5).
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit edd74ffab1 upstream.
Add new IDs for the Xsens Awinda Station and Awinda Dongle.
While at it, order the definitions by PID and add a logical separation
between devices using Xsens' VID and those using FTDI's VID.
Signed-off-by: Frans Klaver <frans.klaver@xsens.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 35cc83eab0 upstream.
Enable Silicon Labs Ember VID chips to enumerate with the cp210x usb serial
driver. EM358x devices operating with the Ember Z-Net 5.1.2 stack may now
connect to host PCs over a USB serial link.
Signed-off-by: Nathaniel Ting <nathaniel.ting@silabs.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7938db449b upstream.
The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 40d43c4b4c upstream.
The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.
Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.
Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work. Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.
[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" <lwang@suse.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2651cc6974 upstream.
Userspace actually passes single parameter (path name) to the umount
syscall, so new umount just fails. Fix it by requesting old umount
syscall implementation and re-wiring umount to it.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d4c5efdb97 upstream.
zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in random driver are being optimized away by gcc.
Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
that can be used in such cases where a variable with sensitive data is
being cleared out in the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]
Fixes kernel bugzilla: 82041
Reported-by: zatimend@hotmail.co.uk
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- extract_buf() needs to use this for the 'extract' array as well
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Part of upstream commit fe8c8a1268 ('crypto: more robust
crypto_memneq'), needed by commit d4c5efdb97 ('random: add and use
memzero_explicit() for clearing data').
commit 9d28eb1244 upstream.
The shrinker uses gfp flags to indicate what kind of operation can the
driver wait for. If __GFP_IO flag is present, the driver can wait for
block I/O operations, if __GFP_FS flag is present, the driver can wait on
operations involving the filesystem.
dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
device that makes calls into the filesystem. If __GFP_IO is present and
__GFP_FS isn't, dm-bufio could still block on filesystem operations if it
runs on a loop block device.
The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
unreproducible) deadlock involving dm-bufio and loop device.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2:
- There's only one shrinker callback
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 923190d32d upstream.
sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount(). This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux: Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element of the inode security structure for call_rcu()
upon an inode_free_security(). But the underlying issue
was already present before that commit as a possible use-after-free
of isec.
Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element. However,
this would merely hide the issue and not truly fix the code.
This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially. Then,
if the inode is dropped subsequently, there will be no further
references to the isec.
Reported-by: Shivnandan Kumar <shivnandan.k@samsung.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f9865f06f7 upstream.
Commit f363e45fd1 ("net/ceph: make ceph_msgr_wq non-reentrant")
effectively removed WQ_MEM_RECLAIM flag from ceph_msgr_wq. This is
wrong - libceph is very much a memory reclaim path, so restore it.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Tested-by: Micha Krause <micha@krausam.de>
Reviewed-by: Sage Weil <sage@redhat.com>
[bwh: Backported to 3.2:
- Keep passing the WQ_NON_REENTRANT flag too
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 95926035b1 upstream.
The emu10k1 voice allocator takes voice_lock spinlock. When there is
no empty stream available, it tries to release a voice used by synth,
and calls get_synth_voice. The callback function,
snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
thus it deadlocks.
The fix is simply removing the voice_lock holds in
snd_emu10k1_synth_get_voice(), as this is always called in the
spinlock context.
Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71458cfc78 upstream.
We're missing include/linux/compiler-gcc5.h which is required now
because gcc branched off to v5 in trunk.
Just copy the relevant bits out of include/linux/compiler-gcc4.h,
no new code is added as of now.
This fixes a build error when using gcc 5.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3ffa6158f0 upstream.
When mapped RX DMA entries are unmapped in an error condition when DMA
is firstly configured in the driver, the number of TX DMA entries was
passed in, which is incorrect
Signed-off-by: Ray Jui <rjui@broadcom.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ff8947fc5 upstream.
Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.
This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:
dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc of=testfile
leads to:
EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
[bwh: Backported to 3.2:
- ext4_journal_start() doesn't have a type parameter
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aa97240995 upstream.
Unfortunately, ForcePad capability is not actually exported over PS/2, so
we have to resort to DMI checks.
Reported-by: Nicole Faerber <nicole.faerber@kernelconcepts.de>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b8839b8c55 upstream.
The math in both blk_stack_limits() and queue_limit_alignment_offset()
assume that a block device's io_min (aka minimum_io_size) is always a
power-of-2. Fix the math such that it works for non-power-of-2 io_min.
This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 24dff96a37 upstream.
we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are. That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared. The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.
It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1. The bug existed
well before that, though, and the same fix used to apply in old
location of file.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: drop changes to netlink_mmap_sendmsg()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c2ca0fcd20 upstream.
This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.
It happened to me that I wanted to copy a piece data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 475d0db742 upstream.
total_objects could be 0 and is used as a denom.
While total_objects is a "long", total_objects == 0 unlikely happens for
3.12 and later kernels because 32-bit architectures would not be able to
hold (1 << 32) objects. However, total_objects == 0 may happen for kernels
between 3.1 and 3.11 because total_objects in prune_super() was an "int"
and (e.g.) x86_64 architecture might be able to hold (1 << 32) objects.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 37017ac684 upstream.
The Broadcom OSB4 IDE Controller (vendor and device IDs: 1166:0211)
does not support 64-KB DMA transfers.
Whenever a 64-KB DMA transfer is attempted,
the transfer fails and messages similar to the following
are written to the console log:
[ 2431.851125] sr 0:0:0:0: [sr0] Unhandled sense code
[ 2431.851139] sr 0:0:0:0: [sr0] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2431.851152] sr 0:0:0:0: [sr0] Sense Key : Hardware Error [current]
[ 2431.851166] sr 0:0:0:0: [sr0] Add. Sense: Logical unit communication time-out
[ 2431.851182] sr 0:0:0:0: [sr0] CDB: Read(10): 28 00 00 00 76 f4 00 00 40 00
[ 2431.851210] end_request: I/O error, dev sr0, sector 121808
When the libata and pata_serverworks modules
are recompiled with ATA_DEBUG and ATA_VERBOSE_DEBUG defined in libata.h,
the 64-KB transfer size in the scatter-gather list can be seen
in the console log:
[ 2664.897267] sr 9:0:0:0: [sr0] Send:
[ 2664.897274] 0xf63d85e0
[ 2664.897283] sr 9:0:0:0: [sr0] CDB:
[ 2664.897288] Read(10): 28 00 00 00 7f b4 00 00 40 00
[ 2664.897319] buffer = 0xf6d6fbc0, bufflen = 131072, queuecommand 0xf81b7700
[ 2664.897331] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 00 7f b4 00 00 40
[ 2664.897338] ata_scsi_translate: ENTER
[ 2664.897345] ata_sg_setup: ENTER, ata1
[ 2664.897356] ata_sg_setup: 3 sg elements mapped
[ 2664.897364] ata_bmdma_fill_sg: PRD[0] = (0x66FD2000, 0xE000)
[ 2664.897371] ata_bmdma_fill_sg: PRD[1] = (0x65000000, 0x10000)
------------------------------------------------------> =======
[ 2664.897378] ata_bmdma_fill_sg: PRD[2] = (0x66A10000, 0x2000)
[ 2664.897386] ata1: ata_dev_select: ENTER, device 0, wait 1
[ 2664.897422] ata_sff_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0xFC
[ 2664.897428] ata_sff_tf_load: device 0xA0
[ 2664.897448] ata_sff_exec_command: ata1: cmd 0xA0
[ 2664.897457] ata_scsi_translate: EXIT
[ 2664.897462] leaving scsi_dispatch_cmnd()
[ 2664.897497] Doing sr request, dev = sr0, block = 0
[ 2664.897507] sr0 : reading 64/256 512 byte blocks.
[ 2664.897553] ata_sff_hsm_move: ata1: protocol 7 task_state 1 (dev_stat 0x58)
[ 2664.897560] atapi_send_cdb: send cdb
[ 2666.910058] ata_bmdma_port_intr: ata1: host_stat 0x64
[ 2666.910079] __ata_sff_port_intr: ata1: protocol 7 task_state 3
[ 2666.910093] ata_sff_hsm_move: ata1: protocol 7 task_state 3 (dev_stat 0x51)
[ 2666.910101] ata_sff_hsm_move: ata1: protocol 7 task_state 4 (dev_stat 0x51)
[ 2666.910129] sr 9:0:0:0: [sr0] Done:
[ 2666.910136] 0xf63d85e0 TIMEOUT
lspci shows that the driver used for the Broadcom OSB4 IDE Controller is
pata_serverworks:
00:0f.1 IDE interface: Broadcom OSB4 IDE Controller (prog-if 8e [Master SecP SecO PriP])
Flags: bus master, medium devsel, latency 64
[virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
[virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
I/O ports at 0170 [size=8]
I/O ports at 0374 [size=4]
I/O ports at 1440 [size=16]
Kernel driver in use: pata_serverworks
The pata_serverworks driver supports five distinct device IDs,
one being the OSB4 and the other four belonging to the CSB series.
The CSB series appears to support 64-KB DMA transfers,
as tests on a machine with an SAI2 motherboard
containing a Broadcom CSB5 IDE Controller (vendor and device IDs: 1166:0212)
showed no problems with 64-KB DMA transfers.
This problem was first discovered when attempting to install openSUSE
from a DVD on a machine with an STL2 motherboard.
Using the pata_serverworks module,
older releases of openSUSE will not install at all due to the timeouts.
Releases of openSUSE prior to 11.3 can be installed by disabling
the pata_serverworks module using the brokenmodules boot parameter,
which causes the serverworks module to be used instead.
Recent releases of openSUSE (12.2 and later) include better error recovery and
will install, though very slowly.
On all openSUSE releases, the problem can be recreated
on a machine containing a Broadcom OSB4 IDE Controller
by mounting an install DVD and running a command similar to the following:
find /mnt -type f -print | xargs cat > /dev/null
The patch below corrects the problem.
Similar to the other ATA drivers that do not support 64-KB DMA transfers,
the patch changes the ata_port_operations qc_prep vector to point to a routine
that breaks any 64-KB segment into two 32-KB segments and
changes the scsi_host_template sg_tablesize element to reduce by half
the number of scatter/gather elements allowed.
These two changes affect only the OSB4.
Signed-off-by: Scott Carter <ccscott@funsoft.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f4bb298102 upstream.
If there is a corrupted file system which has directory entries that
point at reserved, metadata inodes, prohibit them from being used by
treating them the same way we treat Boot Loader inodes --- that is,
mark them to be bad inodes. This prohibits them from being opened,
deleted, or modified via chmod, chown, utimes, etc.
In particular, this prevents a corrupted file system which has a
directory entry which points at the journal inode from being deleted
and its blocks released, after which point Much Hilarity Ensues.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e2bfb088fa upstream.
The boot loader inode (inode #5) should never be visible in the
directory hierarchy, but it's possible if the file system is corrupted
that there will be a directory entry that points at inode #5. In
order to avoid accidentally trashing it, when such a directory inode
is opened, the inode will be marked as a bad inode, so that it's not
possible to modify (or read) the inode from userspace.
Unfortunately, when we unlink this (invalid/illegal) directory entry,
we will put the bad inode on the ophan list, and then when try to
unlink the directory, we don't actually remove the bad inode from the
orphan list before freeing in-memory inode structure. This means the
in-memory orphan list is corrupted, leading to a kernel oops.
In addition, avoid truncating a bad inode in ext4_destroy_inode(),
since truncating the boot loader inode is not a smart thing to do.
Reported-by: Sami Liedes <sami.liedes@iki.fi>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 56ec16cb1e upstream.
If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
deallocate prealloced memory but calls cn_del_callback().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eb76faf53b upstream.
The 'last_accessed' member of the dm_buffer structure was only set when
the the buffer was created. This led to each buffer being discarded
after dm_bufio_max_age time even if it was used recently. In practice
this resulted in all thinp metadata being evicted soon after being read
-- this is particularly problematic for metadata intensive workloads
like multithreaded small random IO.
'last_accessed' is now updated each time the buffer is moved to the head
of the LRU list, so the buffer is now properly discarded if it was not
used in dm_bufio_max_age time.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4dc601bf9 upstream.
hwreg_present() and hwreg_write() temporarily change the VBR register to
another vector table. This table contains a valid bus error handler
only, all other entries point to arbitrary addresses.
If an interrupt comes in while the temporary table is active, the
processor will start executing at such an arbitrary address, and the
kernel will crash.
While most callers run early, before interrupts are enabled, or
explicitly disable interrupts, Finn Thain pointed out that macsonic has
one callsite that doesn't, causing intermittent boot crashes.
There's another unsafe callsite in hilkbd.
Fix this for good by disabling and restoring interrupts inside
hwreg_present() and hwreg_write().
Explicitly disabling interrupts can be removed from the callsites later.
Reported-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90a8020278 upstream.
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.
However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
ftruncate(fd, 0);
pwrite(fd, buf, 1024, 0);
map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
map[0] = 'a'; ----> page_mkwrite() for index 0 is called
ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
mremap(map, 1024, 10000, 0);
map[4095] = 'a'; ----> no page_mkwrite() called
At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.
This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- truncate_setsize() already has an oldsize variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 082f58ac4a upstream.
During temporary resource starvation at lower transport layer, command
is placed on queue full retry path, which expose this problem. The TCM
queue full handling of SCF_TRANSPORT_TASK_SENSE currently sends the same
cmd twice to lower layer. The 1st time led to cmd normal free path.
The 2nd time cause Null pointer access.
This regression bug was originally introduced v3.1-rc code in the
following commit:
commit e057f53308
Author: Christoph Hellwig <hch@infradead.org>
Date: Mon Oct 17 13:56:41 2011 -0400
target: remove the transport_qf_callback se_cmd callback
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1f456b0b9 upstream.
Commit 2f60ea6b8c ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.
The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:
In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.
Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.
An example application:
open, lock, write a file.
sleep for 6 * lease (could be less)
ulock, close.
In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.
This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.
Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8c (NFSv4: The NFSv4.0 client must send RENEW calls...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b789da8a7 upstream.
The function bitcpy_rev has a bug that may result in screen corruption.
The bug happens under these conditions:
* the end of the destination area of a copy operation is aligned on a long
word boundary
* the end of the source area is not aligned on a long word boundary
* we are copying more than one long word
In this case, the variable shift is non-zero and the variable first is
zero. The statements FB_WRITEL(comp(d0, FB_READL(dst), first), dst) reads
the last long word of the destination and writes it back unchanged
(because first is zero). Correctly, we should write the variable d0 to the
last word of the destination in this case.
This patch fixes the bug by introducing and extra test if first is zero.
The patch also removes the references to fb_memmove in the code that is
commented out because fb_memmove was removed from framebuffer subsystem.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f74a289b94 upstream.
The framebuffer code uses the current background color to fill the border
when switching consoles, however, this results in inconsistent behavior.
For example:
- start Midnigh Commander
- the border is black
- switch to another console and switch back
- the border is cyan
- type something into the command line in mc
- the border is cyan
- switch to another console and switch back
- the border is black
- press F9 to go to menu
- the border is black
- switch to another console and switch back
- the border is dark blue
When switching to a console with Midnight Commander, the border is random
color that was left selected by the slang subsystem.
This patch fixes this inconsistency by always using black as the
background color when switching consoles.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit df817ba357 upstream.
The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72cf90124e upstream.
This fix ensures that we never meet an integer overflow while adding
255 while parsing a variable length encoding. It works differently from
commit 206a81c ("lzo: properly check for overruns") because instead of
ensuring that we don't overrun the input, which is tricky to guarantee
due to many assumptions in the code, it simply checks that the cumulated
number of 255 read cannot overflow by bounding this number.
The MAX_255_COUNT is the maximum number of times we can add 255 to a base
count without overflowing an integer. The multiply will overflow when
multiplying 255 by more than MAXINT/255. The sum will overflow earlier
depending on the base count. Since the base count is taken from a u8
and a few bits, it is safe to assume that it will always be lower than
or equal to 2*255, thus we can always prevent any overflow by accepting
two less 255 steps.
This patch also reduces the CPU overhead and actually increases performance
by 1.1% compared to the initial code, while the previous fix costs 3.1%
(measured on x86_64).
The fix needs to be backported to all currently supported stable kernels.
Reported-by: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af958a38a6 upstream.
This reverts commit 206a81c ("lzo: properly check for overruns").
As analysed by Willem Pinckaers, this fix is still incomplete on
certain rare corner cases, and it is easier to restart from the
original code.
Reported-by: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d98a052643 upstream.
Add a complete description of the LZO format as processed by the
decompressor. I have not found a public specification of this format
hence this analysis, which will be used to better understand the code.
Cc: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6822ee34ad upstream.
"raw" is the name of a channel property, but should not be part of the
channel name itself.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
[bwh: Backported to 3.2: using IIO_CHAN() macro to initialise the structures]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01f7feeaf4 upstream.
Two bits control TX power on BBP_R1 register. Correct the mask,
otherwise we clear additional bit on BBP_R1 register, what can have
unknown, possible negative effect.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 173b3afcee upstream.
If rpc.statd is restarted, upcalls to monitor hosts can fail with
ECONNREFUSED. In that case force a lookup of statd's new port and retry the
upcall.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[bwh: Backported to 3.2: not using RPC_TASK_SOFTCONN]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ea75be321 upstream.
vcpu ioctls can hang the calling thread if issued while a vcpu is running.
However, invalid ioctls can happen when userspace tries to probe the kind
of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case,
we know the ioctl is going to be rejected as invalid anyway and we can
fail before trying to take the vcpu mutex.
This patch does not change functionality, it just makes invalid ioctls
fail faster.
Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b41583e729 upstream.
In case of 8 bit mode and DMA usage we end up with every second byte written as
0. We have to respect bits_per_word settings what this patch actually does.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b29ef3546a upstream.
Minimize failures in this function by pre-allocating the buffer
for posting messages. The hypercall for posting the message can fail
for a number of reasons:
1. Transient resource related issues
2. Buffer alignment
3. Buffer cannot span a page boundry
We address issues 2 and 3 by preallocating a per-cpu page for the buffer.
Transient resource related failures are handled by retrying by the callers
of this function.
This patch is based on the investigation
done by Dexuan Cui <decui@microsoft.com>.
I would like to thank Sitsofe Wheeler <sitsofe@yahoo.com>
for reporting the issue and helping in debuggging.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- s/NR_CPUS/MAX_NUM_CPUS/
- Adjust context, indentation
- Also free the page in hv_synic_init() error path]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98d731bb06 upstream.
Eliminate calls to BUG_ON() in vmbus_close_internal().
We have chosen to potentially leak memory, than crash the guest
in case of failures.
In this version of the patch I have addressed comments from
Dan Carpenter (dan.carpenter@oracle.com).
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: function is extern; don't change the return
type to int as callers will ignore the value]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72c6b71c24 upstream.
Eliminate the call to BUG_ON() by waiting for the host to respond. We are
trying to reclaim the ownership of memory that was given to the host and so
we will have to wait until the host responds.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66be653083 upstream.
Eliminate calls to BUG_ON() by properly handling errors. In cases where
rollback is possible, we will return the appropriate error to have the
calling code decide how to rollback state. In the case where we are
transferring ownership of the guest physical pages to the host,
we will wait for the host to respond.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fdeebcc622 upstream.
Posting messages to the host can fail because of transient resource
related failures. Correctly deal with these failures and increase the
number of attempts to post the message before giving up.
In this version of the patch, I have normalized the error code to
Linux error code.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 471b095dfe upstream.
An empty firmware request name will trigger warnings when building
device names. Make sure this is caught earlier and rejected.
The warning was visible via the test_firmware.ko module interface:
echo -ne "\x00" > /sys/devices/virtual/misc/test_firmware/trigger_request
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ddbe1fca0b upstream.
This full-speed USB device generates spurious remote wakeup event
as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result,
Linux can't enter system suspend and S0ix power saving modes once
this keyboard is used.
This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
With this quirk set, wakeup capability will be ignored during
device configure.
This patch could be back-ported to kernels as old as 2.6.39.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 526a4045c6 upstream.
The usb device will autoresume from choose_wakeup() if it is
autosuspended with the wrong wakeup setting, but below errors occur
because usb3503 misc driver will switch to standby mode when suspended.
As add USB_QUIRK_RESET_RESUME, it can stop setting wrong wakeup from
autosuspend_check().
[ 7.734717] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 7.854658] usb 1-3: device descriptor read/64, error -71
[ 8.079657] usb 1-3: device descriptor read/64, error -71
[ 8.294664] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 8.414658] usb 1-3: device descriptor read/64, error -71
[ 8.639657] usb 1-3: device descriptor read/64, error -71
[ 8.854667] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 9.264598] usb 1-3: device not accepting address 3, error -71
[ 9.374655] usb 1-3: reset high-speed USB device number 3 using exynos-ehci
[ 9.784601] usb 1-3: device not accepting address 3, error -71
[ 9.784838] usb usb1-port3: device 1-3 not suspended yet
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3bacc10cd4 upstream.
Fix clamp_align() used in v4l_bound_align_image() to prevent overflow
when passed large value like UINT32_MAX.
In the current implementation:
clamp_align(UINT32_MAX, 8, 8192, 3)
returns 8, because in line:
x = (x + (1 << (align - 1))) & mask;
x overflows to (-1 + 4) & 0x7 = 3, while expected value is 8192.
v4l_bound_align_image() is heavily used in VIDIOC_S_FMT and
VIDIOC_SUBDEV_S_FMT ioctls handlers, and documentation of the latter
explicitly states that:
"The modified format should be as close as possible to the original
request."
-- http://linuxtv.org/downloads/v4l-dvb-apis/vidioc-subdev-g-fmt.html
Thus one would expect, that passing UINT32_MAX as format width and
height will result in setting maximum possible resolution for the
device. Particularly, when the driver doesn't support
VIDIOC_ENUM_FRAMESIZES ioctl, which is common in the codebase.
Fixes changeset: b0d3159be9
Signed-off-by: Maciej Matraszek <m.matraszek@samsung.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89ec3dcf17 upstream.
Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.
The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D. Most interface types defined in the spec do not use
alpha characters, so they won't be affected. For example, 00h, 01h, 10h,
20h, etc. are unaffected.
Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.
[bhelgaas: changelog]
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bfc2d7dfdd upstream.
Added support for Ketra N1 wireless interface, which uses the
Silicon Labs' CP2104 USB to UART bridge with customized PID 8946.
Signed-off-by: Joe Savage <joe.savage@goketra.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 279bf6d390 upstream.
The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9fe373f999 upstream.
The Crocodile chip occasionally comes up with 4k and 8k BAR sizes. Due to
an erratum, setting the SR-IOV page size causes the physical function BARs
to expand to the system page size. Since ppc64 uses 64k pages, when Linux
tries to assign the smaller resource sizes to the now 64k BARs the address
will be truncated and the BARs will overlap.
Force Linux to allocate the resource as a full page, which avoids the
overlap.
[bhelgaas: print expanded resource, too]
Signed-off-by: Douglas Lehr <dllehr@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Milton Miller <miltonm@us.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bceee4a97e upstream.
pciehp assumes that dev->subordinate, the struct pci_bus for a bridge's
secondary bus, exists. But we do not create that bus if we run out of bus
numbers during enumeration. This leads to a NULL dereference in
init_slot() (and other places).
Change pciehp_probe() to return -ENODEV when no secondary bus is present.
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0626e7595 upstream.
When loading extended attributes, check each entry's value offset to
make sure it doesn't collide with the entries.
Without this check it is easy to crash the kernel by mounting a
malicious FS containing a file with an EA wherein e_value_offs = 0 and
e_value_size > 0 and then deleting the EA, which corrupts the name
list.
(See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 85560c4a82 upstream.
Suspend could fail for some platforms because
btusb_suspend==> btusb_stop_traffic ==> usb_kill_anchored_urbs.
When btusb_bulk_complete returns before system suspend and resubmits
an URB, the system cannot enter suspend state.
Signed-off-by: Champion Chen <champion_chen@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba29e721eb upstream.
Hu (hujianyang <hujianyang@huawei.com>) discovered an issue in the
'empty_log_bytes()' function, which calculates how many bytes are left in the
log:
"
If 'c->lhead_lnum + 1 == c->ltail_lnum' and 'c->lhead_offs == c->leb_size', 'h'
would equalent to 't' and 'empty_log_bytes()' would return 'c->log_bytes'
instead of 0.
"
At this point it is not clear what would be the consequences of this, and
whether this may lead to any problems, but this patch addresses the issue just
in case.
Tested-by: hujianyang <hujianyang@huawei.com>
Reported-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 052c28073f upstream.
Hu (hujianyang@huawei.com) discovered a race condition which may lead to a
situation when UBIFS is unable to mount the file-system after an unclean
reboot. The problem is theoretical, though.
In UBIFS, we have the log, which basically a set of LEBs in a certain area. The
log has the tail and the head.
Every time user writes data to the file-system, the UBIFS journal grows, and
the log grows as well, because we append new reference nodes to the head of the
log. So the head moves forward all the time, while the log tail stays at the
same position.
At any time, the UBIFS master node points to the tail of the log. When we mount
the file-system, we scan the log, and we always start from its tail, because
this is where the master node points to. The only occasion when the tail of the
log changes is the commit operation.
The commit operation has 2 phases - "commit start" and "commit end". The former
is relatively short, and does not involve much I/O. During this phase we mostly
just build various in-memory lists of the things which have to be written to
the flash media during "commit end" phase.
During the commit start phase, what we do is we "clean" the log. Indeed, the
commit operation will index all the data in the journal, so the entire journal
"disappears", and therefore the data in the log become unneeded. So we just
move the head of the log to the next LEB, and write the CS node there. This LEB
will be the tail of the new log when the commit operation finishes.
When the "commit start" phase finishes, users may write more data to the
file-system, in parallel with the ongoing "commit end" operation. At this point
the log tail was not changed yet, it is the same as it had been before we
started the commit. The log head keeps moving forward, though.
The commit operation now needs to write the new master node, and the new master
node should point to the new log tail. After this the LEBs between the old log
tail and the new log tail can be unmapped and re-used again.
And here is the possible problem. We do 2 operations: (a) We first update the
log tail position in memory (see 'ubifs_log_end_commit()'). (b) And then we
write the master node (see the big lock of code in 'do_commit()').
But nothing prevents the log head from moving forward between (a) and (b), and
the log head may "wrap" now to the old log tail. And when the "wrap" happens,
the contends of the log tail gets erased. Now a power cut happens and we are in
trouble. We end up with the old master node pointing to the old tail, which was
erased. And replay fails because it expects the master node to point to the
correct log tail at all times.
This patch merges the abovementioned (a) and (b) operations by moving the master
node change code to the 'ubifs_log_end_commit()' function, so that it runs with
the log mutex locked, which will prevent the log from being changed benween
operations (a) and (b).
Reported-by: hujianyang <hujianyang@huawei.com>
Tested-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07e19dff63 upstream.
The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only
called on the mount path and commit path. The mount path is sequential and
there is no parallelism, and the commit path is also serialized - there is only
one commit going on at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 56f17dd3fb upstream.
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:
(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.
(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.
(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.
This patch fixes the issue by using the memslot generation number
to validate the mmio cache.
Signed-off-by: David Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-12-14 16:23:41 +00:00
1814 changed files with 22951 additions and 10674 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.