commit 3a93c841c2 upstream.
This patch disables translation(dma-remapping) before its initialization
if it is already enabled.
This is needed for kexec/kdump boot. If dma-remapping is enabled in the
first kernel, it need to be disabled before initializing its page table
during second kernel boot. Wei Hu also reported that this is needed
when second kernel boots with intel_iommu=off.
Basically iommu->gcmd is used to know whether translation is enabled or
disabled, but it is always zero at boot time even when translation is
enabled since iommu->gcmd is initialized without considering such a
case. Therefor this patch synchronizes iommu->gcmd value with global
command register when iommu structure is allocated.
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
[wyj: Backported to 3.4: adjust context]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8142b21550 upstream.
Commit 554086d ("x86_32, entry: Do syscall exit work on badsys
(CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
code, resulting in syscall() not returning proper errors for undefined
syscalls on CPUs supporting the sysenter feature.
The following code:
> int result = syscall(666);
> printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
results in:
> result=666 errno=0 error=Success
Obviously, the syscall return value is the called syscall number, but it
should have been an ENOSYS error. When run under ptrace it behaves
correctly, which makes it hard to debug in the wild:
> result=-1 errno=38 error=Function not implemented
The %eax register is the return value register. For debugging via ptrace
the syscall entry code stores the complete register context on the
stack. The badsys handlers only store the ENOSYS error code in the
ptrace register set and do not set %eax like a regular syscall handler
would. The old resume_userspace call chain contains code that clobbers
%eax and it restores %eax from the ptrace registers afterwards. The same
goes for the ptrace-enabled call chain. When ptrace is not used, the
syscall return value is the passed-in syscall number from the untouched
%eax register.
Use %eax as the return value register in syscall_badsys and
sysenter_badsys, like a real syscall handler does, and have the caller
push the value onto the stack for ptrace access.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1871ee134b upstream.
The sata on fsl mpc8315e is broken after the commit 8a4aeec8d2
("libata/ahci: accommodate tag ordered controllers"). The reason is
that the ata controller on this SoC only implement a queue depth of
16. When issuing the commands in tag order, all the commands in tag
16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata
malfunction. It makes no senses to use a 32 queue in software while
the hardware has less queue depth. So consider the queue depth
implemented by the hardware when requesting a command tag.
Fixes: 8a4aeec8d2 ("libata/ahci: accommodate tag ordered controllers")
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f88f88f83 upstream.
Commit 248ac0e194 ("mm/vmalloc: remove guard page from between vmap
blocks") had the side effect of making vmap_area.va_end member point to
the next vmap_area.va_start. This was creating an artificial reference
to vmalloc'ed objects and kmemleak was rarely reporting vmalloc() leaks.
This patch marks the vmap_area containing pointers explicitly and
reduces the min ref_count to 2 as vm_struct still contains a reference
to the vmalloc'ed object. The kmemleak add_scan_area() function has
been improved to allow a SIZE_MAX argument covering the rest of the
object (for simpler calling sites).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3860c1c5d upstream.
ULONG_MAX is often used to check for integer overflow when calculating
allocation size. While ULONG_MAX happens to work on most systems, there
is no guarantee that `size_t' must be the same size as `long'.
This patch introduces SIZE_MAX, the maximum value of `size_t', to improve
portability and readability for allocation size validation.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Alex Elder <elder@dreamhost.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80834312a4 upstream.
The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b),
rather than (n > ULONG_MAX / b - a).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 418df63ada upstream.
Commit 455bd4c430 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value. However the memset itself became broken
by that patch for misaligned pointers.
This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.
Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.
While at it, the entry instructions are slightly reworked to help dual
issue pipelines.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 455bd4c430 upstream.
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
For instance in the following function:
void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}
compiled as:
800554d0 <debug_mutex_lock_common>:
800554d0: e92d4008 push {r3, lr}
800554d4: e1a00001 mov r0, r1
800554d8: e3a02010 mov r2, #16 ; 0x10
800554dc: e3a01011 mov r1, #17 ; 0x11
800554e0: eb04426e bl 80165ea0 <memset>
800554e4: e1a03000 mov r3, r0
800554e8: e583000c str r0, [r3, #12]
800554ec: e5830000 str r0, [r3]
800554f0: e5830004 str r0, [r3, #4]
800554f4: e8bd8008 pop {r3, pc}
GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
save r8:
- str lr, [sp, #-4]!
+ stmfd sp!, {r8, lr}
and restore r8 on both exit paths:
- ldmeqfd sp!, {pc} @ Now <64 bytes to go.
+ ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
(...)
tst r2, #16
stmneia ip!, {r1, r3, r8, lr}
- ldr lr, [sp], #4
+ ldmfd sp!, {r8, lr}
Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
save r8:
- stmfd sp!, {r4-r7, lr}
+ stmfd sp!, {r4-r8, lr}
and restore r8 on both exit paths:
bgt 3b
- ldmeqfd sp!, {r4-r7, pc}
+ ldmeqfd sp!, {r4-r8, pc}
(...)
tst r2, #16
stmneia ip!, {r4-r7}
- ldmfd sp!, {r4-r7, lr}
+ ldmfd sp!, {r4-r8, lr}
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0253d634e0 upstream.
Commit 4a705fef98 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.
The test program for the problem is shown below:
$ cat heap.c
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#define HPS 0x200000
int main() {
int i;
char *p = malloc(HPS);
memset(p, '1', HPS);
for (i = 0; i < 5; i++) {
if (!fork()) {
memset(p, '2', HPS);
p = malloc(HPS);
memset(p, '3', HPS);
free(p);
return 0;
}
}
sleep(1);
free(p);
return 0;
}
$ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
Fixes 4a705fef98 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Guillaume Morin <guillaume@morinfr.org>
Suggested-by: Guillaume Morin <guillaume@morinfr.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ec7382036 upstream.
Update the LZO compression test vectors according to the latest compressor
version.
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 9802d21e7a ]
The tot_stats estimator is started only when CONFIG_SYSCTL
is defined. But it is stopped without checking CONFIG_SYSCTL.
Fix the crash by moving ip_vs_stop_estimator into
ip_vs_control_net_cleanup_sysctl.
The change is needed after commit 14e405461e
("IPVS: Add __ip_vs_control_{init,cleanup}_sysctl()") from 2.6.39.
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c81c8a1eee upstream.
In __ioremap_caller() (the guts of ioremap), we loop over the range of
pfns being remapped and checks each one individually with page_is_ram().
For large ioremaps, this can be very slow. For example, we have a
device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+
seconds -- sometimes long enough to trigger the soft lockup detector!
Internally, page_is_ram() calls walk_system_ram_range() on a single
page. Instead, we can make a single call to walk_system_ram_range()
from __ioremap_caller(), and do our further checks only for any RAM
pages that we find. For the common case of MMIO, this saves an enormous
amount of work, since the range being ioremapped doesn't intersect
system RAM at all.
With this change, ioremap on our 256 GiB BAR takes less than 1 second.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd1232b214 upstream.
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 8bab797c6e upstream.
This is a static checker fix. The "dev" variable is always NULL after
the while statement so we would be dereferencing a NULL pointer here.
Fixes: 819a3eba42 ('[PATCH] applicom: fix error handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 246f2d2ee1 upstream.
It is not safe to use LAR to filter when to go down the espfix path,
because the LDT is per-process (rather than per-thread) and another
thread might change the descriptors behind our back. Fortunately it
is always *safe* (if a bit slow) to go down the espfix path, and a
32-bit LDT stack segment is extremely rare.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 25534eb770 upstream.
These functions are used in some PCI drivers with big-endian
MMIO space.
Admittedly it is almost certain that no one this side of the
Moon would use such a card in an Alpha but it does get us
closer to being able to build allyesconfig or allmodconfig,
and it enables the Debian default generic config to build.
Tested-by: Raúl Porcel <armin76@gentoo.org>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
There is no upstream commit for this, as arch/score/kernel/init_task.c
has been replaced by generic code and <linux/export.h> is included
indirectly by arch/score/mm/init.c.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5fbbf8a1a9 upstream.
Signed-off-by: Lennox Wu <lennox.wu@gmail.com>
[bwh: Backported to 3.2:
- Drop addition of 'select HAVE_GENERIC_HARDIRQS' which was not removed here
- Drop inapplicale change to copy_thread()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a50e4213e7 upstream.
Bugfix for following error messages:
lib/iomap.c: In function 'pci_iomap':
lib/iomap.c:274: error: implicit declaration of function 'ioremap_nocache'
lib/iomap.c:274: warning: return makes pointer from integer without a cast
Also see commit <f1ecc69838a2d7c8a3e1909f637d4083c071777d>
it will hide the ioremap_nocache function for systems with an MMU
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1a366500b upstream.
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8e205f779d upstream.
Commit f00cdc6df7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).
We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.
So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.
This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f00cdc6df7 upstream.
Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.
It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.
Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3a746f5aa upstream.
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76d095388b upstream.
When we fail to find an matching extent near the requested extent
specification during a left-right distance search in
xfs_alloc_ag_vextent_near, we fail to free the original cursor that
we used to look up the XFS_BTNUM_CNT tree and hence leak it.
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 278f2b3e2a upstream.
The ulog messages leak heap bytes by the means of padding bytes and
incompletely filled string arrays. Fix those by memset(0)'ing the
whole struct before filling it.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dab6cf55f8 upstream.
The PSW mask check of the PTRACE_POKEUSR_AREA command is incorrect.
For the default user_mode=home address space layout the psw_user_bits
variable has the home space address-space-control bits set. But the
PSW_MASK_USER contains PSW_MASK_ASC, the ptrace validity check for the
PSW mask will therefore always fail.
Fixes CVE-2014-3534
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0e576acbc1 upstream.
If CONFIG_NO_HZ=n tick_nohz_get_sleep_length() returns NSEC_PER_SEC/HZ.
If CONFIG_NO_HZ=y and the nohz functionality is disabled via the
command line option "nohz=off" or not enabled due to missing hardware
support, then tick_nohz_get_sleep_length() returns 0. That happens
because ts->sleep_length is never set in that case.
Set it to NSEC_PER_SEC/HZ when the NOHZ mode is inactive.
Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e5eca6d41f upstream.
When running RHEL6 userspace on a current upstream kernel, "ip link"
fails to show VF information.
The reason is a kernel<->userspace API change introduced by commit
88c5b5ce5c ("rtnetlink: Call nlmsg_parse() with correct header length"),
after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
in the netlink request.
iproute2 adjusted for the API change in its commit 63338dca4513
("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").
The problem has been noticed before:
http://marc.info/?l=linux-netdev&m=136692296022182&w=2
(Subject: Re: getting VF link info seems to be broken in 3.9-rc8)
We can do better than tell those with old userspace to upgrade. We can
recognize the old iproute2 in the kernel by checking the netlink message
length. Even when including the IFLA_EXT_MASK attribute, its netlink
message is shorter than struct ifinfomsg.
With this patch "ip link" shows VF information in both old and new
iproute2 versions.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 640d7efe4c ]
*_result[len] is parsed as *(_result[len]) which is not at all what we
want to touch here.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 84a7c0b1db ("dns_resolver: assure that dns_query() result is null-terminated")
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 84a7c0b1db ]
dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a4b70a07ed ]
Nothing cleans up the objects created by
vnet_new(), they are completely leaked.
vnet_exit(), after doing the vio_unregister_driver() to clean
up ports, should call a helper function that iterates over vnet_list
and cleans up those objects. This includes unregister_netdevice()
as well as free_netdev().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Karl Volz <karl.volz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8f2e5ae40e ]
While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().
Both structures are defined by the IETF in RFC6458:
* Section 5.3.2. SCTP Header Information Structure:
The sctp_sndrcvinfo structure is defined below:
struct sctp_sndrcvinfo {
uint16_t sinfo_stream;
uint16_t sinfo_ssn;
uint16_t sinfo_flags;
<-- 2 bytes hole -->
uint32_t sinfo_ppid;
uint32_t sinfo_context;
uint32_t sinfo_timetolive;
uint32_t sinfo_tsn;
uint32_t sinfo_cumtsn;
sctp_assoc_t sinfo_assoc_id;
};
* 6.1.3. SCTP_REMOTE_ERROR:
A remote peer may send an Operation Error message to its peer.
This message indicates a variety of error conditions on an
association. The entire ERROR chunk as it appears on the wire
is included in an SCTP_REMOTE_ERROR event. Please refer to the
SCTP specification [RFC4960] and any extensions for a list of
possible error formats. An SCTP error notification has the
following format:
struct sctp_remote_error {
uint16_t sre_type;
uint16_t sre_flags;
uint32_t sre_length;
uint16_t sre_error;
<-- 2 bytes hole -->
sctp_assoc_t sre_assoc_id;
uint8_t sre_data[];
};
Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.
While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 52ad353a53 ]
The problem was triggered by these steps:
1) create socket, bind and then setsockopt for add mc group.
mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
mreq.imr_interface.s_addr = inet_addr("192.168.1.2");
setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
2) drop the mc group for this socket.
mreq.imr_multiaddr.s_addr = inet_addr("255.0.0.37");
mreq.imr_interface.s_addr = inet_addr("0.0.0.0");
setsockopt(sockfd, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
3) and then drop the socket, I found the mc group was still used by the dev:
netstat -g
Interface RefCnt Group
--------------- ------ ---------------------
eth2 1 255.0.0.37
Normally even though the IP_DROP_MEMBERSHIP return error, the mc group still need
to be released for the netdev when drop the socket, but this process was broken when
route default is NULL, the reason is that:
The ip_mc_leave_group() will choose the in_dev by the imr_interface.s_addr, if input addr
is NULL, the default route dev will be chosen, then the ifindex is got from the dev,
then polling the inet->mc_list and return -ENODEV, but if the default route dev is NULL,
the in_dev and ifIndex is both NULL, when polling the inet->mc_list, the mc group will be
released from the mc_list, but the dev didn't dec the refcnt for this mc group, so
when dropping the socket, the mc_list is NULL and the dev still keep this group.
v1->v2: According Hideaki's suggestion, we should align with IPv6 (RFC3493) and BSDs,
so I add the checking for the in_dev before polling the mc_list, make sure when
we remove the mc group, dec the refcnt to the real dev which was using the mc address.
The problem would never happened again.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 916c1689a0 ]
skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2cd0d743b0 ]
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.
This was visible now because the recently simplified TLP logic in
bef1909ee3 ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.
Consider the following example scenario:
mss: 1000
skb: seq: 0 end_seq: 4000 len: 4000
SACK: start_seq: 3999 end_seq: 4000
The tcp_match_skb_to_sack() code will compute:
in_sack = false
pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
new_len += mss = 4000
Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.
With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.
Fixes: adb92db857 ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bb86cf569b upstream.
When using USB 3.0 pen drive with the [AMD] FCH USB XHCI Controller
[1022:7814], the second hotplugging will experience the USB 3.0 pen
drive is recognized as high-speed device. After bisecting the kernel,
I found the commit number 41e7e056cd
(USB: Allow USB 3.0 ports to be disabled.) causes the bug. After doing
some experiments, the bug can be fixed by avoiding executing the function
hub_usb3_port_disable(). Because the port status with [AMD] FCH USB
XHCI Controlleris [1022:7814] is already in RxDetect
(I tried printing out the port status before setting to Disabled state),
it's reasonable to check the port status before really executing
hub_usb3_port_disable().
Fixes: 41e7e056cd (USB: Allow USB 3.0 ports to be disabled.)
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: use hub device as context for dev_dbg(),
as hub ports are not devices in their own right]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ac66effe7 upstream.
In some cases we fetch the edid in the detect() callback
in order to determine what sort of monitor is connected.
If that happens, don't fetch the edid again in the get_modes()
callback or we will leak the edid.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit de12d6f4b1 upstream.
Temperature limit registers are signed. Limits therefore need
to be clamped to (-128, 127) degrees C and not to (0, 255)
degrees C.
Without this fix, writing a limit of 128 degrees C sets the
actual limit to -128 degrees C.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Axel Lin <axel.lin@ingics.com>
[bwh: Backported to 3.2: driver was using SENSORS_LIMIT(), which we can
replace with clamp_val()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0ab99e773 upstream.
proc_sched_show_task() does:
if (nr_switches)
do_div(avg_atom, nr_switches);
nr_switches is unsigned long and do_div truncates it to 32 bits, which
means it can test non-zero on e.g. x86-64 and be truncated to zero for
division.
Fix the problem by using div64_ul() instead.
As a side effect calculations of avg_atom for big nr_switches are now correct.
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c2853c8df5 upstream.
There is div64_long() to handle the s64/long division, but no mocro do
u64/ul division. It is necessary in some scenarios, so add this
function.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97b8ee8453 upstream.
ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available. Otherwise, the following epoll and
read sequence will eventually hang forever:
1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever
~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
ring_buffer_poll_wait() returns immediately without adding poll_table,
which has poll_table->_qproc pointing to ep_poll_callback(), to its
wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
it has to call ep_poll_callback() because it is not in its wait queue.
Hence, block forever.
Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do. For example, tcp_poll() in tcp.c.
Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
Fixes: 2a2cc8f7c4 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Martin Lau <kafai@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: the poll implementation looks rather different
but does have a conditional return before and after the poll_wait() call;
delete the return before it.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3cf521f7dc upstream.
The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.
As David Miller points out:
"If we wanted this to work, it'd have to look up the tunnel and then
use tunnel->sk, but I wonder how useful that would be"
Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f6be5e6450 upstream.
If there are error flags in the aux transaction return
-EIO rather than -EBUSY. -EIO restarts the whole transaction
while -EBUSY jus retries. Fixes problematic aux transfers.
Bug:
https://bugs.freedesktop.org/show_bug.cgi?id=80684
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: error code is returned directly here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 10f1d5d111 upstream.
There's a race condition between the atomic_dec_and_test(&io->count)
in dec_count() and the waking of the sync_io() thread. If the thread
is spuriously woken immediately after the decrement it may exit,
making the on stack io struct invalid, yet the dec_count could still
be using it.
Fix this race by using a completion in sync_io() and dec_count().
Reported-by: Minfei Huang <huangminfei@ucloud.cn>
Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
[bwh: Backported to 3.2: use wait_for_completion() as wait_for_completion_io()
is not available]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a7fbe7e9e upstream.
This patch adds PID 0x0003 to the VID 0x128d (Testo). At least the
Testo 435-4 uses this, likely other gear as well.
Signed-off-by: Bert Vermeulen <bert@biot.com>
Cc: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f6c2dd2010 upstream.
It is customary to clamp limits instead of bailing out with an error
if a configured limit is out of the range supported by the driver.
This simplifies limit configuration, since the user will not typically
know chip and/or driver specific limits.
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 233a01fa9c upstream.
If the number in "user_id=N" or "group_id=N" mount options was larger than
INT_MAX then fuse returned EINVAL.
Fix this to handle all valid uid/gid values.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.2: no user namespace conversion]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 126b9d4365 upstream.
As suggested by checkpatch.pl, use time_before64() instead of direct
comparison of jiffies64 values.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 145e74a4e5 upstream.
Upper limit for write operations to temperature limit registers
was clamped to a fractional value. However, limit registers do
not support fractional values. As a result, upper limits of 127.5
degrees C or higher resulted in a rounded limit of 128 degrees C.
Since limit registers are signed, this was stored as -128 degrees C.
Clamp limits to (-55, +127) degrees C to solve the problem.
Value on writes to auto_temp[12]_min and auto_temp[12]_max were not
clamped at all, but masked. As a result, out-of-range writes resulted
in a more or less arbitrary limit. Clamp those attributes to (0, 127)
degrees C for more predictable results.
Cc: Axel Lin <axel.lin@ingics.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.2:
- Adjust context
- Driver was using SENSORS_LIMIT(), which we can replace with clamp_val()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 75646e758a upstream.
Some machines (eg. Lenovo Z480) ECs are not stable during boot up
and causes battery driver fails to be loaded due to failure of getting
battery information from EC sometimes. After several retries, the
operation will work. This patch is to retry to get battery information 5
times if the first try fails.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75581
Reported-and-tested-by: naszar <naszar@ya.ru>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: acpi_battery_update() doesn't take a second parameter]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0d653412f upstream.
There is a race condition in ec_transaction_completed().
When ec_transaction_completed() is called in the GPE handler, it could
return true because of (ec->curr == NULL). Then the wake_up() invocation
could complete the next command unexpectedly since there is no lock between
the 2 invocations. With the previous cleanup, the IBF=0 waiter race need
not be handled any more. It's now safe to return a flag from
advance_condition() to indicate the requirement of wakeup, the flag is
returned from a locked context.
The ec_transaction_completed() is now only invoked by the ec_poll() where
the ec->curr is ensured to be different from NULL.
After cleaning up, the EVT_SCI=1 check should be moved out of the wakeup
condition so that an EVT_SCI raised with (ec->curr == NULL) can trigger a
QR_SC command.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891
Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931
Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911
Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org>
Reported-by: Barton Xu <tank.xuhan@gmail.com>
Tested-by: Steffen Weber <steffen.weber@gmail.com>
Tested-by: Arthur Chen <axchen@nvidia.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f92fca0060 upstream.
Move the first command byte write into advance_transaction() so that all
EC register accesses that can affect the command processing state machine
can happen in this asynchronous state machine advancement function.
The advance_transaction() function then can be a complete implementation
of an asyncrhonous transaction for a single command so that:
1. The first command byte can be written in the interrupt context;
2. The command completion waiter can also be used to wait the first command
byte's timeout;
3. In BURST mode, the follow-up command bytes can be written in the
interrupt context directly, so that it doesn't need to return to the
task context. Returning to the task context reduces the throughput of
the BURST mode and in the worst cases where the system workload is very
high, this leads to the hardware driven automatic BURST mode exit.
In order not to increase memory consumption, convert 'done' into 'flags'
to contain multiple indications:
1. ACPI_EC_COMMAND_COMPLETE: converting from original 'done' condition,
indicating the completion of the command transaction.
2. ACPI_EC_COMMAND_POLL: indicating the availability of writing the first
command byte. A new command can utilize this flag to compete for the
right of accessing the underlying hardware. There is a follow-up bug
fix that has utilized this new flag.
The 2 flags are important because it also reflects a key concept of IO
programs' design used in the system softwares. Normally an IO program
running in the kernel should first be implemented in the asynchronous way.
And the 2 flags are the most common way to implement its synchronous
operations on top of the asynchronous operations:
1. POLL: This flag can be used to block until the asynchronous operations
can happen.
2. COMPLETE: This flag can be used to block until the asynchronous
operations have completed.
By constructing code cleanly in this way, many difficult problems can be
solved smoothly.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891
Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931
Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911
Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org>
Reported-by: Barton Xu <tank.xuhan@gmail.com>
Tested-by: Steffen Weber <steffen.weber@gmail.com>
Tested-by: Arthur Chen <axchen@nvidia.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- s/ec->lock/ec->curr_lock/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3cd8d2789 upstream.
Currently when advance_transaction() is called in EC interrupt handler,
if there is nothing driver can do with the interrupt, it will be taken
as a false one.
But this is not always true, as there may be a SCI EC interrupt fired
during normal read/write operation, which should not be counted as a
false one. This patch fixes the problem.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66b42b78bc upstream.
The advance_transaction() will be invoked from the IRQ context GPE handler
and the task context ec_poll(). The handling of this function is locked so
that the EC state machine are ensured to be advanced sequentially.
But there is a problem. Before invoking advance_transaction(), EC_SC(R) is
read. Then for advance_transaction(), there could be race condition around
the lock from both contexts. The first one reading the register could fail
this race and when it passes the stale register value to the state machine
advancement code, the hardware condition is totally different from when
the register is read. And the hardware accesses determined from the wrong
hardware status can break the EC state machine. And there could be cases
that the functionalities of the platform firmware are seriously affected.
For example:
1. When 2 EC_DATA(W) writes compete the IBF=0, the 2nd EC_DATA(W) write may
be invalid due to IBF=1 after the 1st EC_DATA(W) write. Then the
hardware will either refuse to respond a next EC_SC(W) write of the next
command or discard the current WR_EC command when it receives a EC_SC(W)
write of the next command.
2. When 1 EC_SC(W) write and 1 EC_DATA(W) write compete the IBF=0, the
EC_DATA(W) write may be invalid due to IBF=1 after the EC_SC(W) write.
The next EC_DATA(R) could never be responded by the hardware. This is
the root cause of the reported issue.
Fix this issue by moving the EC_SC(R) access into the lock so that we can
ensure that the state machine is advanced consistently.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891
Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931
Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911
Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org>
Reported-by: Barton Xu <tank.xuhan@gmail.com>
Tested-by: Steffen Weber <steffen.weber@gmail.com>
Tested-by: Arthur Chen <axchen@nvidia.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Use PREFIX in log message]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36b15875a7 upstream.
A bug was introduced by commit b76b51ba0c ('ACPI / EC: Add more debug
info and trivial code cleanup') that erroneously caused the struct member
to be accessed before acquiring the required lock. This change fixes
it by ensuring the lock acquisition is done first.
Found by Aaron Durbin <adurbin@chromium.org>
Fixes: b76b51ba0c ('ACPI / EC: Add more debug info and trivial code cleanup')
References: http://crbug.com/319019
Signed-off-by: Puneet Kumar <puneetster@chromium.org>
Reviewed-by: Aaron Durbin <adurbin@chromium.org>
[olof: Commit message reworded a bit]
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b76b51ba0c upstream.
Add more debug info for EC transaction debugging, like the interrupt
status register value, the detail info of a EC transaction.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d28bd840b upstream.
Add ID of the Telewell 4G v2 hardware to option driver to get legacy
serial interface working
Signed-off-by: Bernd Wachter <bernd.wachter@jolla.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b9326057a3 upstream.
Corsair USB Dongles are shipped with Corsair AXi series PSUs.
These are cp210x serial usb devices, so make driver detect these.
I have a program, that can get information from these PSUs.
Tested with 2 different dongles shipped with Corsair AX860i and
AX1200i units.
Signed-off-by: Andras Kovacs <andras@sth.sze.hu>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5dd214248f upstream.
The mount manpage says of the max_batch_time option,
This optimization can be turned off entirely
by setting max_batch_time to 0.
But the code doesn't do that. So fix the code to do
that.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2: option parsing looks different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae0f78de2c upstream.
Make it clear that values printed are times, and that it is error
since last fsck. Also add note about fsck version required.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1035a9e3e9 upstream.
Writing to fanX_div does not clear the cache. As a result, reading
from fanX_div may return the old value for up to two seconds
after writing a new value.
This patch ensures the fan_div cache is updated in set_fan_div().
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e578080ed upstream.
Commit "drm/vmwgfx: correct fb_fix_screeninfo.line_length", while fixing a
vmwgfx fbdev bug, also writes the pitch to a supposedly read-only register:
SVGA_REG_BYTES_PER_LINE, while it should be (and also in fact is) written to
SVGA_REG_PITCHLOCK.
This patch is Cc'd stable because of the unknown effects writing to this
register might have, particularly on older device versions.
v2: Updated log message.
Cc: Christopher Friedt <chrisfriedt@gmail.com>
Tested-by: Christopher Friedt <chrisfriedt@gmail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43d826ca59 upstream.
We should always prefer to use full RTS protection. Using
CTS to self gives a meaningless improvement, but this flow
is much harder for the firmware which is likely to have
issues with it.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[bwh: Backported to 3.2:
- Adjust filename
- Condition for RXON_FLG_SELF_CTS_EN in iwlagn_commit_rxon() was different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b6478231c upstream.
Calling xen_console_resume() in xen_suspend() causes a warning because
it locks irq_mapping_update_lock (a mutex) and this may sleep. If a
userspace process is using the evtchn device then this mutex may be
locked at the point of the stop_machine() call and
xen_console_resume() would then deadlock.
Resuming the console after stop_machine() returns avoids this
deadlock.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 133d4527ea upstream.
When we write to a degraded array which has a bitmap, we
make sure the relevant bit in the bitmap remains set when
the write completes (so a 're-add' can quickly rebuilt a
temporarily-missing device).
If, immediately after such a write starts, we incorporate a spare,
commence recovery, and skip over the region where the write is
happening (because the 'needs recovery' flag isn't set yet),
then that write will not get to the new device.
Once the recovery finishes the new device will be trusted, but will
have incorrect data, leading to possible corruption.
We cannot set the 'needs recovery' flag when we start the write as we
do not know easily if the write will be "degraded" or not. That
depends on details of the particular raid level and particular write
request.
This patch fixes a corruption issue of long standing and so it
suitable for any -stable kernel. It applied correctly to 3.0 at
least and will minor editing to earlier kernels.
Reported-by: Bill <billstuff2001@sbcglobal.net>
Tested-by: Bill <billstuff2001@sbcglobal.net>
Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b292d7a104 upstream.
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog
if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
For example, we use external NMI to make system panic to get crash
dump, but in this case, the external NMI is falsely handled do to the
issue.
This commit deals with the issue simply by ignoring CondChgd bit.
Here is explanation in detail.
On x86 NMI watchdog uses performance monitoring feature to
periodically signal NMI each time performance counter gets overflowed.
intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
owner of a given NMI by looking at overflow status bits in
MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
handles the given NMI as its own NMI.
The problem is that the intel_pmu_handle_irq() doesn't distinguish
CondChgd bit from other bits. Unlike the other status bits, CondChgd
bit doesn't represent overflow status for performance counters. Thus,
CondChgd bit cannot be thought of as a mark indicating a given NMI is
NMI watchdog's.
As a result, if CondChgd bit is set, any NMI is falsely handled by the
NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
action is never performed until CondChgd bit is cleared.
I noticed this behavior on systems with Ivy Bridge processors: Intel
Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
in the beginning at boot. Then the CondChgd bit is immediately cleared
by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
0.
On the other hand, on older processors such as Nehalem, Xeon E7540,
CondChgd bit is not set in the beginning at boot.
I'm not sure about exact behavior of CondChgd bit, in particular when
this bit is set. Although I read Intel System Programmer's Manual to
figure out that, the descriptions I found are:
In 18.9.1:
"The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to
indicate changes to the state of performancmonitoring hardware"
In Table 35-2 IA-32 Architectural MSRs
63 CondChg: status bits of this register has changed.
These are different from the bahviour I see on the actual system as I
explained above.
At least, I think ignoring CondChgd bit should be enough for NMI
watchdog perspective.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b14bf2d0c0 upstream.
Some buggy JMicron USB-ATA bridges don't know how to translate the FUA
bit in READs or WRITEs. This patch adds an entry in unusual_devs.h
and a blacklist flag to tell the sd driver not to use FUA.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Michael Büsch <m@bues.ch>
Tested-by: Michael Büsch <m@bues.ch>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Use sd_printk() not sd_first_printk()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f35f71244d upstream.
It appears that no one ever run ffs-test on a big-endian machine,
since it used cpu-endianess for fs_count and hs_count fields which
should be in little-endian format. Fix by wrapping the numbers in
cpu_to_le32.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76f47128f9 upstream.
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.
The vfs, on the other hand, wants null-terminated data.
The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.
The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.
But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.
Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.
In the NFSv2/v3 cases this actually appears to be safe:
- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.
In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.
The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.
Reported-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d76744a932 upstream.
https://bugzilla.kernel.org/show_bug.cgi?id=70191https://bugzilla.kernel.org/show_bug.cgi?id=77581
It is observed that sometimes Tx packet is downloaded without
adding driver's txpd header. This results in firmware parsing
garbage data as packet length. Sometimes firmware is unable
to read the packet if length comes out as invalid. This stops
further traffic and timeout occurs.
The root cause is uninitialized fields in tx_info(skb->cb) of
packet used to get garbage values. In this case if
MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd
header was skipped. This patch makes sure that tx_info is
correctly initialized to fix the problem.
Reported-by: Andrew Wiley <wiley.andrew.j@gmail.com>
Reported-by: Linus Gasser <list@markas-al-nour.org>
Reported-by: Michael Hirsch <hirsch@teufel.de>
Tested-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Maithili Hinge <maithili@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 391acf970d upstream.
When runing with the kernel(3.15-rc7+), the follow bug occurs:
[ 9969.258987] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
[ 9969.359906] in_atomic(): 1, irqs_disabled(): 0, pid: 160655, name: python
[ 9969.441175] INFO: lockdep is turned off.
[ 9969.488184] CPU: 26 PID: 160655 Comm: python Tainted: G A 3.15.0-rc7+ #85
[ 9969.581032] Hardware name: FUJITSU-SV PRIMEQUEST 1800E/SB, BIOS PRIMEQUEST 1000 Series BIOS Version 1.39 11/16/2012
[ 9969.706052] ffffffff81a20e60 ffff8803e941fbd0 ffffffff8162f523 ffff8803e941fd18
[ 9969.795323] ffff8803e941fbe0 ffffffff8109995a ffff8803e941fc58 ffffffff81633e6c
[ 9969.884710] ffffffff811ba5dc ffff880405c6b480 ffff88041fdd90a0 0000000000002000
[ 9969.974071] Call Trace:
[ 9970.003403] [<ffffffff8162f523>] dump_stack+0x4d/0x66
[ 9970.065074] [<ffffffff8109995a>] __might_sleep+0xfa/0x130
[ 9970.130743] [<ffffffff81633e6c>] mutex_lock_nested+0x3c/0x4f0
[ 9970.200638] [<ffffffff811ba5dc>] ? kmem_cache_alloc+0x1bc/0x210
[ 9970.272610] [<ffffffff81105807>] cpuset_mems_allowed+0x27/0x140
[ 9970.344584] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.409282] [<ffffffff811b1385>] __mpol_dup+0xe5/0x150
[ 9970.471897] [<ffffffff811b1303>] ? __mpol_dup+0x63/0x150
[ 9970.536585] [<ffffffff81068c86>] ? copy_process.part.23+0x606/0x1d40
[ 9970.613763] [<ffffffff810bf28d>] ? trace_hardirqs_on+0xd/0x10
[ 9970.683660] [<ffffffff810ddddf>] ? monotonic_to_bootbased+0x2f/0x50
[ 9970.759795] [<ffffffff81068cf0>] copy_process.part.23+0x670/0x1d40
[ 9970.834885] [<ffffffff8106a598>] do_fork+0xd8/0x380
[ 9970.894375] [<ffffffff81110e4c>] ? __audit_syscall_entry+0x9c/0xf0
[ 9970.969470] [<ffffffff8106a8c6>] SyS_clone+0x16/0x20
[ 9971.030011] [<ffffffff81642009>] stub_clone+0x69/0x90
[ 9971.091573] [<ffffffff81641c29>] ? system_call_fastpath+0x16/0x1b
The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.
This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():
"When the forker's task_struct is duplicated (which includes
->mems_allowed) and it races with an update to cpuset_being_rebound
in update_tasks_nodemask() then the task's mems_allowed doesn't get
updated. And the child task's mems_allowed can be wrong if the
cpuset's nodemask changes before the child has been added to the
cgroup's tasklist."
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7114aae027 upstream.
Add a memory barrier prior to sending a new command to the VIOS
to ensure the VIOS does not receive stale data in the command buffer.
Also add a memory barrier when processing the CRQ for completed commands.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[bwh: Backported to 3.2: as the iSeries code is still present, these
functions have different names and live in rpa_vscsi.c.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ee755974b upstream.
If a CRQ reset is triggered for some reason while in the middle
of performing VSCSI adapter initialization, we don't want to
call the done function for the initialization MAD commands as
this will only result in two threads attempting initialization
at the same time, resulting in failures.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6236f6d1d upstream.
The system suspend flow as following:
1, Freeze all user processes and kenrel threads.
2, Try to suspend all devices.
2.1, If pci device is in RPM suspended state, then pci driver will try
to resume it to RPM active state in the prepare stage.
2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
workqueue items to resume usb2&usb3 roothub devices.
2.3, Call suspend callbacks of devices.
2.3.1, All suspend callbacks of all hcd's children, including
roothub devices are called.
2.3.2, Finally, hcd_pci_suspend callback is called.
Due to workqueue threads were already frozen in step 1, the workqueue
items can't be scheduled, and the roothub devices can't be resumed in
this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
usb_hcd_resume_root_hub won't be cleared. Finally,
hcd_pci_suspend will return -EBUSY, and system suspend fails.
The reason why this issue doesn't show up very often is due to that
choose_wakeup will be called in step 2.3.1. In step 2.3.1, if
udev->do_remote_wakeup is not equal to device_may_wakeup(&udev->dev), then
udev will resume to RPM active for changing the wakeup settings. This
has been a lucky hit which hides this issue.
For some special xHCI controllers which have no USB2 port, then roothub
will not match hub driver due to probe failed. Then its
do_remote_wakeup will be set to zero, and we won't be as lucky.
xhci driver doesn't need to resume roothub devices everytime like in
the above case. It's only needed when there are pending event TRBs.
This patch should be back-ported to kernels as old as 3.2, that
contains the commit f69e3120df
"USB: XHCI: resume root hubs when the controller resumes"
Signed-off-by: Wang, Yu <yu.y.wang@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
[use readl() instead of removed xhci_readl(), reword commit message -Mathias]
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff8cbf250b upstream.
When xHCI PCI host is suspended, if do_wakeup is false in xhci_pci_suspend,
xhci_bus_suspend needs to clear all root port wake on bits. Otherwise some Intel
platforms may get a spurious wakeup, even if PCI PME# is disabled.
This patch should be back-ported to kernels as old as 2.6.37, that
contains the commit 9777e3ce90
"USB: xHCI: bus power management implementation".
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3213b15138 upstream.
The transfer burst count (TBC) field in xhci 1.0 hosts should be set
to the number of bursts needed to transfer all packets in a isoc TD.
Supported values are 0-2 (1 to 3 bursts per service interval).
Formula for TBC calculation is given in xhci spec section 4.11.2.3:
TBC = roundup( Transfer Descriptor Packet Count / Max Burst Size +1 ) - 1
This patch should be applied to stable kernels since 3.0 that contain
the commit 5cd43e33b9
"xhci 1.0: Set transfer burst count field."
Suggested-by: ShiChun Ma <masc2008@qq.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0ebef36e9 upstream.
Adding a couple of Olivetti modems and blacklisting the net
function on a couple which are already supported.
Reported-by: Lars Melin <larsm17@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aea1ae8760 upstream.
Fix NULL-pointer dereference when probing an interface with no
endpoints.
These devices have two bulk endpoints per interface, but this avoids
crashing the kernel if a user forces a non-FTDI device to be probed.
Note that the iterator variable was made unsigned in order to avoid
a maybe-uninitialized compiler warning for ep_desc after the loop.
Fixes: 895f28badc ("USB: ftdi_sio: fix hi-speed device packet size
calculation")
Reported-by: Mike Remski <mremski@mutualink.net>
Tested-by: Mike Remski <mremski@mutualink.net>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f0688c8b81 upstream.
If the descriptors do not need any strings and user space sends empty
set of strings, the ffs->stringtabs field remains NULL. Thus
*ffs->stringtabs in functionfs_bind leads to a NULL pointer
dereferenece.
The bug was introduced by commit [fd7c9a007f: “use usb_string_ids_n()”].
While at it, remove double initialisation of lang local variable in
that function.
ffs->strings_count does not need to be checked in any way since in
the above scenario it will remain zero and usb_string_ids_n() is
a no-operation when colled with 0 argument.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cb060a91c upstream.
KVM does not really do much with the PAT, so this went unnoticed for a
long time. It is exposed however if you try to do rdmsr on the PAT
register.
Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 682367c494 upstream.
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c021f241f4 upstream.
Fix a parser-bug in the omap2 muxing code where muxtable-entries will be
wrongly selected if the requested muxname is a *prefix* of their
m0-entry and they have a matching mN-entry. Fix by additionally checking
that the length of the m0_entry is equal.
For example muxing of "dss_data2.dss_data2" on omap32xx will fail
because the prefix "dss_data2" will match the mux-entries "dss_data2" as
well as "dss_data20", with the suffix "dss_data2" matching m0 (for
dss_data2) and m4 (for dss_data20). Thus both are recognized as signal
path candidates:
Relevant muxentries from mux34xx.c:
_OMAP3_MUXENTRY(DSS_DATA20, 90,
"dss_data20", NULL, "mcspi3_somi", "dss_data2",
"gpio_90", NULL, NULL, "safe_mode"),
_OMAP3_MUXENTRY(DSS_DATA2, 72,
"dss_data2", NULL, NULL, NULL,
"gpio_72", NULL, NULL, "safe_mode"),
This will result in a failure to mux the pin at all:
_omap_mux_get_by_name: Multiple signal paths (2) for dss_data2.dss_data2
Patch should apply to linus' latest master down to rather old linux-2.6
trees.
Signed-off-by: David R. Piegdon <lkml@p23q.org>
[tony@atomide.com: updated description to include full description]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit caa5344994, which
was commit fe6cc55f3a upstream. In 3.2,
the transport header length is not calculated in the forwarding path,
so skb_gso_network_seglen() returns an incorrect result. We also have
problems due to the local_df flag not being set correctly.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This reverts commit 59d9f389df, which
was commit ca6c5d4ad2 upstream. It is a
valid fix, but depends on sk_buff::local_df being set in all the right
cases, which it wasn't in 3.2. We need to defer it unless and until
the other fixes are also backported to 3.2.y.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1fd819ecb9 upstream.
skb_segment copies frags around, so we need
to copy them carefully to avoid accessing
user memory after reporting completion to userspace
through a callback.
skb_segment doesn't normally happen on datapath:
TSO needs to be disabled - so disabling zero copy
in this case does not look like a big deal.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2. As skb_segment() only supports page-frags *or* a
frag list, there is no need for the additional frag_skb pointer or the
preparatory renaming.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a353e0ce0f upstream.
Many places do
if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b9cd18de4d upstream.
The 'sysret' fastpath does not correctly restore even all regular
registers, much less any segment registers or reflags values. That is
very much part of why it's faster than 'iret'.
Normally that isn't a problem, because the normal ptrace() interface
catches the process using the signal handler infrastructure, which
always returns with an iret.
However, some paths can get caught using ptrace_event() instead of the
signal path, and for those we need to make sure that we aren't going to
return to user space using 'sysret'. Otherwise the modifications that
may have been done to the register set by the tracer wouldn't
necessarily take effect.
Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
arch_ptrace_stop_needed() which is invoked from ptrace_stop().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8fad87bca7 upstream.
When we configure CONFIG_ARM_LPAE=y, pfn << PAGE_SHIFT will
overflow if pfn >= 0x100000 in copy_oldmem_page.
So use __pfn_to_phys for converting.
Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Part of commit ea8ea460c9 upstream.
This missing IOTLB flush was added as a minor, inconsequential bug-fix
in commit ea8ea460c ("iommu/vt-d: Clean up and fix page table clear/free
behaviour") in 3.15. It wasn't originally intended for -stable but a
couple of users have reported issues which turn out to be fixed by
adding the missing flush.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Use &dmar_domain->iommu_bmp, as it is a single word not an array]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11f8a7b31f upstream.
The assumption that sizeof(long) >= sizeof(resource_size_t) can lead to
truncation of the PCI resource address, meaning this driver didn't work
on 32-bit systems with 64-bit PCI adressing ranges.
Signed-off-by: Ben Collins <ben.c@servergy.com>
Acked-by: Sumit Saxena <sumit.saxena@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d555a2abf3 upstream.
We unconditionally execute scsi_eh_get_sense() to make sure all failed
commands that should have sense attached, do. However, the routine forgets
that some commands, because of the way they fail, will not have any sense code
... we should not bother them with a REQUEST_SENSE command. Fix this by
testing to see if we actually got a CHECK_CONDITION return and skip asking for
sense if we don't.
Tested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Part of commit 4442dc8a92 upstream.
This patch changes rd_allocate_sgl_table() to explicitly clear
ramdisk_mcp backend memory pages by passing __GFP_ZERO into
alloc_pages().
This addresses a potential security issue where reading from a
ramdisk_mcp could return sensitive information, and follows what
>= v3.15 does to explicitly clear ramdisk_mcp memory at backend
device initialization time.
Reported-by: Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt>
Cc: Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit befdf8978a ]
This patch wrap up a helper function __mlx4_remove_one() which does the tear
down function but preserve the drv_data. Functions like
mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out
releasing drvdata.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ No upstream commit, this is a cherry picked backport enabler. ]
That way we can check flags later on, when we've finished with the
pci_device_id structure.
This is a backport.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d3217b15a1 ]
Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check,
a new association would be created in sctp_unpack_cookie(), but afterwards,
some processing maybe failed, and sctp_association_free() will be called to
free the previously allocated association, in sctp_association_free(),
sk_ack_backlog value is decremented for this socket, since the initial
value for sk_ack_backlog is 0, after the decrement, it will be 65535,
a wrap-around problem happens, and if we want to establish new associations
afterward in the same socket, ABORT would be triggered since sctp deem the
accept queue as full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.
Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 39c36094d7 ]
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.
06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
It appears I introduced this bug in linux-3.1.
inet_getid() must return the old value of peer->ip_id_count,
not the new one.
Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.
Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f98f89a010 ]
Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.
This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit bfc5184b69 ]
Any process is able to send netlink messages with leftover bytes.
Make the warning rate-limited to prevent too much log spam.
The warning is supposed to help find userspace bugs, so print the
triggering command name to implicate the buggy program.
[v2: Use pr_warn_ratelimited instead of printk_ratelimited.]
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Add #include of <linux/sched.h> for definition of struct task_struct,
as in 3.2 it doesn't get included indirectly on all architectures. Thanks
to Guenter Roeck for debugging this.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46ce0fe97a upstream.
When removing a (sibling) event we do:
raw_spin_lock_irq(&ctx->lock);
perf_group_detach(event);
raw_spin_unlock_irq(&ctx->lock);
<hole>
perf_remove_from_context(event);
raw_spin_lock_irq(&ctx->lock);
...
raw_spin_unlock_irq(&ctx->lock);
Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.
So, if during <hole> we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.
The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.
Hereafter we can proceed to free the event; while still programmed!
Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.
The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.
Most-likely-Fixes: e03a9a55b4 ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Vince Weaver <vincent.weaver@maine.edu>
Much-staring-at-traces-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: drop change in perf_pmu_migrate_context()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f2495e228f upstream.
In the highly unusual case where two threads are running concurrently through
the scanning code scanning the same target, we run into the situation where
one may allocate the target while the other is still using it. In this case,
because the reap checks for STARGET_CREATED and kills the target without
reference counting, the second thread will do the wrong thing on reap.
Fix this by reference counting even creates and doing the STARGET_CREATED
check in the final put.
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e63ed0d7a9 upstream.
This patch eliminates the reap_ref and replaces it with a proper kref.
On last put of this kref, the target is removed from visibility in
sysfs. The final call to scsi_target_reap() for the device is done from
__scsi_remove_device() and only if the device was made visible. This
ensures that the target disappears as soon as the last device is gone
rather than waiting until final release of the device (which is often
too long).
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b485462aca upstream.
Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.
Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 83ff42fcce upstream.
This patch fixes a left-over se_lun->lun_sep pointer OOPs when one
of the /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT/lun/$LUN/alua*
attributes is accessed after the $DEVICE symlink has been removed.
To address this bug, go ahead and clear se_lun->lun_sep memory in
core_dev_unexport(), so that the existing checks for show/store
ALUA attributes in target_core_fabric_configfs.c work as expected.
Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ab6c15bc66 upstream.
Previously, the lower limit for the MIPS SC initialization loop was
set incorrectly allowing one extra loop leading to writes
beyond the MSC ioremap'd space. More precisely, the value of the 'imp'
in the last loop increased beyond the msc_irqmap_t boundaries and
as a result of which, the 'n' variable was loaded with an incorrect
value. This value was used later on to calculate the offset in the
MSC01_IC_SUP which led to random crashes like the following one:
CPU 0 Unable to handle kernel paging request at virtual address e75c0200,
epc == 8058dba4, ra == 8058db90
[...]
Call Trace:
[<8058dba4>] init_msc_irqs+0x104/0x154
[<8058b5bc>] arch_init_irq+0xd8/0x154
[<805897b0>] start_kernel+0x220/0x36c
Kernel panic - not syncing: Attempted to kill the idle task!
This patch fixes the problem
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7118/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 91ad11d7cc upstream.
On MIPS calls to _mcount in modules generate 2 instructions to load
the _mcount address (and therefore 2 relocations). The mcount_loc
table should only reference the first of these, so the second is
filtered out by checking the relocation offset and ignoring ones that
immediately follow the previous one seen.
However if a module has an _mcount call at offset 0, the second
relocation would not be filtered out due to old_r_offset == 0
being taken to mean that the current relocation is the first one
seen, and both would end up in the mcount_loc table.
This results in ftrace_make_nop() patching both (adjacent)
instructions to branches over the _mcount call sequence like so:
0xffffffffc08a8000: 04 00 00 10 b 0xffffffffc08a8014
0xffffffffc08a8004: 04 00 00 10 b 0xffffffffc08a8018
0xffffffffc08a8008: 2d 08 e0 03 move at,ra
...
The second branch is in the delay slot of the first, which is
defined to be unpredictable - on the platform on which this bug was
encountered, it triggers a reserved instruction exception.
Fix by initializing old_r_offset to ~0 and using that instead of 0
to determine whether the current relocation is the first seen.
Signed-off-by: Alex Smith <alex.smith@imgtec.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7098/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1539fb9bd4 upstream.
If user uses wrong ioctl command with _IOC_NONE and argument size
greater than 0, it can cause NULL pointer access from memset of line
463. If _IOC_NONE, don't memset to 0 for kdata.
Signed-off-by: Zhaowei Yuan <zhaowei.yuan@samsung.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d05f0cdcbe upstream.
In v2.6.34 commit 9d8cebd4bc ("mm: fix mbind vma merge problem")
introduced vma merging to mbind(), but it should have also changed the
convention of passing start vma from queue_pages_range() (formerly
check_range()) to new_vma_page(): vma merging may have already freed
that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
worse crashes.
Fixes: 9d8cebd4bc ("mm: fix mbind vma merge problem")
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Keep the same arguments to migrate_pages() except for private=start]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 082708072a upstream.
Revert commit 0def08e3ac because check_range can't fail in
migrate_to_node with considering current usecases.
Quote from Johannes
: I think it makes sense to revert. Not because of the semantics, but I
: just don't see how check_range() could even fail for this callsite:
:
: 1. we pass mm->mmap->vm_start in there, so we should not fail due to
: find_vma()
:
: 2. we pass MPOL_MF_DISCONTIG_OK, so the discontig checks do not apply
: and so can not fail
:
: 3. we pass MPOL_MF_MOVE | MPOL_MF_MOVE_ALL, the page table loops will
: continue until addr == end, so we never fail with -EIO
And I added a new VM_BUG_ON for checking migrate_to_node's future usecase
which might pass to MPOL_MF_STRICT.
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4a705fef98 upstream.
There's a race between fork() and hugepage migration, as a result we try
to "dereference" a swap entry as a normal pte, causing kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
"swap entry" family (migration entry and hwpoisoned entry) so let's fix
it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 206a81c184 upstream.
The lzo decompressor can, if given some really crazy data, possibly
overrun some variable types. Modify the checking logic to properly
detect overruns before they happen.
Reported-by: "Don A. Bailey" <donb@securitymouse.com>
Tested-by: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b975bd3f9 upstream.
This commit updates the kernel LZO code to the current upsteam version
which features a significant speed improvement - benchmarking the Calgary
and Silesia test corpora typically shows a doubled performance in
both compression and decompression on modern i386/x86_64/powerpc machines.
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6bec26cea upstream.
Rename the source file to match the function name and thereby
also make room for a possible future even slightly faster
"non-safe" decompressor version.
Signed-off-by: Markus F.X.J. Oberhumer <markus@oberhumer.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4af4206be2 upstream.
syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.
Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.
Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com
Fixes: a871bd33a6 "tracing: Add syscall tracepoints"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fc68eb122 upstream.
Support for firmware rev 508+ was added years ago, but we never noticed
it reports channel in a different way for G-PHY devices. Instead of
offset from 2400 MHz it simply passes channel id (AKA hw_value).
So far it was (most probably) affecting monitor mode users only, but
the following recent commit made it noticeable for quite everybody:
commit 3afc2167f6
Author: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date: Tue Mar 4 16:50:13 2014 +0200
cfg80211/mac80211: ignore signal if the frame was heard on wrong channel
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba15a58b17 upstream.
From the Bluetooth Core Specification 4.1 page 1958:
"if both devices have set the Authentication_Requirements parameter to
one of the MITM Protection Not Required options, authentication stage 1
shall function as if both devices set their IO capabilities to
DisplayOnly (e.g., Numeric comparison with automatic confirmation on
both devices)"
So far our implementation has done user confirmation for all just-works
cases regardless of the MITM requirements, however following the
specification to the word means that we should not be doing confirmation
when neither side has the MITM flag set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.2: s/conn->flags/conn->pend/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e694788d73 upstream.
The conn->link_key variable tracks the type of link key in use. It is
set whenever we respond to a link key request as well as when we get a
link key notification event.
These two events do not however always guarantee that encryption is
enabled: getting a link key request and responding to it may only mean
that the remote side has requested authentication but not encryption. On
the other hand, the encrypt change event is a certain guarantee that
encryption is enabled. The real encryption state is already tracked in
the conn->link_mode variable through the HCI_LM_ENCRYPT bit.
This patch fixes a check for encryption in the hci_conn_auth function to
use the proper conn->link_mode value and thereby eliminates the chance
of a false positive result.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 883a1d49f0 upstream.
The ALSA control code expects that the range of assigned indices to a control is
continuous and does not overflow. Currently there are no checks to enforce this.
If a control with a overflowing index range is created that control becomes
effectively inaccessible and unremovable since snd_ctl_find_id() will not be
able to find it. This patch adds a check that makes sure that controls with a
overflowing index range can not be created.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ac902c112d upstream.
Each control gets automatically assigned its numids when the control is created.
The allocation is done by incrementing the numid by the amount of allocated
numids per allocation. This means that excessive creation and destruction of
controls (e.g. via SNDRV_CTL_IOCTL_ELEM_ADD/REMOVE) can cause the id to
eventually overflow. Currently when this happens for the control that caused the
overflow kctl->id.numid + kctl->count will also over flow causing it to be
smaller than kctl->id.numid. Most of the code assumes that this is something
that can not happen, so we need to make sure that it won't happen
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd9f26e4ec upstream.
A control that is visible on the card->controls list can be freed at any time.
This means we must not access any of its memory while not holding the
controls_rw_lock. Otherwise we risk a use after free access.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 82262a4662 upstream.
There are two issues with the current implementation for replacing user
controls. The first is that the code does not check if the control is actually a
user control and neither does it check if the control is owned by the process
that tries to remove it. That allows userspace applications to remove arbitrary
controls, which can cause a user after free if a for example a driver does not
expect a control to be removed from under its feed.
The second issue is that on one hand when a control is replaced the
user_ctl_count limit is not checked and on the other hand the user_ctl_count is
increased (even though the number of user controls does not change). This allows
userspace, once the user_ctl_count limit as been reached, to repeatedly replace
a control until user_ctl_count overflows. Once that happens new controls can be
added effectively bypassing the user_ctl_count limit.
Both issues can be fixed by instead of open-coding the removal of the control
that is to be replaced to use snd_ctl_remove_user_ctl(). This function does
proper permission checks as well as decrements user_ctl_count after the control
has been removed.
Note that by using snd_ctl_remove_user_ctl() the check which returns -EBUSY at
beginning of the function if the control already exists is removed. This is not
a problem though since the check is quite useless, because the lock that is
protecting the control list is released between the check and before adding the
new control to the list, which means that it is possible that a different
control with the same settings is added to the list after the check. Luckily
there is another check that is done while holding the lock in snd_ctl_add(), so
we'll rely on that to make sure that the same control is not added twice.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 07f4d9d74a upstream.
The user-control put and get handlers as well as the tlv do not protect against
concurrent access from multiple threads. Since the state of the control is not
updated atomically it is possible that either two write operations or a write
and a read operation race against each other. Both can lead to arbitrary memory
disclosure. This patch introduces a new lock that protects user-controls from
concurrent access. Since applications typically access controls sequentially
than in parallel a single lock per card should be fine.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0a50e92bd upstream.
Leandro Liptak reports that his HASEE E200 computer hangs when we ask
the BIOS to hand over control of the EHCI host controller. This
definitely sounds like a bug in the BIOS, but at the moment there is
no way to fix it.
This patch works around the problem by avoiding the handoff whenever
the motherboard and BIOS version match those of Leandro's computer.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Leandro Liptak <leandroliptak@gmail.com>
Tested-by: Leandro Liptak <leandroliptak@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27e35715df upstream.
When the rtmutex fast path is enabled the slow unlock function can
create the following situation:
spin_lock(foo->m->wait_lock);
foo->m->owner = NULL;
rt_mutex_lock(foo->m); <-- fast path
free = atomic_dec_and_test(foo->refcnt);
rt_mutex_unlock(foo->m); <-- fast path
if (free)
kfree(foo);
spin_unlock(foo->m->wait_lock); <--- Use after free.
Plug the race by changing the slow unlock to the following scheme:
while (!rt_mutex_has_waiters(m)) {
/* Clear the waiters bit in m->owner */
clear_rt_mutex_waiters(m);
owner = rt_mutex_owner(m);
spin_unlock(m->wait_lock);
if (cmpxchg(m->owner, owner, 0) == owner)
return;
spin_lock(m->wait_lock);
}
So in case of a new waiter incoming while the owner tries the slow
path unlock we have two situations:
unlock(wait_lock);
lock(wait_lock);
cmpxchg(p, owner, 0) == owner
mark_rt_mutex_waiters(lock);
acquire(lock);
Or:
unlock(wait_lock);
lock(wait_lock);
mark_rt_mutex_waiters(lock);
cmpxchg(p, owner, 0) != owner
enqueue_waiter();
unlock(wait_lock);
lock(wait_lock);
wakeup_next waiter();
unlock(wait_lock);
lock(wait_lock);
acquire(lock);
If the fast path is disabled, then the simple
m->owner = NULL;
unlock(m->wait_lock);
is sufficient as all access to m->owner is serialized via
m->wait_lock;
Also document and clarify the wakeup_next_waiter function as suggested
by Oleg Nesterov.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8208498438 upstream.
When we walk the lock chain, we drop all locks after each step. So the
lock chain can change under us before we reacquire the locks. That's
harmless in principle as we just follow the wrong lock path. But it
can lead to a false positive in the dead lock detection logic:
T0 holds L0
T0 blocks on L1 held by T1
T1 blocks on L2 held by T2
T2 blocks on L3 held by T3
T4 blocks on L4 held by T4
Now we walk the chain
lock T1 -> lock L2 -> adjust L2 -> unlock T1 ->
lock T2 -> adjust T2 -> drop locks
T2 times out and blocks on L0
Now we continue:
lock T2 -> lock L0 -> deadlock detected, but it's not a deadlock at all.
Brad tried to work around that in the deadlock detection logic itself,
but the more I looked at it the less I liked it, because it's crystal
ball magic after the fact.
We actually can detect a chain change very simple:
lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
next_lock = T2->pi_blocked_on->lock;
drop locks
T2 times out and blocks on L0
Now we continue:
lock T2 ->
if (next_lock != T2->pi_blocked_on->lock)
return;
So if we detect that T2 is now blocked on a different lock we stop the
chain walk. That's also correct in the following scenario:
lock T1 -> lock L2 -> adjust L2 -> unlock T1 -> lock T2 -> adjust T2 ->
next_lock = T2->pi_blocked_on->lock;
drop locks
T3 times out and drops L3
T2 acquires L3 and blocks on L4 now
Now we continue:
lock T2 ->
if (next_lock != T2->pi_blocked_on->lock)
return;
We don't have to follow up the chain at that point, because T2
propagated our priority up to T4 already.
[ Folded a cleanup patch from peterz ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Brad Mouring <bmouring@ni.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140605152801.930031935@linutronix.de
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3d5c9340d1 upstream.
Even in the case when deadlock detection is not requested by the
caller, we can detect deadlocks. Right now the code stops the lock
chain walk and keeps the waiter enqueued, even on itself. Silly not to
yell when such a scenario is detected and to keep the waiter enqueued.
Return -EDEADLK unconditionally and handle it at the call sites.
The futex calls return -EDEADLK. The non futex ones dequeue the
waiter, throw a warning and put the task into a schedule loop.
Tagged for stable as it makes the code more robust.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brad Mouring <bmouring@ni.com>
Link: http://lkml.kernel.org/r/20140605152801.836501969@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 624483f3ea upstream.
While working address sanitizer for kernel I've discovered
use-after-free bug in __put_anon_vma.
For the last anon_vma, anon_vma->root freed before child anon_vma.
Later in anon_vma_free(anon_vma) we are referencing to already freed
anon_vma->root to check rwsem.
This fixes it by freeing the child anon_vma before freeing
anon_vma->root.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fb1c9a4f2 upstream.
Calculating the 'security.evm' HMAC value requires access to the
EVM encrypted key. Only the kernel should have access to it. This
patch prevents userspace tools(eg. setfattr, cp --preserve=xattr)
from setting/modifying the 'security.evm' HMAC value directly.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 59a53afe70 upstream.
OPAL will mark a CPU that is guarded as "bad" in the status property of the CPU
node.
Unfortunatley Linux doesn't check this property and will put the bad CPU in the
present map. This has caused hangs on booting when we try to unsplit the core.
This patch checks the CPU is avaliable via this status property before putting
it in the present map.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23afeb613e upstream.
On some AR934x based systems, where the frequency of
the AHB bus is relatively high, the built-in watchdog
causes a spurious restart when it gets enabled.
The possible cause of these restarts is that the timeout
value written into the TIMER register does not reaches
the hardware in time.
Add an explicit delay into the ath79_wdt_enable function
to avoid the spurious restarts.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3c5493119 upstream.
Fixes an easy DoS and possible information disclosure.
This does nothing about the broken state of x32 auditing.
eparis: If the admin has enabled auditd and has specifically loaded
audit rules. This bug has been around since before git. Wow...
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: audit_filter_inode_name() is not a separate
function but part of audit_filter_inodes()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0986c1a55c upstream.
When we set the valid bit on invalid GART entries they are
loaded into the TLB when an adjacent entry is loaded. This
poisons the TLB with invalid entries which are sometimes
not correctly removed on TLB flush.
For stable inclusion the patch probably needs to be modified a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: R600_PTE_GART is not defined and we list all the
flags indvidually]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7d78874273 upstream.
We need to NULL the cached_state after freeing it, otherwise
we might free it again if find_delalloc_range doesn't find anything.
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d49cb7aeeb upstream.
commit 421e08c41f fixed the reported min/max for the X and Y axis,
but unfortunately, it broke the resolution of those same axis.
On the t540p, the resolution is the same regarding X and Y. It is not
a problem for xf86-input-synaptics because this driver is only interested
in the ratio between X and Y.
Unfortunately, xf86-input-cmt uses directly the resolution, and having a
null resolution leads to some divide by 0 errors, which are translated by
-infinity in the resulting coordinates.
Reported-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: I didn't apply the PNP ID changes, so the
code being moved looks different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fb4f8f568a upstream.
The touchpad on the GIGABYTE U2442 not only stops communicating when we try
to set bit 3 (enable real hardware resolution) of reg_10, but on some BIOS
versions also when we set bit 1 (enable two finger mode auto correct).
I've asked the original reporter of:
https://bugzilla.kernel.org/show_bug.cgi?id=61151
To check that not setting bit 1 does not lead to any adverse effects on his
model / BIOS revision, and it does not, so this commit fixes the touchpad
not working on these versions by simply never setting bit 1 for laptop
models with the no_hw_res quirk.
Reported-and-tested-by: James Lademann <jwlademann@gmail.com>
Tested-by: Philipp Wolfer <ph.wolfer@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cd9e83e275 upstream.
At least the Dell Vostro 5470 elantech *clickpad* reports right button
clicks when clicked in the right bottom area:
https://bugzilla.redhat.com/show_bug.cgi?id=1103528
This is different from how (elantech) clickpads normally operate, normally
no matter where the user clicks on the pad the pad always reports a left
button event, since there is only 1 hardware button beneath the path.
It looks like Dell has put 2 buttons under the pad, one under each bottom
corner, causing this.
Since this however still clearly is a real clickpad hardware-wise, we still
want to report it as such to userspace, so that things like finger movement
in the bottom area can be properly ignored as it should be on clickpads.
So deal with this weirdness by simply mapping a right click to a left click
on elantech clickpads. As an added advantage this is something which we can
simply do on all elantech clickpads, so no need to add special quirks for
this weird model.
Reported-and-tested-by: Elder Marco <eldermarco@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3afb69cb55 upstream.
idr_replace() open-codes the logic to calculate the maximum valid ID
given the height of the idr tree; unfortunately, the open-coded logic
doesn't account for the fact that the top layer may have unused slots
and over-shifts the limit to zero when the tree is at its maximum
height.
The following test code shows it fails to replace the value for
id=((1<<27)+42):
static void test5(void)
{
int id;
DEFINE_IDR(test_idr);
#define TEST5_START ((1<<27)+42) /* use the highest layer */
printk(KERN_INFO "Start test5\n");
id = idr_alloc(&test_idr, (void *)1, TEST5_START, 0, GFP_KERNEL);
BUG_ON(id != TEST5_START);
TEST_BUG_ON(idr_replace(&test_idr, (void *)2, TEST5_START) != (void *)1);
idr_destroy(&test_idr);
printk(KERN_INFO "End of test5\n");
}
Fix the bug by using idr_max() which correctly takes into account the
maximum allowed shift.
sub_alloc() shares the same problem and may incorrectly fail with
-EAGAIN; however, this bug doesn't affect correct operation because
idr_get_empty_slot(), which already uses idr_max(), retries with the
increased @id in such cases.
[tj@kernel.org: Updated patch description.]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e52365f27 upstream.
When tracing a process in another pid namespace, it's important for fork
event messages to contain the child's pid as seen from the tracer's pid
namespace, not the parent's. Otherwise, the tracer won't be able to
correlate the fork event with later SIGTRAP signals it receives from the
child.
We still risk a race condition if a ptracer from a different pid
namespace attaches after we compute the pid_t value. However, sending a
bogus fork event message in this unlikely scenario is still a vast
improvement over the status quo where we always send bogus fork event
messages to debuggers in a different pid namespace than the forking
process.
Signed-off-by: Matthew Dempsky <mdempsky@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Julien Tinnes <jln@chromium.org>
Cc: Roland McGrath <mcgrathr@chromium.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fe121e1f5 upstream.
The rtc user must wait at least 1 sec between each time/calandar update
(see atmel's datasheet chapter "Updating Time/Calendar").
Use the 1Hz interrupt to update the at91_rtc_upd_rdy flag and wait for
the at91_rtc_wait_upd_rdy event if the rtc is not ready.
This patch fixes a deadlock in an uninterruptible wait when the RTC is
updated more than once every second. AFAICT the bug is here from the
beginning, but I think we should at least backport this fix to 3.10 and
the following longterm and stable releases.
Signed-off-by: Boris BREZILLON <boris.brezillon@free-electrons.com>
Reported-by: Bryan Evenson <bevenson@melinkcorp.com>
Tested-by: Bryan Evenson <bevenson@melinkcorp.com>
Cc: Andrew Victor <linux@maxim.org.za>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- at91_rtc_write() is called at91_sys_write()
- Use at91_sys_write() directly instead of the missing
at91_rtc_write_ier() and at91_rtc_write_idr()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 71abdc15ad upstream.
When kswapd exits, it can end up taking locks that were previously held
by allocating tasks while they waited for reclaim. Lockdep currently
warns about this:
On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
> inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
> kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
> (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
> {RECLAIM_FS-ON-W} state was registered at:
> mark_held_locks+0xb9/0x140
> lockdep_trace_alloc+0x7a/0xe0
> kmem_cache_alloc_trace+0x37/0x240
> flex_array_alloc+0x99/0x1a0
> cgroup_attach_task+0x63/0x430
> attach_task_by_pid+0x210/0x280
> cgroup_procs_write+0x16/0x20
> cgroup_file_write+0x120/0x2c0
> vfs_write+0xc0/0x1f0
> SyS_write+0x4c/0xa0
> tracesys+0xdd/0xe2
> irq event stamp: 49
> hardirqs last enabled at (49): _raw_spin_unlock_irqrestore+0x36/0x70
> hardirqs last disabled at (48): _raw_spin_lock_irqsave+0x2b/0xa0
> softirqs last enabled at (0): copy_process.part.24+0x627/0x15f0
> softirqs last disabled at (0): (null)
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&sig->group_rwsem);
> <Interrupt>
> lock(&sig->group_rwsem);
>
> *** DEADLOCK ***
>
> no locks held by kswapd2/1151.
>
> stack backtrace:
> CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
> Call Trace:
> dump_stack+0x19/0x1b
> print_usage_bug+0x1f7/0x208
> mark_lock+0x21d/0x2a0
> __lock_acquire+0x52a/0xb60
> lock_acquire+0xa2/0x140
> down_read+0x51/0xa0
> exit_signals+0x24/0x130
> do_exit+0xb5/0xa50
> kthread+0xdb/0x100
> ret_from_fork+0x7c/0xb0
This is because the kswapd thread is still marked as a reclaimer at the
time of exit. But because it is exiting, nobody is actually waiting on
it to make reclaim progress anymore, and it's nothing but a regular
thread at this point. Be tidy and strip it of all its powers
(PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
before returning from the thread function.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d2b60a554 upstream.
This patch adds an explicit check in chap_server_compute_md5() to ensure
the CHAP_C value received from the initiator during mutual authentication
does not match the original CHAP_C provided by the target.
This is in line with RFC-3720, section 8.2.1:
Originators MUST NOT reuse the CHAP challenge sent by the Responder
for the other direction of a bidirectional authentication.
Responders MUST check for this condition and close the iSCSI TCP
connection if it occurs.
Reported-by: Tejas Vaykole <tejas.vaykole@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c4cad90f9e upstream.
We had a mix & match of flags used when creating legacy ports
depending on where we found them in the device-tree. Among others
we were missing UPF_SKIP_TEST for some kind of ISA ports which is
a problem as quite a few UARTs out there don't support the loopback
test (such as a lot of BMCs).
Let's pick the set of flags used by the SoC code and generalize it
which means autoconf, no loopback test, irq maybe shared and fixed
port.
Sending to stable as the lack of UPF_SKIP_TEST is breaking
serial on some machines so I want this back into distros
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7f39dda9d8 upstream.
Trinity reports BUG:
sleeping function called from invalid context at kernel/locking/rwsem.c:47
in_atomic(): 0, irqs_disabled(): 0, pid: 5787, name: trinity-c27
__might_sleep < down_write < __put_anon_vma < page_get_anon_vma <
migrate_pages < compact_zone < compact_zone_order < try_to_compact_pages ..
Right, since conversion to mutex then rwsem, we should not put_anon_vma()
from inside an rcu_read_lock()ed section: fix the two places that did so.
And add might_sleep() to anon_vma_free(), as suggested by Peter Zijlstra.
Fixes: 88c22088bf ("mm: optimize page_lock_anon_vma() fast-path")
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 931ee56c67 upstream.
This fixes a bug in the handling of the fi_delegations list.
nfs4_setlease does not hold the recall_lock when adding to it. The
client_mutex is held, which prevents against concurrent list changes,
but nfsd_break_deleg_cb does not hold while walking it. New delegations
could theoretically creep onto the list while we're walking it there.
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Also remove a list_del_init() in nfs4_setlease() which would now be
before the corresponding list_add()
- Drop change to nfsd_find_all_delegations(), which doesn't exist]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d251836508 upstream.
This device normally comes with a proprietary driver, using a web GUI
to configure RAID:
http://www.highpoint-tech.com/USA_new/series_rr600-download.htm
But thankfully it also works out of the box with the AHCI driver,
being just a Marvell 88SE9235.
Devices 640L, 644L, 644LS should also be supported but not tested here.
Signed-off-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af5d36539d upstream.
We were checking the ext clock rather than the display clock.
Noticed by ArtForz on IRC.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72abc8f4b4 upstream.
I hit the same assert failed as Dolev Raviv reported in Kernel v3.10
shows like this:
[ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
[ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1
[ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
[ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
[ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
[ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
[ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
[ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
[ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
[ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
[ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
[ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
[ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
[ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
[ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
[ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
[ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
[ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
[ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
[ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
[ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
[ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)
After analyzing the code, I found a condition that may cause this failed
in correct operations. Thus, I think this assertion is wrong and should be
removed.
Suppose there are two clean znodes and one dirty znode in TNC. So the
per-filesystem atomic_t @clean_zn_cnt is (2). If commit start, dirty_znode
is set to COW_ZNODE in get_znodes_to_commit() in case of potentially ops
on this znode. We clear COW bit and DIRTY bit in write_index() without
@tnc_mutex locked. We don't increase @clean_zn_cnt in this place. As the
comments in write_index() shows, if another process hold @tnc_mutex and
dirty this znode after we clean it, @clean_zn_cnt would be decreased to (1).
We will increase @clean_zn_cnt to (2) with @tnc_mutex locked in
free_obsolete_znodes() to keep it right.
If shrink_tnc() performs between decrease and increase, it will release
other 2 clean znodes it holds and found @clean_zn_cnt is less than zero
(1 - 2 = -1), then hit the assertion. Because free_obsolete_znodes() will
soon correct @clean_zn_cnt and no harm to fs in this case, I think this
assertion could be removed.
2 clean zondes and 1 dirty znode, @clean_zn_cnt == 2
Thread A (commit) Thread B (write or others) Thread C (shrinker)
->write_index
->clear_bit(DIRTY_NODE)
->clear_bit(COW_ZNODE)
@clean_zn_cnt == 2
->mutex_locked(&tnc_mutex)
->dirty_cow_znode
->!ubifs_zn_cow(znode)
->!test_and_set_bit(DIRTY_NODE)
->atomic_dec(&clean_zn_cnt)
->mutex_unlocked(&tnc_mutex)
@clean_zn_cnt == 1
->mutex_locked(&tnc_mutex)
->shrink_tnc
->destroy_tnc_subtree
->atomic_sub(&clean_zn_cnt, 2)
->ubifs_assert <- hit
->mutex_unlocked(&tnc_mutex)
@clean_zn_cnt == -1
->mutex_lock(&tnc_mutex)
->free_obsolete_znodes
->atomic_inc(&clean_zn_cnt)
->mutux_unlock(&tnc_mutex)
@clean_zn_cnt == 0 (correct after shrink)
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 12337901d6 upstream.
Note nobody's ever noticed because the typical client probably never
requests FILES_AVAIL without also requesting something else on the list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6f04d3d21 upstream.
The i386 ABI disagrees with most other ABIs regarding alignment of
data types larger than 4 bytes: on most ABIs a padding must be added
at end of the structures, while it is not required on i386.
So for most ABI struct c4iw_create_cq_resp gets implicitly padded
to be aligned on a 8 bytes multiple, while for i386, such padding
is not added.
The tool pahole can be used to find such implicit padding:
$ pahole --anon_include \
--nested_anon_include \
--recursive \
--class_name c4iw_create_cq_resp \
drivers/infiniband/hw/cxgb4/iw_cxgb4.o
Then, structure layout can be compared between i386 and x86_64:
# +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100
# --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
@@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
__u32 size; /* 28 4 */
__u32 qid_mask; /* 32 4 */
- /* size: 36, cachelines: 1, members: 6 */
- /* last cacheline: 36 bytes */
+ /* size: 40, cachelines: 1, members: 6 */
+ /* padding: 4 */
+ /* last cacheline: 40 bytes */
};
This ABI disagreement will make an x86_64 kernel try to write past the
buffer provided by an i386 binary.
When boundary check will be implemented, the x86_64 kernel will refuse
to write past the i386 userspace provided buffer and the uverbs will
fail.
If the structure is on a page boundary and the next page is not
mapped, ib_copy_to_udata() will fail and the uverb will fail.
This patch adds an explicit padding at end of structure
c4iw_create_cq_resp, and, like 92b0ca7cb1 ("IB/mlx5: Fix stack info
leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
not writting this padding field to userspace. This way, x86_64 kernel
will be able to write struct c4iw_create_cq_resp as expected by
unpatched and patched i386 libcxgb4.
Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Fixes: e24a72a330 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8ec0a0e6b5 upstream.
Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
or if nonseekable_open() fails.
Avoid leaking a kref count, that sm_sem is kept down and also that the
IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if
nonseekable_open() fails.
Since container_of() never returns NULL, remove the code that tests
whether container_of() returns NULL.
Moving the kref_get() call from the start of ib_umad_*open() to the
end is safe since it is the responsibility of the caller of these
functions to ensure that the cdev pointer remains valid until at least
when these functions return.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
[ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
[ nonseekable_open() can't actually fail, but.... - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5dc2808c47 upstream.
Lists of endpoints are stored for bandwidth calculation for roothub ports.
Make sure we remove all endpoints from the list before the whole device,
containing its endpoints list_head stuctures, is freed.
This used to be done in the wrong order in xhci_mem_cleanup(),
and triggered an oops in resume from S4 (hibernate).
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 397335f004 upstream.
The current deadlock detection logic does not work reliably due to the
following early exit path:
/*
* Drop out, when the task has no waiters. Note,
* top_waiter can be NULL, when we are in the deboosting
* mode!
*/
if (top_waiter && (!task_has_pi_waiters(task) ||
top_waiter != task_top_pi_waiter(task)))
goto out_unlock_pi;
So this not only exits when the task has no waiters, it also exits
unconditionally when the current waiter is not the top priority waiter
of the task.
So in a nested locking scenario, it might abort the lock chain walk
and therefor miss a potential deadlock.
Simple fix: Continue the chain walk, when deadlock detection is
enabled.
We also avoid the whole enqueue, if we detect the deadlock right away
(A-A). It's an optimization, but also prevents that another waiter who
comes in after the detection and before the task has undone the damage
observes the situation and detects the deadlock and returns
-EDEADLOCK, which is wrong as the other task is not in a deadlock
situation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/20140522031949.725272460@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 993072ee67 upstream.
The IRB might be 96 bytes if the extended-I/O-measurement facility is
used. This feature is currently not used by Linux, but struct irb
already has the emw defined. So let's make the irb in lowcore match the
size of the internal data structure to be future proof.
We also have to add a pad, to correctly align the paste.
The bigger irb field also circumvents a bug in some QEMU versions that
always write the emw field on test subchannel and therefore destroy the
paste definitions of this CPU. Running under these QEMU version broke
some timing functions in the VDSO and all users of these functions,
e.g. some JREs.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
[bwh: Backported to 3.2: offsets of the affected fields in the 64-bit version
of struct _lowcore are 128 bytes smaller]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3991b31ea0 upstream.
If mddev->ro is set, md_to_sync will (correctly) abort.
However in that case MD_RECOVERY_INTR isn't set.
If a RESHAPE had been requested, then ->finish_reshape() will be
called and it will think the reshape was successful even though
nothing happened.
Normally a resync will not be requested if ->ro is set, but if an
array is stopped while a reshape is on-going, then when the array is
started, the reshape will be restarted. If the array is also set
read-only at this point, the reshape will instantly appear to success,
resulting in data corruption.
Consequently, this patch is suitable for any -stable kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7998eb3dc7 upstream.
With binutils 2.24, various 64 bit builds fail with relocation errors
such as
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
against symbol `interrupt_base_book3e' defined in .text section
in arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
against symbol `interrupt_end_book3e' defined in .text section
in arch/powerpc/kernel/built-in.o
The assembler maintainer says:
I changed the ABI, something that had to be done but unfortunately
happens to break the booke kernel code. When building up a 64-bit
value with lis, ori, shl, oris, ori or similar sequences, you now
should use @high and @higha in place of @h and @ha. @h and @ha
(and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
now report overflow if the value is out of 32-bit signed range.
ie. @h and @ha assume you're building a 32-bit value. This is needed
to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
and @toc@ha expressions, and for consistency I did the same for all
other @h and @ha relocs.
Replacing @h with @high in one strategic location fixes the relocation
errors. This has to be done conditionally since the assembler either
supports @h or @high but not both.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4d58f5dcb upstream.
TEST 12 and TEST 24 unlinks the URB write request for N times. When
host and gadget both initialize pattern 1 (mod 63) data series to
transfer, the gadget side will complain the wrong data which is not
expected. Because in host side, usbtest doesn't fill the data buffer
as mod 63 and this patch fixed it.
[20285.488974] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.489181] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
[20285.489423] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb50800 length 512 last
[20285.489727] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
[20285.490055] dwc3 dwc3.0.auto: Command Complete --> 0
[20285.490281] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.490492] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Active
[20285.490713] dwc3 dwc3.0.auto: ep1out-bulk: endpoint busy
[20285.490909] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Complete
[20285.491117] dwc3 dwc3.0.auto: request ffff8800aa6cb480 from ep1out-bulk completed 512/512 ===> 0
[20285.491431] zero gadget: bad OUT byte, buf[1] = 0
[20285.491605] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Set Stall' params 00000000 00000000 00000000
[20285.491915] dwc3 dwc3.0.auto: Command Complete --> 0
[20285.492099] dwc3 dwc3.0.auto: queing request ffff8800aa6cb480 to ep1out-bulk length 512
[20285.492387] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
[20285.492595] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
[20285.492830] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb51000 length 512 last
[20285.493135] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
[20285.493465] dwc3 dwc3.0.auto: Command Complete --> 0
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d0839d757e upstream.
The NovaTech OrionLXm uses an onboard FTDI serial converter for JTAG and
console access.
Here is the lsusb output:
Bus 004 Device 123: ID 0403:7c90 Future Technology Devices
International, Ltd
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c03890ff5e upstream.
A recent patch that purported to fix firmware download on big-endian
machines failed to add the corresponding sparse annotation to the
i2c-header. This was reported by the kbuild test robot.
Adding the appropriate annotation revealed another endianess bug related
to the i2c-header Size-field in a code path that is exercised when the
firmware is actually being downloaded (and not just verified and left
untouched unless older than the firmware at hand).
This patch adds the required sparse annotation to the i2c-header and
makes sure that the Size-field is sent in little-endian byte order
during firmware download also on big-endian machines.
Note that this patch is only compile-tested, but that there is no
functional change for little-endian systems.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Ludovic Drolez <ldrolez@debian.org>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 183a45087d upstream.
Make sure to check return value of autopm get in write() in order to
avoid urb leak and PM counter imbalance on errors.
Fixes: 11ea859d64 ("USB: additional power savings for cdc-acm devices
that support remote wakeup")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Error/status variable is called rc, not stat]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bae3f4c535 upstream.
Fix runtime PM handling of control messages by adding the required PM
counter operations.
Fixes: 11ea859d64 ("USB: additional power savings for cdc-acm devices
that support remote wakeup")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 140cb81ac8 upstream.
The current ACM runtime-suspend implementation is broken in several
ways:
Firstly, it buffers only the first write request being made while
suspended -- any further writes are silently dropped.
Secondly, writes being dropped also leak write urbs, which are never
reclaimed (until the device is unbound).
Thirdly, even the single buffered write is not cleared at shutdown
(which may happen before the device is resumed), something which can
lead to another urb leak as well as a PM usage-counter leak.
Fix this by implementing a delayed-write queue using urb anchors and
making sure to discard the queue properly at shutdown.
Fixes: 11ea859d64 ("USB: additional power savings for cdc-acm devices
that support remote wakeup")
Reported-by: Xiao Jin <jin.xiao@intel.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e144ed28be upstream.
Fix race between write() and resume() due to improper locking that could
lead to writes being reordered.
Resume must be done atomically and susp_count be protected by the
write_lock in order to prevent racing with write(). This could otherwise
lead to writes being reordered if write() grabs the write_lock after
susp_count is decremented, but before the delayed urb is submitted.
Fixes: 11ea859d64 ("USB: additional power savings for cdc-acm devices
that support remote wakeup")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Move mutex_lock(acm->mutex) above acquisition of spinlocks]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a345c20c1 upstream.
Fix race between write() and suspend() which could lead to writes being
dropped (or I/O while suspended) if the device is runtime suspended
while a write request is being processed.
Specifically, suspend() releases the write_lock after determining the
device is idle but before incrementing the susp_count, thus leaving a
window where a concurrent write() can submit an urb.
Fixes: 11ea859d64 ("USB: additional power savings for cdc-acm devices
that support remote wakeup")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fb7ad4f93d upstream.
Keep trying to submit urbs rather than bail out on first read-urb
submission error, which would also prevent I/O for any further ports
from being resumed.
Instead keep an error count, for all types of failed submissions, and
let USB core know that something went wrong.
Also make sure to always clear the suspended flag. Currently a failed
read-urb submission would prevent cached writes as well as any
subsequent writes from being submitted until next suspend-resume cycle,
something which may not even necessarily happen.
Note that USB core currently only logs an error if an interface resume
failed.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 79eed03e77 upstream.
The delayed-write queue was never emptied at shutdown (close), something
which could lead to leaked urbs if the port is closed before being
runtime resumed due to a write.
When this happens the output buffer would not drain on close
(closing_wait timeout), and after consecutive opens, writes could be
corrupted with previously buffered data, transfered with reduced
throughput or completely blocked.
Note that unbusy_queued_urb() was simply moved out of CONFIG_PM.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 170fad9e22 upstream.
Fix race between write() and suspend() which could lead to writes being
dropped (or I/O while suspended) if the device is runtime suspended
while a write request is being processed.
Specifically, suspend() releases the susp_lock after determining the
device is idle but before setting the suspended flag, thus leaving a
window where a concurrent write() can submit an urb.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d9e93c08d8 upstream.
We find a race between write and resume. usb_wwan_resume run play_delayed()
and spin_unlock, but intfdata->suspended still is not set to zero.
At this time usb_wwan_write is called and anchor the urb to delay
list. Then resume keep running but the delayed urb have no chance
to be commit until next resume. If the time of next resume is far
away, tty will be blocked in tty_wait_until_sent during time. The
race also can lead to writes being reordered.
This patch put play_Delayed and intfdata->suspended together in the
spinlock, it's to avoid the write race during resume.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Zhang, Qi1 <qi1.zhang@intel.com>
Reviewed-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: there's no need to check for portdata == NULL]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit db09047379 upstream.
When enable usb serial for modem data, sometimes the tty is blocked
in tty_wait_until_sent because portdata->out_busy always is set and
have no chance to be cleared.
We find a bug in write error path. usb_wwan_write set portdata->out_busy
firstly, then try autopm async with error. No out urb submit and no
usb_wwan_outdat_callback to this write, portdata->out_busy can't be
cleared.
This patch clear portdata->out_busy if usb_wwan_write try autopm async
with error.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: xiao jin <jin.xiao@intel.com>
Signed-off-by: Zhang, Qi1 <qi1.zhang@intel.com>
Reviewed-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit acf47d4f9c upstream.
Fix potential I/O while runtime suspended due to missing PM operations
in send_setup.
Fixes: 383cedc3bb ("USB: serial: full autosuspend support for the
option driver")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80cc0fcbda upstream.
Make sure that needs_remote_wake up is always set when there are open
ports.
Currently close() would unconditionally set needs_remote_wakeup to 0
even though there might still be open ports. This could lead to blocked
input and possibly dropped data on devices that do not support remote
wakeup (and which must therefore not be runtime suspended while open).
Add an open_ports counter (protected by the susp_lock) and only clear
needs_remote_wakeup when the last port is closed.
Fixes: e6929a9020 ("USB: support for autosuspend in sierra while
online")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 014333f77c upstream.
The delayed-write queue was never emptied on disconnect, something which
would lead to leaked urbs and transfer buffers if the device is
disconnected before being runtime resumed due to a write.
Fixes: e6929a9020 ("USB: support for autosuspend in sierra while
online")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7fdd26a01e upstream.
Neither the transfer buffer or the urb itself were released in the
resume error path for delayed writes. Also on errors, the remainder of
the queue was not even processed, which leads to further urb and buffer
leaks.
The same error path also failed to balance the outstanding-urb counter,
something which results in degraded throughput or completely blocked
writes.
Fix this by releasing urb and buffer and balancing counters on errors,
and by always processing the whole queue even when submission of one urb
fails.
Fixes: e6929a9020 ("USB: support for autosuspend in sierra while
online")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 353fe19860 upstream.
Fix AA deadlock in open error path that would call close() and try to
grab the already held disc_mutex.
Fixes: b9a44bc19f ("sierra: driver urb handling improvements")
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b5b6077855 upstream.
The variable "size" is expressed as number of blocks and not as
number of clusters, this could trigger a kernel panic when using
ext4 with the size of a cluster different from the size of a block.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eeece469de upstream.
Tail of a page straddling inode size must be zeroed when being written
out due to POSIX requirement that modifications of mmaped page beyond
inode size must not be written to the file. ext4_bio_write_page() did
this only for blocks fully beyond inode size but didn't properly zero
blocks partially beyond inode size. Fix this.
The problem has been uncovered by mmap_11-4 test in openposix test suite
(part of LTP).
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Fixes: 5a0dc7365c
Fixes: bd2d0210cf
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- block_end was used instead of block_start + blocksize]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 754a292fe6 upstream.
Add support for Marvell Technology Group Ltd. 88SE91A0 SATA 6Gb/s
Controller by adding its PCI ID.
Signed-off-by: Andreas Schrägle <ajs124.ajs124@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9aab8bff7a upstream.
We only want to modifiy a single field in the userspace view of the
execbuffer command buffer, so explicitly change that rather than copy
everything back again.
This serves two purposes:
1. The single fields are much cheaper to copy (constant size so the
copy uses special case code) and much smaller than the whole array.
2. We modify the array for internal use that need to be masked from
the user.
Note: We need this backported since without it the next bugfix will
blow up when userspace recycles batchbuffers and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: to_user_ptr() is open-coded]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff240199b6 upstream.
These are all user-trigerable, so tune down their loudness a notch.
For some of these we have i-g-t tests (because they prevent
newly-discovered bugs), without this patches running the test suite
leaves behind a dirty dmesg.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c7d37a66e3 upstream.
Without this fix, freshly rebooted Linux creates a new IBSS
instead of joining an existing one. Only when jiffies counter
overflows after 5 minutes the IBSS can be successfully joined.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
[edit commit message slightly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6d396ede22 upstream.
The T540p has a touchpad with pnp-id LEN0034, all the models with this
pnp-id have the same min/max values, except the T540p where the values are
slightly off. Fix them to be identical.
This is a preparation patch for simplifying the quirk table.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 537094b64b upstream.
According to arm procedure call standart r2 register is call-cloberred.
So after the result of x expression was put into r2 any following
function call in p may overwrite r2. To fix this, the result of p
expression must be saved to the temporary variable before the
assigment x expression to __r2.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8ef42ddd9a upstream.
Not all host controller drivers have bus-suspend and bus-resume
methods. When one doesn't, it will cause problems if runtime PM is
enabled in the kernel. The PM core will attempt to suspend the
controller's root hub, the suspend will fail because there is no
bus-suspend routine, and a -EBUSY error code will be returned to the
PM core. This will cause the suspend attempt to be repeated shortly
thereafter, in a never-ending loop.
Part of the problem is that the original error code -ENOENT gets
changed to -EBUSY in usb_runtime_suspend(), on the grounds that the PM
core will interpret -ENOENT as meaning that the root hub has gotten
into a runtime-PM error state. While this change is appropriate for
real USB devices, it's not such a good idea for a root hub. In fact,
considering the root hub to be in a runtime-PM error state would not
be far from the truth. Therefore this patch updates
usb_runtime_suspend() so that it adjusts error codes only for
non-root-hub devices.
Furthermore, the patch attempts to prevent the problem from occurring
in the first place by not enabling runtime PM by default for root hubs
whose host controller driver doesn't have bus_suspend and bus_resume
methods.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: runtime PM is also not supported for USB 3.0
non-root hubs]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 972754cfae upstream.
I had occasional screen corruption with the matrox framebuffer driver and
I found out that the reason for the corruption is that the hardware
blitter accesses the videoram while it is being written to.
The matrox driver has a macro WaitTillIdle() that should wait until the
blitter is idle, but it sometimes doesn't work. I added a dummy read
mga_inl(M_STATUS) to WaitTillIdle() to fix the problem. The dummy read
will flush the write buffer in the PCI chipset, and the next read of
M_STATUS will return the hardware status.
Since applying this patch, I had no screen corruption at all.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d1d70e5dc2 upstream.
If we fail to allocate struct platform_device pdev we
dereference it after the goto label err.
This bug was found using coccinelle.
Fixes: afa77ef (ARM: mx3: dynamically allocate "ipu-core" devices)
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 22e7478ddb upstream.
Prior to commit 0e4f6a791b (Fix reiserfs_file_release()), reiserfs
truncates serialized on i_mutex. They mostly still do, with the exception
of reiserfs_file_release. That blocks out other writers via the tailpack
mutex and the inode openers counter adjusted in reiserfs_file_open.
However, NFS will call reiserfs_setattr without having called ->open, so
we end up with a race when nfs is calling ->setattr while another
process is releasing the file. Ultimately, it triggers the
BUG_ON(inode->i_size != new_file_size) check in maybe_indirect_to_direct.
The solution is to pull the lock into reiserfs_setattr to encompass the
truncate_setsize call as well.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 024ca90151 upstream.
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.
Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b15d2e5b8 upstream.
Some drivers use the first HID report in the list instead of using an
index. In these cases, validation uses ID 0, which was supposed to mean
"first known report". This fixes the problem, which was causing at least
the lgff family of devices to stop working since hid_validate_values
was being called with ID 0, but the devices used single numbered IDs
for their reports:
0x05, 0x01, /* Usage Page (Desktop), */
0x09, 0x05, /* Usage (Gamepad), */
0xA1, 0x01, /* Collection (Application), */
0xA1, 0x02, /* Collection (Logical), */
0x85, 0x01, /* Report ID (1), */
...
Reported-by: Simon Wood <simon@mungewell.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 691a7c6f28 upstream.
There is a race condition in UBIFS:
Thread A (mmap) Thread B (fsync)
->__do_fault ->write_cache_pages
-> ubifs_vm_page_mkwrite
-> budget_space
-> lock_page
-> release/convert_page_budget
-> SetPagePrivate
-> TestSetPageDirty
-> unlock_page
-> lock_page
-> TestClearPageDirty
-> ubifs_writepage
-> do_writepage
-> release_budget
-> ClearPagePrivate
-> unlock_page
-> !(ret & VM_FAULT_LOCKED)
-> lock_page
-> set_page_dirty
-> ubifs_set_page_dirty
-> TestSetPageDirty (set page dirty without budgeting)
-> unlock_page
This leads to situation where we have a diry page but no budget allocated for
this page, so further write-back may fail with -ENOSPC.
In this fix we return from page_mkwrite without performing unlock_page. We
return VM_FAULT_LOCKED instead. After doing this, the race above will not
happen.
Signed-off-by: hujianyang <hujianyang@huawei.com>
Tested-by: Laurence Withers <lwithers@guralp.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1e77d0a1ed upstream.
Till reported that the spurious interrupt detection of threaded
interrupts is broken in two ways:
- note_interrupt() is called for each action thread of a shared
interrupt line. That's wrong as we are only interested whether none
of the device drivers felt responsible for the interrupt, but by
calling multiple times for a single interrupt line we account
IRQ_NONE even if one of the drivers felt responsible.
- note_interrupt() when called from the thread handler is not
serialized. That leaves the members of irq_desc which are used for
the spurious detection unprotected.
To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.
If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return.
If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.
If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.
Reported-by: Till Straumann <strauman@slac.stanford.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Austin Schuh <austin@peloton-tech.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-can@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1303071450130.22263@ionos
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da64c27d3c upstream.
LDISCs shouldn't call tty->ops->write() from within
->write_wakeup().
->write_wakeup() is called with port lock taken and
IRQs disabled, tty->ops->write() will try to acquire
the same port lock and we will deadlock.
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Tested-by: Andreas Bießmann <andreas@biessmann.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 498c228021 upstream.
kmap_to_page returns the corresponding struct page for a virtual address
of an arbitrary mapping. This works by checking whether the address
falls in the pkmap region and using the pkmap page tables instead of the
linear mapping if appropriate.
Unfortunately, the bounds checking means that PKMAP_ADDR(LAST_PKMAP) is
incorrectly treated as a highmem address and we can end up walking off
the end of pkmap_page_table and subsequently passing junk to pte_page.
This patch fixes the bound check to stay within the pkmap tables.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 54a217887a upstream.
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex. We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.
The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address. This can lead to state leakage and worse under some
circumstances.
Handle the cases explicit:
Waiter | pi_state | pi->owner | uTID | uODIED | ?
[1] NULL | --- | --- | 0 | 0/1 | Valid
[2] NULL | --- | --- | >0 | 0/1 | Valid
[3] Found | NULL | -- | Any | 0/1 | Invalid
[4] Found | Found | NULL | 0 | 1 | Valid
[5] Found | Found | NULL | >0 | 1 | Invalid
[6] Found | Found | task | 0 | 1 | Valid
[7] Found | Found | NULL | Any | 0 | Invalid
[8] Found | Found | task | ==taskTID | 0/1 | Valid
[9] Found | Found | task | 0 | 0 | Invalid
[10] Found | Found | task | !=taskTID | 0/1 | Invalid
[1] Indicates that the kernel can acquire the futex atomically. We
came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
[2] Valid, if TID does not belong to a kernel thread. If no matching
thread is found then it indicates that the owner TID has died.
[3] Invalid. The waiter is queued on a non PI futex
[4] Valid state after exit_robust_list(), which sets the user space
value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
[5] The user space value got manipulated between exit_robust_list()
and exit_pi_state_list()
[6] Valid state after exit_pi_state_list() which sets the new owner in
the pi_state but cannot access the user space value.
[7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
[8] Owner and user space value match
[9] There is no transient state which sets the user space TID to 0
except exit_robust_list(), but this is indicated by the
FUTEX_OWNER_DIED bit. See [4]
[10] There is no transient state which leaves owner and user space
TID out of sync.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13fbca4c6e upstream.
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex. So the owner TID of the current owner
(the unlocker) persists. That's observable inconsistant state,
especially when the ownership of the pi state got transferred.
Clean it up unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b3eaa9fc5c upstream.
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.
Verify whether the futex has waiters associated with kernel state. If
it has, return -EINVAL. The state is corrupted already, so no point in
cleaning it up. Subsequent calls will fail as well. Not our problem.
[ tglx: Use futex_top_waiter() and explain why we do not need to try
restoring the already corrupted user space state. ]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9c243a5a6 upstream.
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.
This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")
[ tglx: Compare the resulting keys as well, as uaddrs might be
different depending on the mapping ]
Fixes CVE-2014-3153.
Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e030ecc0f upstream.
When a memory error happens on an in-use page or (free and in-use)
hugepage, the victim page is isolated with its refcount set to one.
When you try to unpoison it later, unpoison_memory() calls put_page()
for it twice in order to bring the page back to free page pool (buddy or
free hugepage list). However, if another memory error occurs on the
page which we are unpoisoning, memory_failure() returns without
releasing the refcount which was incremented in the same call at first,
which results in memory leak and unconsistent num_poisoned_pages
statistics. This patch fixes it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: s/num_poisoned_pages/bad_mce_pages/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b985194c8c upstream.
For handling a free hugepage in memory failure, the race will happen if
another thread hwpoisoned this hugepage concurrently. So we need to
check PageHWPoison instead of !PageHWPoison.
If hwpoison_filter(p) returns true or a race happens, then we need to
unlock_page(hpage).
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: s/num_poisoned_pages/bad_mce_pages/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a9a55bf91 upstream.
We need to use writel() instead of writel_relaxed() when starting
a channel, to ensure all the descriptors have been flushed before
the activation.
While at it, remove the unneeded read-modify-write and make the
code simpler.
Signed-off-by: Lior Amsalem <alior@marvell.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[bwh: Backported to 3.2: it was using __raw_readl() and __raw_writel()
which are just as wrong]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27b11428b7 upstream.
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1b8ff4c97 upstream.
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.
We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.
Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2983040641 upstream.
Change the way channels objects are linked together by peak_pci_probe()
avoiding any kernel oops when driver is removed. Side effect is that
the list is now browsed from last to first channel.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 39af6b1678 upstream.
The perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist.
This could race with task context software event being just
scheduled on this cpu via perf_swevent_add while cpu hotplug
code already cleaned up event's data.
The race happens in the gap between the cpu notifier code
and the cpu being actually taken down. Note that only cpu
ctx events are terminated in the perf cpu hotplug code.
It's easily reproduced with:
$ perf record -e faults perf bench sched pipe
while putting one of the cpus offline:
# echo 0 > /sys/devices/system/cpu/cpu1/online
Console emits following warning:
WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
Modules linked in:
CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G W 3.14.0+ #256
Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
Call Trace:
[<ffffffff81665a23>] dump_stack+0x4f/0x7c
[<ffffffff8104732c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8104737a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8110fb3d>] perf_swevent_add+0x18d/0x1a0
[<ffffffff811162ae>] event_sched_in.isra.75+0x9e/0x1f0
[<ffffffff8111646a>] group_sched_in+0x6a/0x1f0
[<ffffffff81083dd5>] ? sched_clock_local+0x25/0xa0
[<ffffffff811167e6>] ctx_sched_in+0x1f6/0x450
[<ffffffff8111757b>] perf_event_sched_in+0x6b/0xa0
[<ffffffff81117a4b>] perf_event_context_sched_in+0x7b/0xc0
[<ffffffff81117ece>] __perf_event_task_sched_in+0x43e/0x460
[<ffffffff81096f1e>] ? put_lock_stats.isra.18+0xe/0x30
[<ffffffff8107b3c8>] finish_task_switch+0xb8/0x100
[<ffffffff8166a7de>] __schedule+0x30e/0xad0
[<ffffffff81172dd2>] ? pipe_read+0x3e2/0x560
[<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
[<ffffffff8166b45e>] ? preempt_schedule_irq+0x3e/0x70
[<ffffffff8166b464>] preempt_schedule_irq+0x44/0x70
[<ffffffff816707f0>] retint_kernel+0x20/0x30
[<ffffffff8109e60a>] ? lockdep_sys_exit+0x1a/0x90
[<ffffffff812a4234>] lockdep_sys_exit_thunk+0x35/0x67
[<ffffffff81679321>] ? sysret_check+0x5/0x56
Fixing this by tracking the cpu hotplug state and displaying
the WARN only if current cpu is initialized properly.
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 178eda29ca upstream.
It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:
https://github.com/zfsonlinux/spl/issues/241http://tracker.ceph.com/issues/7790
The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.
This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.
Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e36b13cceb upstream.
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
[bwh: Backported to 3.2:
- Add ceph_tcp_sendpage(), from commit 31739139f3 ('libceph: use
kernel_sendpage() for sending zeroes'), the rest of which is not
applicable
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 93fa9d3267 upstream.
When a new device is added below a hotplug bridge, the bridge's secondary
bus speed and the device's bus speed must match. The shpchp driver
previously checked the bridge's *primary* bus speed, not the secondary bus
speed.
This caused hot-add errors like:
shpchp 0000:00:03.0: Speed of bus ff and adapter 0 mismatch
Check the secondary bus speed instead.
[bhelgaas: changelog]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=75251
Fixes: 3749c51ac6 ("PCI: Make current and maximum bus speeds part of the PCI core")
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fa81511bb0 upstream.
Checkin:
b3b42ac2cb x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
disabled 16-bit segments on 64-bit kernels due to an information
leak. However, it does seem that people are genuinely using Wine to
run old 16-bit Windows programs on Linux.
A proper fix for this ("espfix64") is coming in the upcoming merge
window, but as a temporary fix, create a sysctl to allow the
administrator to re-enable support for 16-bit segments.
It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
you hit this issue and care about your old Windows program more than
you care about a kernel stack address information leak, you can do
echo 1 > /proc/sys/abi/ldt16
as root (add it to your startup scripts), and you should be ok.
The sysctl table is only added if you have COMPAT support enabled on
x86-64, but I assume anybody who runs old windows binaries very much
does that ;)
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ce78cc071f upstream.
Don't unmark the device as suspended until after it's been re-setup.
The main race would be w.r.t. an i2c driver that gets resumed at the same
time (asyncronously), that is allowed to do a transfer since suspended
is set to 0 before reinit, but really should have seen the -EIO return
instead.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 47bb27e788 upstream.
There have been "i2c_designware 80860F41:00: controller timed out" errors
on a number of Baytrail platforms. The issue is caused by incorrect value in
Interrupt Mask Register (DW_IC_INTR_MASK) when i2c core is being enabled.
This causes call to __i2c_dw_enable() to immediately start the transfer which
leads to timeout. There are 3 failure modes observed:
1. Failure in S0 to S3 resume path
The default value after reset for DW_IC_INTR_MASK is 0x8ff. When we start
the first transaction after resuming from system sleep, TX_EMPTY interrupt
is already unmasked because of the hardware default.
2. Failure in normal operational path
This failure happens rarely and is hard to reproduce. Debug trace showed that
DW_IC_INTR_MASK had value of 0x254 when failure occurred, which meant
TX_EMPTY was unmasked.
3. Failure in S3 to S0 suspend path
This failure also happens rarely and is hard to reproduce. Adding debug trace
that read DW_IC_INTR_MASK made this failure not reproducible. But from ISR
call trace we could conclude TX_EMPTY was unmasked when problem occurred.
The patch masks all interrupts before the controller is enabled to resolve the
faulty DW_IC_INTR_MASK conditions.
Signed-off-by: Wenkai Du <wenkai.du@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[wsa: improved the comment and removed typo in commit msg]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9844f54623 upstream.
The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().
The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.
Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97d9d23dda upstream.
If a struct contains 64-bit fields, it is aligned on 64-bit boundaries
within containing structs in 64-bit compilations. This is the case with
struct v4l2_window, which contains pointers and is embedded into struct
v4l2_format, and that one is embedded into struct v4l2_create_buffers.
Unlike some other structs, used as a part of the kernel ABI as ioctl()
arguments, that are packed, these structs aren't packed. This isn't a
problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains
a bug, that triggers in such 64-bit builds. That code wrongly assumes,
that in struct v4l2_create_buffers, struct v4l2_format immediately follows
the __u32 memory field, which in fact isn't the case. This bug wasn't
visible until now, because until recently hardly any applications used
this ioctl() and mostly embedded 32-bit only drivers implemented it. This
is changing now with addition of this ioctl() to some USB drivers, e.g.
UVC. This patch fixes the bug by copying parts of struct
v4l2_create_buffers separately.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cfece5857c upstream.
Commit 75e2bdad89 "ov7670: allow
configuration of image size, clock speed, and I/O method" uses a wrong
index to iterate an array. Apart from being wrong, it also uses an
unchecked value from user-space, which can cause access to unmapped
memory in the kernel, triggered by a normal desktop user with rights to
use V4L2 devices.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2:
- Adjust filename
- win_sizes array is static, not per-device]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3a18e1398f upstream.
The datasheet for EMC1413/EMC1414, which is fully compatible to
EMC1403/1404 and uses the same chip identification, references revision
numbers 0x01, 0x03, and 0x04. Accept the full range of revision numbers
from 0x01 to 0x04 to make sure none are missed.
Signed-off-by: Josef Gajdusek <atx@atx.name>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 17c048fc4b upstream.
Attempts to set the hysteresis value to a temperature below the target
limit fails with "write error: Numerical result out of range" due to
an inverted comparison.
Signed-off-by: Josef Gajdusek <atx@atx.name>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
[Guenter Roeck: Updated headline and description]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6227cb00cc upstream.
The check at the beginning of cpupri_find() makes sure that the task_pri
variable does not exceed the cp->pri_to_cpu array length. But that length
is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two
priorities in that array.
As task_pri is computed from convert_prio() which should never be bigger
than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is
hit.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d5c9fde3da upstream.
It is possible for "limit - setpoint + 1" to equal zero, after getting
truncated to a 32 bit variable, and resulting in a divide by zero error.
Using the fully 64 bit divide functions avoids this problem. It also
will cause pos_ratio_polynom() to return the correct value when
(setpoint - limit) exceeds 2^32.
Also uninline pos_ratio_polynom, at Andrew's request.
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
Adjust context - pos_ratio_polynom() is not a separate function]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed84825b78 upstream.
In bdi_position_ratio(), get difference (setpoint-dirty) right even when
negative. Both setpoint and dirty are unsigned long, the difference was
zero-padded thus wrongly sign-extended to s64. This issue affects all
32-bit architectures, does not affect 64-bit architectures where long
and s64 are equivalent.
In this function, dirty is between freerun and limit, the pseudo-float x
is between [-1,1], expected to be negative about half the time. With
zero-padding, instead of a small negative x we obtained a large positive
one so bdi_position_ratio() returned garbage.
Casting the difference to s64 also prevents overflow with left-shift;
though normally these numbers are small and I never observed a 32-bit
overflow there.
(This patch does not solve the PAE OOM issue.)
Paul Szabo psz@maths.usyd.edu.auhttp://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Paul Szabo <psz@maths.usyd.edu.au>
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo <psz@maths.usyd.edu.au>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 50c6e282bd upstream.
Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead of one representing the mode bits.
Instead of adding boilerplat to all filesystems put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>,
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5694c93e6c upstream.
Aside from making it clearer what is non-trivial in create_client(), it
also fixes a bug whereby we can call free_client() before idr_init()
has been called.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Also move initialisation of nfs4_client::cl_strhash, but
not nfs4_client::cl_revoked]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0f62fb220a upstream.
If an md array with externally managed metadata (e.g. DDF or IMSM)
is in use, then we should not set safemode==2 at shutdown because:
1/ this is ineffective: user-space need to be involved in any 'safemode' handling,
2/ The safemode management code doesn't cope with safemode==2 on external metadata
and md_check_recover enters an infinite loop.
Even at shutdown, an infinite-looping process can be problematic, so this
could cause shutdown to hang.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36189cc3cd upstream.
The hw_version 3 Elantech touchpad on the Gigabyte U2442 does not accept
0x0b as initialization value for r10, this stand-alone version of the
driver: http://planet76.com/drivers/elantech/psmouse-elantech-v6.tar.bz2
Uses 0x03 which does work, so this means not setting bit 3 of r10 which
sets: "Enable Real H/W Resolution In Absolute mode"
Which will result in half the x and y resolution we get with that bit set,
so simply not setting it everywhere is not a solution. We've been unable to
find a way to identify touchpads where setting the bit will fail, so this
patch uses a dmi based blacklist for this.
https://bugzilla.kernel.org/show_bug.cgi?id=61151
Reported-by: Philipp Wolfer <ph.wolfer@gmail.com>
Tested-by: Philipp Wolfer <ph.wolfer@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8171a67d58 upstream.
BugLink: http://bugs.launchpad.net/bugs/1180881
Synaptics large touchscreen doesn't support some of the report request
while initializing. The unspoorted request will make the device unreachable,
and will lead to the following usb_submit_urb() function call timeout.
So, add the IDs into HID_QUIRK_NO_INIT_REPORTS quirk.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2:
- Adjust context
- Add definition of USB_VENDOR_ID_SYNAPTICS]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 501fed45b7 upstream.
When 'console=hvc0' is specified to the kernel parameter in x86 KVM guest,
hvc console is setup within a kthread. However, that will cause SEGV
and the boot will fail when the driver is builtin to the kernel,
because currently hvc_console_setup() is annotated with '__init'. This
patch removes '__init' to boot the guest successfully with 'console=hvc0'.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit df602c2d23 upstream.
Even if the USB-to-ATAPI converter supported multiple LUNs, this
driver would always detect the same physical device or media because
it doesn't use srb->device->lun in any way.
Tested with an Hewlett-Packard CD-Writer Plus 8200e.
Signed-off-by: Daniele Forsi <dforsi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a3d0b1218d upstream.
There appear to be a crop of new hardware where the vbios is not
available from PROM/PRAMIN, but there is a valid _ROM method in ACPI.
The data read from PCIROM almost invariably contains invalid
instructions (still has the x86 opcodes), which makes this a low-risk
way to try to obtain a valid vbios image.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76475
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3234f5b06f upstream.
Fixes: a53268be0c ('rtlwifi: rtl8192cu: Fix too long disable of IRQs')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
commit a53268be0c upstream.
In commit f78bccd79b entitled "rtlwifi:
rtl8192ce: Fix too long disable of IRQs", Olivier Langlois
<olivier@trillion01.com> fixed a problem caused by an extra long disabling
of interrupts. This patch makes the same fix for rtl8192cu.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98a01e779f upstream.
On architectures with sizeof(int) < sizeof (long), the
computation of mask inside apply_slack() can be undefined if the
computed bit is > 32.
E.g. with: expires = 0xffffe6f5 and slack = 25, we get:
expires_limit = 0x20000000e
bit = 33
mask = (1 << 33) - 1 /* undefined */
On x86, mask becomes 1 and and the slack is not applied properly.
On s390, mask is -1, expires is set to 0 and the timer fires immediately.
Use 1UL << bit to solve that issue.
Suggested-by: Deborah Townsend <dstownse@us.ibm.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 012a45e3f4 upstream.
If a cpu is idle and starts an hrtimer which is not pinned on that
same cpu, the nohz code might target the timer to a different cpu.
In the case that we switch the cpu base of the timer we already have a
sanity check in place, which determines whether the timer is earlier
than the current leftmost timer on the target cpu. In that case we
enqueue the timer on the current cpu because we cannot reprogram the
clock event device on the target.
If the timers base is already the target CPU we do not have this
sanity check in place so we enqueue the timer as the leftmost timer in
the target cpus rb tree, but we cannot reprogram the clock event
device on the target cpu. So the timer expires late and subsequently
prevents the reprogramming of the target cpu clock event device until
the previously programmed event fires or a timer with an earlier
expiry time gets enqueued on the target cpu itself.
Add the same target check as we have for the switch base case and
start the timer on the current cpu if it would become the leftmost
timer on the target.
[ tglx: Rewrote subject and changelog ]
Signed-off-by: Leon Ma <xindong.ma@intel.com>
Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6c6c0d5a1c upstream.
If the last hrtimer interrupt detected a hang it sets hang_detected=1
and programs the clock event device with a delay to let the system
make progress.
If hang_detected == 1, we prevent reprogramming of the clock event
device in hrtimer_reprogram() but not in hrtimer_force_reprogram().
This can lead to the following situation:
hrtimer_interrupt()
hang_detected = 1;
program ce device to Xms from now (hang delay)
We have two timers pending:
T1 expires 50ms from now
T2 expires 5s from now
Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
invoked, which in turn programs the clock event device to T2 (5
seconds from now).
Any hrtimer_start after that will not reprogram the hardware due to
hang_detected still being set. So we effectivly block all timers until
the T2 event fires and cleans up the hang situation.
Add a check for hang_detected to hrtimer_force_reprogram() which
prevents the reprogramming of the hang delay in the hardware
timer. The subsequent hrtimer_interrupt will resolve all outstanding
issues.
[ tglx: Rewrote subject and changelog and fixed up the comment in
hrtimer_force_reprogram() ]
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af61e27c3f upstream.
On suspend, _scsih_suspend calls mpt2sas_base_free_resources, which
in turn calls pci_disable_device if the device is enabled prior to
suspending. However, _scsih_suspend also calls pci_disable_device
itself.
Thus, in the event that the device is enabled prior to suspending,
pci_disable_device will be called twice. This patch removes the
duplicate call to pci_disable_device in _scsi_suspend as it is both
unnecessary and results in a kernel oops.
Signed-off-by: Tyler Stachecki <tstache1@binghamton.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a949ae560a upstream.
A race exists between module loading and enabling of function tracer.
CPU 1 CPU 2
----- -----
load_module()
module->state = MODULE_STATE_COMING
register_ftrace_function()
mutex_lock(&ftrace_lock);
ftrace_startup()
update_ftrace_function();
ftrace_arch_code_modify_prepare()
set_all_module_text_rw();
<enables-ftrace>
ftrace_arch_code_modify_post_process()
set_all_module_text_ro();
[ here all module text is set to RO,
including the module that is
loading!! ]
blocking_notifier_call_chain(MODULE_STATE_COMING);
ftrace_init_module()
[ tries to modify code, but it's RO, and fails!
ftrace_bug() is called]
When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.
The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.
The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.
Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com
Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 41c22f6262 upstream.
get_user_pages(mm) is simply wrong if mm->mm_users == 0 and exit_mmap/etc
was already called (or is in progress), mm->mm_count can only pin mm->pgd
and mm_struct itself.
Change kvm_setup_async_pf/async_pf_execute to inc/dec mm->mm_users.
kvm_create_vm/kvm_destroy_vm play with ->mm_count too but this case looks
fine at first glance, it seems that this ->mm is only used to verify that
current->mm == kvm->mm.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98fda16929 upstream.
'.done' is used to mark the completion of 'async_pf_execute()', but
'cancel_work_sync()' returns true when the work was canceled, so we
use it instead.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 28b441e240 upstream.
When we cancel 'async_pf_execute()', we should behave as if the work was
never scheduled in 'kvm_setup_async_pf()'.
Fixes a bug when we can't unload module because the vm wasn't destroyed.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 09da1f3463 upstream.
When we're performing reauthentication (in order to elevate the
security level from an unauthenticated key to an authenticated one) we
do not need to issue any encryption command once authentication
completes. Since the trigger for the encryption HCI command is the
ENCRYPT_PEND flag this flag should not be set in this scenario.
Instead, the REAUTH_PEND flag takes care of all necessary steps for
reauthentication.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[bwh: Backported to 3.2:
- Adjust context
- s/conn->flags/conn->pend/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cbd75e97a5 upstream.
We already check that the buffer object we're accessing is registered with
the file. Now also make sure that we can't DMA across buffer object boundaries.
v2: Code commenting update.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8834d3608c upstream.
When disable beaconing we clear register with beacon and newer set it
back, what make we stop send beacons infinitely.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff413195e8 upstream.
The tp_features.bright_acpimode will not be set correctly for brightness
control because ACPI_VIDEO_HID will not be located in ACPI. As a result,
a duplicated key event will always be sent. acpi_video_backlight_support()
is sufficient to detect standard ACPI brightness control.
Signed-off-by: Alex Hung <alex.hung@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e33d0ba804 ]
Recycling skb always had been very tough...
This time it appears GRO layer can accumulate skb->truesize
adjustments made by drivers when they attach a fragment to skb.
skb_gro_receive() can only subtract from skb->truesize the used part
of a fragment.
I spotted this problem seeing TcpExtPruneCalled and
TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where
TCP receive window should be sized properly to accept traffic coming
from a driver not overshooting skb->truesize.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ec47ea8247 ]
With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head. Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb) that way
we can return the correct value without having gcc having to do all the
optimization to cancel out skb->head - skb->head.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fbdc0ad095 ]
the value of itag is a random value from stack, and may not be initiated by
fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID
is not set
This will make the cached dst uncertainty
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 16c0b164bd ]
We drop packet unconditionally when we fail to mirror it. This is not intended
in some cases. Consdier for kvm guest, we may mirror the traffic of the bridge
to a tap device used by a VM. When kernel fails to mirror the packet in
conditions such as when qemu crashes or stop polling the tap, it's hard for the
management software to detect such condition and clean the the mirroring
before. This would lead all packets to the bridge to be dropped and break the
netowrk of other virtual machines.
To solve the issue, the patch does not drop packets when kernel fails to mirror
it, and only drop the redirected packets.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit bbeb0eadcf ]
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
overflow on the underlying interface.
Attempting the set IFF_ALLMULTI on the underlying interface would cause an
error and the log message:
"allmulti touches root, set allmulti failed."
Signed-off-by: Peter Christensen <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit aeefa1ecfc ]
Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().
Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ca6c5d4ad2 ]
local_df means 'ignore DF bit if set', so if its set we're
allowed to perform ip fragmentation.
This wasn't noticed earlier because the output path also drops such skbs
(and emits needed icmp error) and because netfilter ip defrag did not
set local_df until couple of days ago.
Only difference is that DF-packets-larger-than MTU now discarded
earlier (f.e. we avoid pointless netfilter postrouting trip).
While at it, drop the repeated test ip_exceeds_mtu, checking it once
is enough...
Fixes: fe6cc55f3a ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 0cda345d1b ]
commit b9f47a3aae (tcp_cubic: limit delayed_ack ratio to prevent
divide error) try to prevent divide error, but there is still a little
chance that delayed_ack can reach zero. In case the param cnt get
negative value, then ratio+cnt would overflow and may happen to be zero.
As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero.
In some old kernels, such as 2.6.32, there is a bug that would
pass negative param, which then ultimately leads to this divide error.
commit 5b35e1e6e9 (tcp: fix tcp_trim_head() to adjust segment count
with skb MSS) fixed the negative param issue. However,
it's safe that we fix the range of delayed_ack as well,
to make sure we do not hit a divide by zero.
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Liu Yu <allanyuliu@tencent.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f114890cdf ]
This reverts commit 12a2856b60.
The commit above doesn't appear to be necessary any more as the
checksums appear to be correctly computed/validated.
Additionally the above commit breaks kvm configurations where
one VM is using a device that support checksum offload (virtio) and
the other VM does not.
In this case, packets leaving virtio device will have CHECKSUM_PARTIAL
set. The packets is forwarded to a macvtap that has offload features
turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not
update the checksum and thus a bad checksum is passed up to
the guest.
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrian Nord <nightnord@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8535087131 ]
commit 813b3b5db8 (ipv4: Use caller's on-stack flowi as-is
in output route lookups.) introduces another regression which
is very similar to the problem of commit e6b45241c (ipv4: reset
flowi parameters on route connect) wants to fix:
Before we call ip_route_output_key() in sctp_v4_get_dst() to
get a dst that matches a bind address as the source address,
we have already called this function previously and the flowi
parameters have been initialized including flowi4_oif, so when
we call this function again, the process in __ip_route_output_key()
will be different because of the setting of flowi4_oif, and we'll
get a networking device which corresponds to the inputted flowi4_oif
as the output device, this is wrong because we'll never hit this
place if the previously returned source address of dst match one
of the bound addresses.
To reproduce this problem, a vlan setting is enough:
# ifconfig eth0 up
# route del default
# vconfig add eth0 2
# vconfig add eth0 3
# ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
# route add default gw 10.0.1.254 dev eth0.2
# ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
# ip rule add from 10.0.0.14 table 4
# ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
# sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
You'll detect that all the flow are routed to eth0.2(10.0.1.254).
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 30313a3d57 ]
When bridge device is created with IFLA_ADDRESS, we are not calling
br_stp_change_bridge_id(), which leads to incorrect local fdb
management and bridge id calculation, and prevents us from receiving
frames on the bridge device.
Reported-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit c53864fd60 ]
Since 115c9b8192 (rtnetlink: Fix problem with
buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
attribute if they were solicited by a GETLINK message containing an
IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.
That was done because some user programs broke when they received more data
than expected - because IFLA_VFINFO_LIST contains information for each VF
it can become large if there are many VFs.
However, the IFLA_VF_PORTS attribute, supplied for devices which implement
ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
It supplies per-VF information and can therefore become large, but it is
not currently conditional on the IFLA_EXT_MASK value.
Worse, it interacts badly with the existing EXT_MASK handling. When
IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
NLMSG_GOODSIZE. If the information for IFLA_VF_PORTS exceeds this, then
rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
netlink_dump() will misinterpret this as having finished the listing and
omit data for this interface and all subsequent ones. That can cause
getifaddrs(3) to enter an infinite loop.
This patch addresses the problem by only supplying IFLA_VF_PORTS when
IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 973462bbde ]
Without IFLA_EXT_MASK specified, the information reported for a single
interface in response to RTM_GETLINK is expected to fit within a netlink
packet of NLMSG_GOODSIZE.
If it doesn't, however, things will go badly wrong, When listing all
interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
message in a packet as the end of the listing and omit information for
that interface and all subsequent ones. This can cause getifaddrs(3) to
enter an infinite loop.
This patch won't fix the problem, but it will WARN_ON() making it easier to
track down what's going wrong.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ba67b51003 upstream.
The patch fixes a problem with dropped jumbo frames after usage of
'ethtool -G ... rx'.
Scenario:
1. ip link set eth0 up
2. ethtool -G eth0 rx N # <- This zeroes rx-jumbo
3. ip link set mtu 9000 dev eth0
The ethtool command set rx_jumbo_pending to zero so any received jumbo
packets are dropped and you need to use 'ethtool -G eth0 rx-jumbo N'
to workaround the issue.
The patch changes the logic so rx_jumbo_pending value is changed only if
jumbo frames are enabled (MTU > 1500).
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 05ab8f2647 ]
The BPF_S_ANC_NLATTR and BPF_S_ANC_NLATTR_NEST extensions fail to check
for a minimal message length before testing the supplied offset to be
within the bounds of the message. This allows the subtraction of the nla
header to underflow and therefore -- as the data type is unsigned --
allowing far to big offset and length values for the search of the
netlink attribute.
The remainder calculation for the BPF_S_ANC_NLATTR_NEST extension is
also wrong. It has the minuend and subtrahend mixed up, therefore
calculates a huge length value, allowing to overrun the end of the
message while looking for the netlink attribute.
The following three BPF snippets will trigger the bugs when attached to
a UNIX datagram socket and parsing a message with length 1, 2 or 3.
,-[ PoC for missing size check in BPF_S_ANC_NLATTR ]--
| ld #0x87654321
| ldx #42
| ld #nla
| ret a
`---
,-[ PoC for the same bug in BPF_S_ANC_NLATTR_NEST ]--
| ld #0x87654321
| ldx #42
| ld #nlan
| ret a
`---
,-[ PoC for wrong remainder calculation in BPF_S_ANC_NLATTR_NEST ]--
| ; (needs a fake netlink header at offset 0)
| ld #0
| ldx #42
| ld #nlan
| ret a
`---
Fix the first issue by ensuring the message length fulfills the minimal
size constrains of a nla header. Fix the second bug by getting the math
for the remainder calculation right.
Fixes: 4738c1db15 ("[SKFILTER]: Add SKF_ADF_NLATTR instruction")
Fixes: d214c7537b ("filter: add SKF_AD_NLATTR_NEST to look for nested..")
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Fix misplacement of the first check due to a bug in the patch program]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b04c461902 ]
Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com>
Signed-off-by: xiaoming wang <xiaoming.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 30f78d8ebf ]
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit db29868653 ]
Remove the bonding debug_fs entries when the
module initialization fails. The debug_fs
entries should be removed together with all other
already allocated resources.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6d39d589bb ]
In case of tcp, gso_size contains the tcpmss.
For UFO (udp fragmentation offloading) skbs, gso_size is the fragment
payload size, i.e. we must not account for udp header size.
Otherwise, when using virtio drivers, a to-be-forwarded UFO GSO packet
will be needlessly fragmented in the forward path, because we think its
individual segments are too large for the outgoing link.
Fixes: fe6cc55f3a ("net: ip, ipv6: handle gso skbs in forwarding path")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f34c4a35d8 ]
When l2tp driver tries to get PMTU for the tunnel destination, it uses
the pointer to struct sock that represents PPPoX socket, while it
should use the pointer that represents UDP socket of the tunnel.
Signed-off-by: Dmitry Petukhov <dmgenp@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7dec935a3a upstream.
No reason to allocate tp_module structures for modules that have no
tracepoints. This just wastes memory.
Fixes: b75ef8b44b "Tracepoint: Dissociate from module mutex"
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c58dd2dd44 upstream.
All xtables variants suffer from the defect that the copy_to_user()
to copy the counters to user memory may fail after the table has
already been exchanged and thus exposed. Return an error at this
point will result in freeing the already exposed table. Any
subsequent packet processing will result in a kernel panic.
We can't copy the counters before exposing the new tables as we
want provide the counter state after the old table has been
unhooked. Therefore convert this into a silent error.
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6249665890 upstream.
Mode setting in the TGA driver is broken for these reasons:
- info->fix.line_length is set just once in tgafb_init_fix function. If
we change videomode, info->fix.line_length is not recalculated - so
the video mode is changed but the screen is corrupted because of wrong
info->fix.line_length.
- info->fix.smem_len is set in tgafb_init_fix to the size of the default
video mode (640x480). If we set a higher resolution,
info->fix.smem_len is smaller than the current screen size, preventing
the userspace program from mapping the framebuffer.
This patch fixes it:
- info->fix.line_length initialization is moved to tgafb_set_par so that
it is recalculated with each mode change.
- info->fix.smem_len is set to a fixed value representing the real
amount of video ram (the values are taken from xfree86 driver).
- add a check to tgafb_check_var to prevent us from setting a videomode
that doesn't fit into videoram.
- in tgafb_register, tgafb_init_fix is moved upwards, to be called
before fb_find_mode (because fb_find_mode already needs the videoram
size set in tgafb_init_fix).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vga.kernel.org
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3eba563e28 upstream.
Address a regression caused by commit ad332c8a45:
(ACPI / EC: Clear stale EC events on Samsung systems)
After the earlier patch, there was found to be a race condition on some
earlier Samsung systems (N150/N210/N220). The function acpi_ec_clear was
sometimes discarding a new EC event before its GPE was triggered by the
system. In the case of these systems, this meant that the "lid open"
event was not registered on resume if that was the cause of the wake,
leading to problems when attempting to close the lid to suspend again.
After testing on a number of Samsung systems, both those affected by the
previous EC bug and those affected by the race condition, it seemed that
the best course of action was to process rather than discard the events.
On Samsung systems which accumulate stale EC events, there does not seem
to be any adverse side-effects of running the associated _Q methods.
This patch adds an argument to the static function acpi_ec_sync_query so
that it may be used within the acpi_ec_clear loop in place of
acpi_ec_query_unlocked which was used previously.
With thanks to Stefan Biereigel for reporting the issue, and for all the
people who helped test the new patch on affected systems.
Fixes: ad332c8a45 (ACPI / EC: Clear stale EC events on Samsung systems)
References: https://lkml.kernel.org/r/532FE3B2.9060808@biereigel-wb.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=44161#c173
Reported-by: Stefan Biereigel <stefan@biereigel.de>
Signed-off-by: Kieran Clancy <clancy.kieran@gmail.com>
Tested-by: Stefan Biereigel <stefan@biereigel.de>
Tested-by: Dennis Jansen <dennis.jansen@web.de>
Tested-by: Nicolas Porcel <nicolasporcel06@gmail.com>
Tested-by: Maurizio D'Addona <mauritiusdadd@gmail.com>
Tested-by: Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
Tested-by: Giannis Koutsou <giannis.koutsou@gmail.com>
Tested-by: Kieran Clancy <clancy.kieran@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad332c8a45 upstream.
A number of Samsung notebooks (530Uxx/535Uxx/540Uxx/550Pxx/900Xxx/etc)
continue to log events during sleep (lid open/close, AC plug/unplug,
battery level change), which accumulate in the EC until a buffer fills.
After the buffer is full (tests suggest it holds 8 events), GPEs stop
being triggered for new events. This state persists on wake or even on
power cycle, and prevents new events from being registered until the EC
is manually polled.
This is the root cause of a number of bugs, including AC not being
detected properly, lid close not triggering suspend, and low ambient
light not triggering the keyboard backlight. The bug also seemed to be
responsible for performance issues on at least one user's machine.
Juan Manuel Cabo found the cause of bug and the workaround of polling
the EC manually on wake.
The loop which clears the stale events is based on an earlier patch by
Lan Tianyu (see referenced attachment).
This patch:
- Adds a function acpi_ec_clear() which polls the EC for stale _Q
events at most ACPI_EC_CLEAR_MAX (currently 100) times. A warning is
logged if this limit is reached.
- Adds a flag EC_FLAGS_CLEAR_ON_RESUME which is set to 1 if the DMI
system vendor is Samsung. This check could be replaced by several
more specific DMI vendor/product pairs, but it's likely that the bug
affects more Samsung products than just the five series mentioned
above. Further, it should not be harmful to run acpi_ec_clear() on
systems without the bug; it will return immediately after finding no
data waiting.
- Runs acpi_ec_clear() on initialisation (boot), from acpi_ec_add()
- Runs acpi_ec_clear() on wake, from acpi_ec_unblock_transactions()
References: https://bugzilla.kernel.org/show_bug.cgi?id=44161
References: https://bugzilla.kernel.org/show_bug.cgi?id=45461
References: https://bugzilla.kernel.org/show_bug.cgi?id=57271
References: https://bugzilla.kernel.org/attachment.cgi?id=126801
Suggested-by: Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
Signed-off-by: Kieran Clancy <clancy.kieran@gmail.com>
Reviewed-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Dennis Jansen <dennis.jansen@web.de>
Tested-by: Kieran Clancy <clancy.kieran@gmail.com>
Tested-by: Juan Manuel Cabo <juanmanuel.cabo@gmail.com>
Tested-by: Dennis Jansen <dennis.jansen@web.de>
Tested-by: Maurizio D'Addona <mauritiusdadd@gmail.com>
Tested-by: San Zamoyski <san@plusnet.pl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- acpi_ec::mutex was called lock]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8fe9c93e74 upstream.
GCC 4.8 now generates out-of-line vr save/restore functions when
optimizing for size. They are needed for the raid6 altivec support.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2145e15e05 upstream.
Do not leak kernel-only floppy_raw_cmd structure members to userspace.
This includes the linked-list pointer and the pointer to the allocated
DMA space.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ef87dbe761 upstream.
Always clear out these floppy_raw_cmd struct members after copying the
entire structure from userspace so that the in-kernel version is always
valid and never left in an interdeterminate state.
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4291086b1f upstream.
The tty atomic_write_lock does not provide an exclusion guarantee for
the tty driver if the termios settings are LECHO & !OPOST. And since
it is unexpected and not allowed to call TTY buffer helpers like
tty_insert_flip_string concurrently, this may lead to crashes when
concurrect writers call pty_write. In that case the following two
writers:
* the ECHOing from a workqueue and
* pty_write from the process
race and can overflow the corresponding TTY buffer like follows.
If we look into tty_insert_flip_string_fixed_flag, there is:
int space = __tty_buffer_request_room(port, goal, flags);
struct tty_buffer *tb = port->buf.tail;
...
memcpy(char_buf_ptr(tb, tb->used), chars, space);
...
tb->used += space;
so the race of the two can result in something like this:
A B
__tty_buffer_request_room
__tty_buffer_request_room
memcpy(buf(tb->used), ...)
tb->used += space;
memcpy(buf(tb->used), ...) ->BOOM
B's memcpy is past the tty_buffer due to the previous A's tb->used
increment.
Since the N_TTY line discipline input processing can output
concurrently with a tty write, obtain the N_TTY ldisc output_lock to
serialize echo output with normal tty writes. This ensures the tty
buffer helper tty_insert_flip_string is not called concurrently and
everything is fine.
Note that this is nicely reproducible by an ordinary user using
forkpty and some setup around that (raw termios + ECHO). And it is
present in kernels at least after commit
d945cb9cce (pty: Rework the pty layer to
use the normal buffering logic) in 2.6.31-rc3.
js: add more info to the commit log
js: switch to bool
js: lock unconditionally
js: lock only the tty->ops->write call
References: CVE-2014-0196
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: output_lock is a member of struct tty_struct]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Dmitry Semyonov reported that after upgrading from 3.2.54 to
3.2.57 the rtl8192ce driver will crash when its interface is brought
up. The oops message shows:
[ 1833.611397] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 1833.611455] IP: [<ffffffffa0410c6a>] rtl92ce_update_hal_rate_tbl+0x29/0x4db [rtl8192ce]
...
[ 1833.613326] Call Trace:
[ 1833.613346] [<ffffffffa02ad9c6>] ? rtl92c_dm_watchdog+0xd0b/0xec9 [rtl8192c_common]
[ 1833.613391] [<ffffffff8105b5cf>] ? process_one_work+0x161/0x269
[ 1833.613425] [<ffffffff8105c598>] ? worker_thread+0xc2/0x145
[ 1833.613458] [<ffffffff8105c4d6>] ? manage_workers.isra.25+0x15b/0x15b
[ 1833.613496] [<ffffffff8105f6d9>] ? kthread+0x76/0x7e
[ 1833.613527] [<ffffffff81356b74>] ? kernel_thread_helper+0x4/0x10
[ 1833.613563] [<ffffffff8105f663>] ? kthread_worker_fn+0x139/0x139
[ 1833.613598] [<ffffffff81356b70>] ? gs_change+0x13/0x13
Disassembly of rtl92ce_update_hal_rate_tbl() shows that the 'sta'
parameter was null. None of the changes to the rtlwifi family between
3.2.54 and 3.2.57 seem to directly cause this, and reverting commit
f78bccd79b ('rtlwifi: rtl8192ce: Fix too long disable of IRQs')
doesn't fix it.
rtl92c_dm_watchdog() calls rtl92ce_update_hal_rate_tbl() via
rtl92c_dm_refresh_rate_adaptive_mask(), which does not appear in the
call trace as it was inlined. That function has been completely
removed upstream which may explain why this crash wasn't seen there.
I'm not sure that it is sensible to completely remove
rtl92c_dm_refresh_rate_adaptive_mask() without making other
compensating changes elsewhere, so try to work around this for 3.2 by
checking for a null pointer in rtl92c_dm_refresh_rate_adaptive_mask()
and then skipping the call to rtl92ce_update_hal_rate_tbl().
References: https://bugs.debian.org/745137
References: https://bugs.debian.org/745462
Reported-by: Dmitry Semyonov <linulin@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Chaoming Li <chaoming_li@realsil.com.cn>
commit 5509076d1b upstream.
During firmware download the device expects memory addresses in
big-endian byte order. As the wIndex parameter which hold the address is
sent in little-endian byte order regardless of host byte order, we need
to use swab16 rather than cpu_to_be16.
Also make sure to handle the struct ti_i2c_desc size parameter which is
returned in little-endian byte order.
Reported-by: Ludovic Drolez <ldrolez@debian.org>
Tested-by: Ludovic Drolez <ldrolez@debian.org>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01bb59ebff upstream.
When CONFIG_PCI and CONFIG_PM are not selected, xhci.c gets this
warning:
drivers/usb/host/xhci.c:409:13: warning: ‘xhci_msix_sync_irqs’ defined
but not used [-Wunused-function]
Instead of creating nested #ifdefs, this patch fixes it by defining the
xHCI PCI stubs as inline.
This warning has been in since 3.2 kernel and was
caused by commit 421aa841a1
"usb/xhci: hide MSI code behind PCI bars", but wasn't noticed
until 3.13 when a configuration with these options was tried
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f81b6d22a upstream.
We have observed a rare cycle state desync bug after Set TR Dequeue
Pointer commands on Intel LynxPoint xHCs (resulting in an endpoint that
doesn't fetch new TRBs and thus an unresponsive USB device). It always
triggers when a previous Set TR Dequeue Pointer command has set the
pointer to the final Link TRB of a segment, and then another URB gets
enqueued and cancelled again before it can be completed. Further
investigation showed that the xHC had returned the Link TRB in the TRB
Pointer field of the Transfer Event (CC == Stopped -- Length Invalid),
but when xhci_find_new_dequeue_state() later accesses the Endpoint
Context's TR Dequeue Pointer field it is set to the first TRB of the
next segment.
The driver expects those two values to be the same in this situation,
and uses the cycle state of the latter together with the address of the
former. This should be fine according to the XHCI specification, since
the endpoint ring should be stopped when returning the Transfer Event
and thus should not advance over the Link TRB before it gets restarted.
However, real-world XHCI implementations apparently don't really care
that much about these details, so the driver should follow a more
defensive approach to try to work around HC spec violations.
This patch removes the stopped_trb variable that had been used to store
the TRB Pointer from the last Transfer Event of a stopped TRB. Instead,
xhci_find_new_dequeue_state() now relies only on the Endpoint Context,
requiring a small amount of additional processing to find the virtual
address corresponding to the TR Dequeue Pointer. Some other parts of the
function were slightly rearranged to better fit into this model.
This patch should be backported to kernels as old as 2.6.31 that contain
the commit ae63674714 "USB: xhci: URB
cancellation support."
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1c70d8fb4d upstream.
Currently, with inode cache enabled, we will reuse its inode id immediately
after unlinking file, we may hit something like following:
|->iput inode
|->return inode id into inode cache
|->create dir,fsync
|->power off
An easy way to reproduce this problem is:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt -o inode_cache,commit=100
dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
inode_id=`ls -i /mnt/data | awk '{print $1}'`
rm -f /mnt/data
i=1
while [ 1 ]
do
mkdir /mnt/dir_$i
test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
if [ $test1 -eq $inode_id ]
then
dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
echo b > /proc/sysrq-trigger
fi
sleep 1
i=$(($i+1))
done
mount /dev/sdb /mnt
umount /dev/sdb
btrfs check /dev/sdb
We fix this problem by adding unlinked inode's id into pinned tree,
and we can not reuse them until committing transaction.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff76b05655 upstream.
Due to an off-by-one error, it is possible to reproduce a bug
when the inode cache is used.
The same inode number is assigned twice, the second time this
leads to an EEXIST in btrfs_insert_empty_items().
The issue can happen when a file is removed right after a subvolume
is created and then a new inode number is created before the
inodes in free_inode_pinned are processed.
unlink() calls btrfs_return_ino() which calls start_caching() in this
case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
searching for the highest inode (which already cannot find the
unlinked one anymore in btrfs_find_free_objectid()). So if this
unlinked inode's number is equal to the highest_ino + 1 (or >= this value
instead of > this value which was the off-by-one error), we mustn't add
the inode number to free_ino_pinned (caching_thread() does it right).
In this case we need to try directly to add the number to the inode_cache
which will fail in this case.
When this inode number is allocated while it is still in free_ino_pinned,
it is allocated and still added to the free inode cache when the
pinned inodes are processed, thus one of the following inode number
allocations will get an inode that is already in use and fail with EEXIST
in btrfs_insert_empty_items().
One example which was created with the reproducer below:
Create a snapshot, work in the newly created snapshot for the rest.
In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
start_caching() calls add_free_space [34284, 18446744073709517077].
In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
btrfs_unpin_free_ino calls add_free_space [34284, 1].
mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
EEXIST when the new inode is inserted.
One possible reproducer is this one:
#!/bin/sh
# preparation
TEST_DEV=/dev/sdc1
TEST_MNT=/mnt
umount ${TEST_MNT} 2>/dev/null || true
mkfs.btrfs -f ${TEST_DEV}
mount ${TEST_DEV} ${TEST_MNT} -o \
rw,relatime,compress=lzo,space_cache,inode_cache
btrfs subv create ${TEST_MNT}/s1
for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
rm ${TEST_MNT}/s2/$FILENAME
touch ${TEST_MNT}/s2/$FILENAME
# the following steps can be repeated to reproduce the issue again and again
[ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
rm ${TEST_MNT}/s3/$FILENAME
touch ${TEST_MNT}/s3/$FILENAME
ls -alFi ${TEST_MNT}/s?/$FILENAME
touch ${TEST_MNT}/s3/_1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_1
touch ${TEST_MNT}/s3/_2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/_2
touch ${TEST_MNT}/s3/__1 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__1
touch ${TEST_MNT}/s3/__2 || logger FAILED
ls -alFi ${TEST_MNT}/s?/__2
# if the above is not enough, add the following loop:
for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
#for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
# one of the touch(1) calls in s3 fail due to EEXIST because the inode is
# already in use that btrfs_find_ino_for_alloc() returns.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 10164c2ad6 upstream.
Fix driver new_id sysfs-attribute removal deadlock by making sure to
not hold any locks that the attribute operations grab when removing the
attribute.
Specifically, usb_serial_deregister holds the table mutex when
deregistering the driver, which includes removing the new_id attribute.
This can lead to a deadlock as writing to new_id increments the
attribute's active count before trying to grab the same mutex in
usb_serial_probe.
The deadlock can easily be triggered by inserting a sleep in
usb_serial_deregister and writing the id of an unbound device to new_id
during module unload.
As the table mutex (in this case) is used to prevent subdriver unload
during probe, it should be sufficient to only hold the lock while
manipulating the usb-serial driver list during deregister. A racing
probe will then either fail to find a matching subdriver or fail to get
the corresponding module reference.
Since v3.15-rc1 this also triggers the following lockdep warning:
======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc2 #123 Tainted: G W
-------------------------------------------------------
modprobe/190 is trying to acquire lock:
(s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
but task is already holding lock:
(table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (table_lock){+.+.+.}:
[<c0075f84>] __lock_acquire+0x1694/0x1ce4
[<c0076de8>] lock_acquire+0xb4/0x154
[<c03af3cc>] _raw_spin_lock+0x4c/0x5c
[<c02bbc24>] usb_store_new_id+0x14c/0x1ac
[<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
[<c025f568>] drv_attr_store+0x30/0x3c
[<c01690e0>] sysfs_kf_write+0x5c/0x60
[<c01682c0>] kernfs_fop_write+0xd4/0x194
[<c010881c>] vfs_write+0xbc/0x198
[<c0108e4c>] SyS_write+0x4c/0xa0
[<c000f880>] ret_fast_syscall+0x0/0x48
-> #0 (s_active#4){++++.+}:
[<c03a7a28>] print_circular_bug+0x68/0x2f8
[<c0076218>] __lock_acquire+0x1928/0x1ce4
[<c0076de8>] lock_acquire+0xb4/0x154
[<c0166b70>] __kernfs_remove+0x254/0x310
[<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
[<c0169fb8>] remove_files.isra.1+0x48/0x84
[<c016a2fc>] sysfs_remove_group+0x58/0xac
[<c016a414>] sysfs_remove_groups+0x34/0x44
[<c02623b8>] driver_remove_groups+0x1c/0x20
[<c0260e9c>] bus_remove_driver+0x3c/0xe4
[<c026235c>] driver_unregister+0x38/0x58
[<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
[<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
[<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
[<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
[<c009d6cc>] SyS_delete_module+0x184/0x210
[<c000f880>] ret_fast_syscall+0x0/0x48
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(table_lock);
lock(s_active#4);
lock(table_lock);
lock(s_active#4);
*** DEADLOCK ***
1 lock held by modprobe/190:
#0: (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
stack backtrace:
CPU: 0 PID: 190 Comm: modprobe Tainted: G W 3.15.0-rc2 #123
[<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
[<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
[<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
[<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
[<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
[<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
[<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
[<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
[<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
[<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
[<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
[<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
[<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
[<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
[<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
[<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
[<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
[<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
[<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80bb3ef109 upstream.
In big-endian systems, "%1" get the most significant part of the value, cause the instruction to get the wrong result.
When viewing ftrace record in big-endian ARM systems, we found that
the timestamp errors:
swapper-0 [001] 1325.970000: 0:120:R ==> [001] 16:120:R events/1
events/1-16 [001] 1325.970000: 16:120:S ==> [001] 0:120:R swapper
swapper-0 [000] 1325.1000000: 0:120:R + [000] 15:120:R events/0
swapper-0 [000] 1325.1000000: 0:120:R ==> [000] 15:120:R events/0
swapper-0 [000] 1326.030000: 0:120:R + [000] 1150:120:R sshd
swapper-0 [000] 1326.030000: 0:120:R ==> [000] 1150:120:R sshd
When viewed ftrace records, it will call the do_div(n, base) function, which achieved arch/arm/include/asm/div64.h in. When n = 10000000, base = 1000000, in do_div(n, base) will execute "umull %Q0, %R0, %1, %Q2".
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Alex Wu <wuquanming@huawei.com>
Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b17844b29 upstream.
fixup_user_fault() is used by the futex code when the direct user access
fails, and the futex code wants it to either map in the page in a usable
form or return an error. It relied on handle_mm_fault() to map the
page, and correctly checked the error return from that, but while that
does map the page, it doesn't actually guarantee that the page will be
mapped with sufficient permissions to be then accessed.
So do the appropriate tests of the vma access rights by hand.
[ Side note: arguably handle_mm_fault() could just do that itself, but
we have traditionally done it in the caller, because some callers -
notably get_user_pages() - have been able to access pages even when
they are mapped with PROT_NONE. Maybe we should re-visit that design
decision, but in the meantime this is the minimal patch. ]
Found by Dave Jones running his trinity tool.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46a2986ebb upstream.
We expect that all the Haswell series will need such quirks, sigh.
The T431s seems to be T430 hardware in a T440s case, using the T440s touchpad,
with the same min/max issue.
The X1 Carbon 3rd generation name says 2nd while it is a 3rd generation.
The X1 and T431s share a PnPID with the T540p, but the reported ranges are
closer to those of the T440s.
HdG: Squashed 5 quirk patches into one. T431s + L440 + L540 are written by me,
S1 Yoga and X1 are written by Benjamin Tissoires.
Hdg: Standardized S1 Yoga and X1 values, Yoga uses the same touchpad as the
X240, X1 uses the same touchpad as the T440.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5017b28513 upstream.
dmi_match() considers a substring match to be a successful match. This is
not always sufficient to distinguish between DMI data for different
systems. Add support for exact string matching using strcmp() in addition
to the substring matching using strstr().
The specific use case in the i915 driver is to allow us to use an exact
match for D510MO, without also incorrectly matching D510MOV:
{
.ident = "Intel D510MO",
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "Intel"),
DMI_EXACT_MATCH(DMI_BOARD_NAME, "D510MO"),
},
}
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Cc: <annndddrr@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Cornel Panceac <cpanceac@gmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8a4aeec8d2 upstream.
The AHCI spec allows implementations to issue commands in tag order
rather than FIFO order:
5.3.2.12 P:SelectCmd
HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
or HBA selects the command to issue that has had the
PxCI bit set to '1' longer than any other command
pending to be issued.
The result is that commands posted sequentially (time-wise) may play out
of sequence when issued by hardware.
This behavior has likely been hidden by drives that arrange for commands
to complete in issue order. However, it appears recent drives (two from
different vendors that we have found so far) inflict out-of-order
completions as a matter of course. So, we need to take care to maintain
ordered submission, otherwise we risk triggering a drive to fall out of
sequential-io automation and back to random-io processing, which incurs
large latency and degrades throughput.
This issue was found in simple benchmarks where QD=2 seq-write
performance was 30-50% *greater* than QD=32 seq-write performance.
Tagging for -stable and making the change globally since it has a low
risk-to-reward ratio. Also, word is that recent versions of an unnamed
OS also does it this way now. So, drives in the field are already
experienced with this tag ordering scheme.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
Reviewed-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e01280d28 upstream.
This reverts commit 1ebca9dad5.
This device was erroneously added to the sierra driver even though it's
not a Sierra device and was already handled by the option driver.
Cc: Richard Farina <sidhayn@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c98235cb85 upstream.
The mlx4 driver is triggering schedules while atomic inside
mlx4_en_netpoll:
spin_lock_irqsave(&cq->lock, flags);
napi_synchronize(&cq->napi);
^^^^^ msleep here
mlx4_en_process_rx_cq(dev, cq, 0);
spin_unlock_irqrestore(&cq->lock, flags);
This was part of a patch by Alexander Guller from Mellanox in 2011,
but it still isn't upstream.
Signed-off-by: Chris Mason <clm@fb.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f1c6bb2cb8 upstream.
A fl->fl_break_time of 0 has a special meaning to the lease break code
that basically means "never break the lease". knfsd uses this to ensure
that leases don't disappear out from under it.
Unfortunately, the code in __break_lease can end up passing this value
to wait_event_interruptible as a timeout, which prevents it from going
to sleep at all. This makes __break_lease to spin in a tight loop and
causes soft lockups.
Fix this by ensuring that we pass a minimum value of 1 as a timeout
instead.
Cc: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Terry Barnaby <terry1@beam.ltd.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ab3e55b119 upstream.
This bug was detected with the libio-epoll-perl debian package where the
test case IO-Ppoll-compat.t failed.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e6358fc3c upstream.
We haven't taken i_mutex yet, so we need to use i_size_read().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9503c67c93 upstream.
ext4_end_bio() currently throws away the error that it receives. Chances
are this is part of a spate of errors, one of which will end up getting
the error returned to userspace somehow, but we shouldn't take that risk.
Also print out the errno to aid in debug.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4adb6ab3e0 upstream.
When we try to get 2^32-1 block of the file which has the extent
(ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON
in ext4_ext_put_gap_in_cache().
To avoid the problem, ext4_map_blocks() needs to check the file logical block
number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot
handle 2^32-1 because the maximum file logical block number is 2^32-2.
Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid.
So ext4_map_blocks() should also return the same errno.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 22c73795b1 upstream.
This patch reorders reported frequencies from the highest to the lowest,
just like in other frequency drivers.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: cpu_frequency_table::driver_data is called index]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d82b922a4a upstream.
The powernow-k6 driver used to read the initial multiplier from the
powernow register. However, there is a problem with this:
* If there was a frequency transition before, the multiplier read from the
register corresponds to the current multiplier.
* If there was no frequency transition since reset, the field in the
register always reads as zero, regardless of the current multiplier that
is set using switches on the mainboard and that the CPU is running at.
The zero value corresponds to multiplier 4.5, so as a consequence, the
powernow-k6 driver always assumes multiplier 4.5.
For example, if we have 550MHz CPU with bus frequency 100MHz and
multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5
and bus frequency is 122MHz. The powernow-k6 driver then sets the
multiplier to 4.5, underclocking the CPU to 450MHz, but reports the
current frequency as 550MHz.
There is no reliable way how to read the initial multiplier. I modified
the driver so that it contains a table of known frequencies (based on
parameters of existing CPUs and some common overclocking schemes) and sets
the multiplier according to the frequency. If the frequency is unknown
(because of unusual overclocking or underclocking), the user must supply
the bus speed and maximum multiplier as module parameters.
This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- s/driver_data/index/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e20e1d0ac0 upstream.
I found out that a system with k6-3+ processor is unstable during network
server load. The system locks up or the network card stops receiving. The
reason for the instability is the CPU frequency scaling.
During frequency transition the processor is in "EPM Stop Grant" state.
The documentation says that the processor doesn't respond to inquiry
requests in this state. Consequently, coherency of processor caches and
bus master devices is not maintained, causing the system instability.
This patch flushes the cache during frequency transition. It fixes the
instability.
Other minor changes:
* u64 invalue changed to unsigned long because the variable is 32-bit
* move the logic to set the multiplier to a separate function
powernow_k6_set_cpu_multiplier
* preserve lower 5 bits of the powernow port instead of 4 (the voltage
field has 5 bits)
* mask interrupts when reading the multiplier, so that the port is not
open during other activity (running other kernel code with the port open
shouldn't cause any misbehavior, but we should better be safe and keep
the port closed)
This patch should be backported to all stable kernels. If it doesn't
apply cleanly, change it, or ask me to change it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f64410ec66 upstream.
This patch is based on an earlier patch by Eric Paris, he describes
the problem below:
"If an inode is accessed before policy load it will get placed on a
list of inodes to be initialized after policy load. After policy
load we call inode_doinit() which calls inode_doinit_with_dentry()
on all inodes accessed before policy load. In the case of inodes
in procfs that means we'll end up at the bottom where it does:
/* Default to the fs superblock SID. */
isec->sid = sbsec->sid;
if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
if (opt_dentry) {
isec->sclass = inode_mode_to_security_class(...)
rc = selinux_proc_get_sid(opt_dentry,
isec->sclass,
&sid);
if (rc)
goto out_unlock;
isec->sid = sid;
}
}
Since opt_dentry is null, we'll never call selinux_proc_get_sid()
and will leave the inode labeled with the label on the superblock.
I believe a fix would be to mimic the behavior of xattrs. Look
for an alias of the inode. If it can't be found, just leave the
inode uninitialized (and pick it up later) if it can be found, we
should be able to call selinux_proc_get_sid() ..."
On a system exhibiting this problem, you will notice a lot of files in
/proc with the generic "proc_t" type (at least the ones that were
accessed early in the boot), for example:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
However, with this patch in place we see the expected result:
# ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a94cdd1f4d upstream.
In read_all_bytes, we do
unsigned char i;
...
bt->read_data[0] = BMC2HOST;
bt->read_count = bt->read_data[0];
...
for (i = 1; i <= bt->read_count; i++)
bt->read_data[i] = BMC2HOST;
If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the
'for' loop. Make 'i' an 'int' instead of 'char' to get rid of the
overflow and finish the loop after 255 iterations every time.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Reported-and-debugged-by: Rui Hui Dian <rhdian@novell.com>
Cc: Tomas Cech <tcech@suse.cz>
Cc: Corey Minyard <minyard@acm.org>
Cc: <openipmi-developer@lists.sourceforge.net>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e4af376d04b0(drivers: hv: switch to use mb() instead of smp_mb()),
the adjustment mistakenly dropped the change in hv_ringbuffer_read,
so add it.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2c42be2dd4 upstream.
ft_del_tpg checks tpg->tport is set before unlinking the tpg from the
tport when the tpg is being removed. Set this pointer in ft_tport_create,
or the unlinking won't happen in ft_del_tpg and tport->tpg will reference
a deleted object.
This patch sets tpg->tport in ft_tport_create, because that's what
ft_del_tpg checks, and is the only way to get back to the tport to
clear tport->tpg.
The bug was occuring when:
- lport created, tport (our per-lport, per-provider context) is
allocated.
tport->tpg = NULL
- tpg created
- a PRLI is received. ft_tport_create is called, tpg is found and
tport->tpg is set
- tpg removed. ft_tpg is freed in ft_del_tpg. Since tpg->tport was not
set, tport->tpg is not cleared and points at freed memory
- Future calls to ft_tport_create return tport via first conditional,
instead of searching for new tpg by calling ft_lport_find_tpg.
tport->tpg is still invalid, and will access freed memory.
see https://bugzilla.redhat.com/show_bug.cgi?id=1071340
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b3b42ac2cb upstream.
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. We have
a software workaround for that ("espfix") for the 32-bit kernel, but
it relies on a nonzero stack segment base which is not available in
32-bit mode.
Since 16-bit support is somewhat crippled anyway on a 64-bit kernel
(no V86 mode), and most (if not quite all) 64-bit processors support
virtualization for the users who really need it, simply reject
attempts at creating a 16-bit segment when running on top of a 64-bit
kernel.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 12cd43c6ed upstream.
Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
functions isn't safe. On my machine it causes delayed (!) CPU exception:
Disabling lock debugging due to kernel taint
mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
mce: [Hardware Error]: TSC 164083803dc
mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e39435ce68 upstream.
I got a bug report yesterday from Laszlo Ersek in which he states that
his kvm instance fails to suspend. Laszlo bisected it down to this
commit 1cf7e9c68f ("virtio_blk: blk-mq support") where virtio-blk is
converted to use the blk-mq infrastructure.
After digging a bit, it became clear that the issue was with the queue
drain. blk-mq tracks queue usage in a percpu counter, which is
incremented on request alloc and decremented when the request is freed.
The initial hunt was for an inconsistency in blk-mq, but everything
seemed fine. In fact, the counter only returned crazy values when
suspend was in progress.
When a CPU is unplugged, the percpu counters merges that CPU state with
the general state. blk-mq takes care to register a hotcpu notifier with
the appropriate priority, so we know it runs after the percpu counter
notifier. However, the percpu counter notifier only merges the state
when the CPU is fully gone. This leaves a state transition where the
CPU going away is no longer in the online mask, yet it still holds
private values. This means that in this state, percpu_counter_sum()
returns invalid results, and the suspend then hangs waiting for
abs(dead-cpu-value) requests to complete which of course will never
happen.
Fix this by clearing the state earlier, so we never have a case where
the CPU isn't in online mask but still holds private state. This bug
has been there since forever, I guess we don't have a lot of users where
percpu counters needs to be reliable during the suspend cycle.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4f8e940095 upstream.
PCM pointer callbacks in ice1712 driver check the buffer size boundary
wrongly between bytes and frames. This leads to PCM core warnings
like:
snd_pcm_update_hw_ptr0: 105 callbacks suppressed
ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730
This patch fixes these checks to be placed after the proper unit
conversions.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dfccbb5e49 upstream.
wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock. If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
The last transition is racy, this is even documented in 50b8d25748
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race". wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else. So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.
Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55f67141a8 upstream.
When I decrease the value of nr_hugepage in procfs a lot, softlockup
happens. It is because there is no chance of context switch during this
process.
On the other hand, when I allocate a large number of hugepages, there is
some chance of context switch. Hence softlockup doesn't happen during
this process. So it's necessary to add the context switch in the
freeing process as same as allocating process to avoid softlockup.
When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
process occupied a CPU over 150 seconds and following softlockup message
appeared twice or more.
$ echo 6000000 > /proc/sys/vm/nr_hugepages
$ cat /proc/sys/vm/nr_hugepages
6000000
$ grep ^Huge /proc/meminfo
HugePages_Total: 6000000
HugePages_Free: 6000000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ echo 0 > /proc/sys/vm/nr_hugepages
BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
Call Trace:
free_pool_huge_page+0xb8/0xd0
set_max_huge_pages+0x128/0x190
hugetlb_sysctl_handler_common+0x113/0x140
hugetlb_sysctl_handler+0x1e/0x20
proc_sys_call_handler+0x97/0xd0
proc_sys_write+0x14/0x20
vfs_write+0xb8/0x1a0
sys_write+0x51/0x90
__audit_syscall_exit+0x265/0x290
system_call_fastpath+0x16/0x1b
I have not confirmed this problem with upstream kernels because I am not
able to prepare the machine equipped with 12TB memory now. However I
confirmed that the amount of decreasing hugepages was directly
proportional to the amount of required time.
I measured required times on a smaller machine. It showed 130-145
hugepages decreased in a millisecond.
Amount of decreasing Required time Decreasing rate
hugepages (msec) (pages/msec)
------------------------------------------------------------
10,000 pages == 20GB 70 - 74 135-142
30,000 pages == 60GB 208 - 229 131-144
It means decrement of 6TB hugepages will trigger softlockup with the
default threshold 20sec, in this decreasing rate.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57e68e9cd6 upstream.
A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
fuzzing with trinity. The call site try_to_unmap_cluster() does not lock
the pages other than its check_page parameter (which is already locked).
The BUG_ON in mlock_vma_page() is not documented and its purpose is
somewhat unclear, but apparently it serializes against page migration,
which could otherwise fail to transfer the PG_mlocked flag. This would
not be fatal, as the page would be eventually encountered again, but
NR_MLOCK accounting would become distorted nevertheless. This patch adds
a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
effect.
The call site try_to_unmap_cluster() is fixed so that for page !=
check_page, trylock_page() is attempted (to avoid possible deadlocks as we
already have check_page locked) and mlock_vma_page() is performed only
upon success. If the page lock cannot be obtained, the page is left
without PG_mlocked, which is again not a problem in the whole unevictable
memory design.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d444edc679 upstream.
This patch fixes a long-standing bug in iscsit_build_conn_drop_async_message()
where during ERL=2 connection recovery, a bogus conn_p pointer could
end up being used to send the ISCSI_OP_ASYNC_EVENT + DROPPING_CONNECTION
notifying the initiator that cmd->logout_cid has failed.
The bug was manifesting itself as an OOPs in iscsit_allocate_cmd() with
a bogus conn_p pointer in iscsit_build_conn_drop_async_message().
Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com>
Reported-by: santosh kulkarni <santosh.kulkarni@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ded2cf7141 upstream.
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node. After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value. This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.
old recovery master: new recovery master:
dlm_remaster_locks()
become recovery master for
another dead node.
dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
return -EAGAIN;
}
dlm_set_reco_master(dlm, br->node_idx);
dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
will hang in dlm_remaster_locks() for
request dlm locks info
Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.
A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 34aa8dac48 upstream.
This issue was introduced by commit 800deef3f6 ("ocfs2: use
list_for_each_entry where benefical") in 2007 where it replaced
list_for_each with list_for_each_entry. The variable "lock" will point
to invalid data if "tmpq" list is empty and a panic will be triggered
due to this. Sunil advised reverting it back, but the old version was
also not right. At the end of the outer for loop, that
list_for_each_entry will also set "lock" to an invalid data, then in the
next loop, if the "tmpq" list is empty, "lock" will be an stale invalid
data and cause the panic. So reverting the list_for_each back and reset
"lock" to NULL to fix this issue.
Another concern is that this seemes can not happen because the "tmpq"
list should not be empty. Let me describe how.
old lock resource owner(node 1): migratation target(node 2):
image there's lockres with a EX lock from node 2 in
granted list, a NR lock from node x with convert_type
EX in converting list.
dlm_empty_lockres() {
dlm_pick_migration_target() {
pick node 2 as target as its lock is the first one
in granted list.
}
dlm_migrate_lockres() {
dlm_mark_lockres_migrating() {
res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
//after the above code, we can not dirty lockres any more,
// so dlm_thread shuffle list will not run
downconvert lock from EX to NR
upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.
// will set lockres RES_MIGRATING flag, the following
// lock/unlock can not run
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data()
for (i=0; i<mres->num_locks; i++)
if (ml->node == dlm->node_num)
for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
list_for_each_entry(lock, tmpq, list)
if (lock) break; <<< lock is invalid as grant list is empty.
}
if (lock->ml.node != ml->node)
BUG() >>> crash here
}
I see the above locks status from a vmcore of our internal bug.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a0c32761e7 upstream.
Kees reported the following error:
arch/sh/kernel/dumpstack.c: In function 'print_trace_address':
arch/sh/kernel/dumpstack.c:118:2: error: format not a string literal and no format arguments [-Werror=format-security]
Use the "%s" format so that it's impossible to interpret 'data' as a
format string.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reported-by: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1608627935 upstream.
This needs to be done to update some of the fields in
the connector structure used by the audio code.
Noticed by several users on irc.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 01d8885785 upstream.
jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)
The -ENOENT is due to readdir calling dir_emit on the same entry twice.
If the dir_emit callback sleeps and the tree is changed underneath us,
we won't be able to trust deh_offset(deh) anymore. We need to save
next_pos before we might sleep so we can find the next entry.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5bdb0f02ad upstream.
In case of error when writing to userspace, function ehca_create_cq()
does not set an error code before following its error path.
This patch sets the error code to -EFAULT when ib_copy_to_udata()
fails.
This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08e74c4b00 upstream.
In case of error when writing to userspace, the function mthca_create_cq()
does not set an error code before following its error path.
This patch sets the error code to -EFAULT when ib_copy_to_udata() fails.
This was caught when using spatch (aka. coccinelle)
to rewrite call to ib_copy_{from,to}_udata().
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4b7f21d7b upstream.
The `lspci -nnvv` output contains (wrapped for line length):
00:1b.0 Audio device [0403]:
Intel Corporation 7 Series/C210 Series Chipset Family
High Definition Audio Controller [8086:1e20] (rev 04)
Subsystem: ASUSTeK Computer Inc. Device [1043:115d]
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c14af233fb upstream.
The original MIPS hibernate code flushes cache and TLB entries in
swsusp_arch_resume(). But they are removed in Commit 44eeab6741
(MIPS: Hibernation: Remove SMP TLB and cacheflushing code.). A cross-
CPU flush is surely unnecessary because all but the local CPU have
already been disabled. But a local flush (at least the TLB flush) is
needed. When we do hibernation on Loongson-3 with an E1000E NIC, it is
very easy to produce a kernel panic (kernel page fault, or unaligned
access). The root cause is E1000E driver use vzalloc_node() to allocate
pages, the stale TLB entries of the booting kernel will be misused by
the resumed target kernel.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/6643/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fe76cd88e6 upstream.
If unable to ensure_next_mapping() we must add the current bio, which
was removed from the @bios list via bio_list_pop, back to the
deferred_bios list before all the remaining @bios.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e1f23f3dd8 upstream.
This is *not* bisected, but the likely regression is
commit c35614380d
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Tue Nov 24 09:48:48 2009 +0800
drm/i915: Don't set up the TV port if it isn't in the BIOS table.
The commit does not check for all TV device types that might be present
in the VBT, disabling TV out for the missing ones. Add composite
S-video.
Reported-and-tested-by: Matthew Khouzam <matthew.khouzam@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73362
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: s/old\.device_type/device_type/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f67f18993 upstream.
Looks like this bug has been here since these write counts were
introduced, not sure why it was just noticed now.
Thanks also to Jan Kara for pointing out the problem.
Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ca3ba2a2f4 upstream.
This patch bypass the timer_irq_works() check for hyperv guest since:
- It was guaranteed to work.
- timer_irq_works() may fail sometime due to the lpj calibration were inaccurate
in a hyperv guest or a buggy host.
In the future, we should get the tsc frequency from hypervisor and use preset
lpj instead.
[ hpa: I would prefer to not defer things to "the future" in the future... ]
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1393558229-14755-1-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a585f87c86 upstream.
The scenario here is that someone calls enable_irq_wake() from somewhere
in the code. This will result in the lockdep producing a backtrace as can
be seen below. In my case, this problem is triggered when using the wl1271
(TI WlCore) driver found in drivers/net/wireless/ti/ .
The problem cause is rather obvious from the backtrace, but let's outline
the dependency. enable_irq_wake() grabs the IRQ buslock in irq_set_irq_wake(),
which in turns calls mxs_gpio_set_wake_irq() . But mxs_gpio_set_wake_irq()
calls enable_irq_wake() again on the one-level-higher IRQ , thus it tries to
grab the IRQ buslock again in irq_set_irq_wake() . Because the spinlock in
irq_set_irq_wake()->irq_get_desc_buslock()->__irq_get_desc_lock() is not
marked as recursive, lockdep will spew the stuff below.
We know we can safely re-enter the lock, so use IRQ_GC_INIT_NESTED_LOCK to
fix the spew.
=============================================
[ INFO: possible recursive locking detected ]
3.10.33-00012-gf06b763-dirty #61 Not tainted
---------------------------------------------
kworker/0:1/18 is trying to acquire lock:
(&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88
but task is already holding lock:
(&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/0:1/18:
#0: (events){.+.+.+}, at: [<c0036308>] process_one_work+0x134/0x4a4
#1: ((&fw_work->work)){+.+.+.}, at: [<c0036308>] process_one_work+0x134/0x4a4
#2: (&irq_desc_lock_class){-.-...}, at: [<c00685f0>] __irq_get_desc_lock+0x48/0x88
stack backtrace:
CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 3.10.33-00012-gf06b763-dirty #61
Workqueue: events request_firmware_work_func
[<c0013eb4>] (unwind_backtrace+0x0/0xf0) from [<c0011c74>] (show_stack+0x10/0x14)
[<c0011c74>] (show_stack+0x10/0x14) from [<c005bb08>] (__lock_acquire+0x140c/0x1a64)
[<c005bb08>] (__lock_acquire+0x140c/0x1a64) from [<c005c6a8>] (lock_acquire+0x9c/0x104)
[<c005c6a8>] (lock_acquire+0x9c/0x104) from [<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58)
[<c051d5a4>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c00685f0>] (__irq_get_desc_lock+0x48/0x88)
[<c00685f0>] (__irq_get_desc_lock+0x48/0x88) from [<c0068e78>] (irq_set_irq_wake+0x20/0xf4)
[<c0068e78>] (irq_set_irq_wake+0x20/0xf4) from [<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24)
[<c027260c>] (mxs_gpio_set_wake_irq+0x1c/0x24) from [<c0068cf4>] (set_irq_wake_real+0x30/0x44)
[<c0068cf4>] (set_irq_wake_real+0x30/0x44) from [<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4)
[<c0068ee4>] (irq_set_irq_wake+0x8c/0xf4) from [<c0310748>] (wlcore_nvs_cb+0x10c/0x97c)
[<c0310748>] (wlcore_nvs_cb+0x10c/0x97c) from [<c02be5e8>] (request_firmware_work_func+0x38/0x58)
[<c02be5e8>] (request_firmware_work_func+0x38/0x58) from [<c0036394>] (process_one_work+0x1c0/0x4a4)
[<c0036394>] (process_one_work+0x1c0/0x4a4) from [<c0036a4c>] (worker_thread+0x138/0x394)
[<c0036a4c>] (worker_thread+0x138/0x394) from [<c003cb74>] (kthread+0xa4/0xb0)
[<c003cb74>] (kthread+0xa4/0xb0) from [<c000ee00>] (ret_from_fork+0x14/0x34)
wlcore: loaded
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3bbb24b20a upstream.
Zach found this deadlock that would happen like this
btrfs_end_transaction <- reduce trans->use_count to 0
btrfs_run_delayed_refs
btrfs_cow_block
find_free_extent
btrfs_start_transaction <- increase trans->use_count to 1
allocate chunk
btrfs_end_transaction <- decrease trans->use_count to 0
btrfs_run_delayed_refs
lock tree block we are cowing above ^^
We need to only decrease trans->use_count if it is above 1, otherwise leave it
alone. This will make nested trans be the only ones who decrease their added
ref, and will let us get rid of the trans->use_count++ hack if we have to commit
the transaction. Thanks,
Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Tested-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c92cdeb45e upstream.
sys_getppid() returns the parent pid of the current process in its own pid
namespace. Since audit filters are based in the init pid namespace, a process
could avoid a filter or trigger an unintended one by being in an alternate pid
namespace or log meaningless information.
Switch to task_ppid_nr() for PPIDs to anchor all audit filters in the
init_pid_ns.
(informed by ebiederman's 6c621b7e)
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[bwh: Backported to 3.2: sys_getppid() is used by audit_exit() but not
audit_log_task_info()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ad36d28293 upstream.
Added the functions task_ppid_nr_ns() and task_ppid_nr() to abstract the lookup
of the PPID (real_parent's pid_t) of a process, including rcu locking, in the
arbitrary and init_pid_ns.
This provides an alternative to sys_getppid(), which is relative to the child
process' pid namespace.
(informed by ebiederman's 6c621b7e)
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 159ce52a6b upstream.
During probe the driver allocates dummy I2C device for companion chip
with i2c_new_dummy() but it does not check the return value of this call.
In case of error (i2c_new_device(): memory allocation failure or I2C
address cannot be used) this function returns NULL which is later used
by regmap_init_i2c().
If i2c_new_dummy() fails for companion device, fail also the probe for
main MFD driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
[bwh: Backported to 3.2:
- Adjust filename, context
- Add kfree() before return, as driver is not using managed allocations]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 96cf3dedc4 upstream.
During probe the driver allocates dummy I2C devices for RTC and ADC
with i2c_new_dummy() but it does not check the return value of this
calls.
In case of error (i2c_new_device(): memory allocation failure or I2C
address cannot be used) this function returns NULL which is later used
by i2c_unregister_device().
If i2c_new_dummy() fails for RTC or ADC devices, fail also the probe
for main MFD driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed26f87b9f upstream.
During probe the driver allocates dummy I2C device for RTC with i2c_new_dummy() but it does not check the return value of this call.
In case of error (i2c_new_device(): memory allocation failure or I2C
address cannot be used) this function returns NULL which is later used
by i2c_unregister_device().
If i2c_new_dummy() fails for RTC device, fail also the probe for
main MFD driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97dc4ed3fa upstream.
During probe the driver allocates dummy I2C devices for RTC, haptic and
MUIC with i2c_new_dummy() but it does not check the return value of this
calls.
In case of error (i2c_new_device(): memory allocation failure or I2C
address cannot be used) this function returns NULL which is later used
by i2c_unregister_device().
If i2c_new_dummy() fails for RTC, haptic or MUIC devices, fail also the
probe for main MFD driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6e6e660ba upstream.
It is currently not possible to select the SA1100 or Vexpress
drivers in the MFD subsystem, because the menu for the entire
subsystem ends before these options are presented.
Move the main menu closing and the endif for HAS_IOMEM to the
end of the file so these are selectable again.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9d194d1025 upstream.
In case of error while accessing to userspace memory, function
nes_create_qp() returns NULL instead of an error code wrapped through
ERR_PTR(). But NULL is not expected by ib_uverbs_create_qp(), as it
check for error with IS_ERR().
As page 0 is likely not mapped, it is going to trigger an Oops when
the kernel will try to dereference NULL pointer to access to struct
ib_qp's fields.
In some rare cases, page 0 could be mapped by userspace, which could
turn this bug to a vulnerability that could be exploited: the function
pointers in struct ib_device will be under userspace total control.
This was caught when using spatch (aka. coccinelle)
to rewrite calls to ib_copy_{from,to}_udata().
Link: https://www.gitorious.org/opteya/ib-hw-nes-create-qp-null
Link: 75ebf2c103:ib_copy_udata.cocci
Link: http://marc.info/?i=cover.1394485254.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a2cb0eb8a6 upstream.
Guard against a potential buffer overrun. The size to read from the
user is passed in, and due to the padding that needs to be taken into
account, as well as the place holder for the ICRC it is possible to
overflow the 32bit value which would cause more data to be copied from
user space than is allocated in the buffer.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3b3e0efb5c upstream.
qi->tqi_readyTime is written directly to registers that expect
microseconds as unit instead of TU.
When setting the CABQ ready time, cur_conf->beacon_interval is in TU, so
convert it to microseconds before passing it to ath9k_hw.
This should hopefully fix some Tx DMA issues with buffered multicast
frames in AP mode.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c063449394 upstream.
Commit 9cb00419fa, which enables hole punching for bigalloc file
systems, exposed a bug introduced by commit 6ae06ff51e in an earlier
release. When run on a bigalloc file system, xfstests generic/013, 068,
075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors
or cause kernel error messages indicating that previously freed blocks
are being freed again.
The latter commit optimizes the selection of the starting extent in
ext4_ext_rm_leaf() when hole punching by beginning with the extent
supplied in the path argument rather than with the last extent in the
leaf node (as is still done when truncating). However, the code in
rm_leaf that initially sets partial_cluster to track cluster sharing on
extent boundaries is only guaranteed to run if rm_leaf starts with the
last node in the leaf. Consequently, partial_cluster is not correctly
initialized when hole punching, and a cluster on the boundary of a
punched region that should be retained may instead be deallocated.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f74ef0f2d upstream.
When adding or removing 100G from a balloon:
BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367]
We have a wait_event_interruptible(), but the condition is always true
(more ballooning to do) so we don't ever sleep. We also have a
wait_event() for the host to ack, but that is also always true as QEMU
is synchronous for balloon operations.
Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f88ba6a2a4 upstream.
I got an error on v3.13:
BTRFS error (device sdf1) in write_all_supers:3378: errno=-5 IO failure (errors while submitting device barriers.)
how to reproduce:
> mkfs.btrfs -f -d raid1 /dev/sdf1 /dev/sdf2
> wipefs -a /dev/sdf2
> mount -o degraded /dev/sdf1 /mnt
> btrfs balance start -f -sconvert=single -mconvert=single -dconvert=single /mnt
The reason of the error is that barrier_all_devices() failed to submit
barrier to the missing device. However it is clear that we cannot do
anything on missing device, and also it is not necessary to care chunks
on the missing device.
This patch stops sending/waiting barrier if device is missing.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2610decdd0 upstream.
In commit f78bccd79b entitled "rtlwifi:
rtl8192ce: Fix too long disable of IRQs", Olivier Langlois
<olivier@trillion01.com> fixed a problem caused by an extra long disabling
of interrupts. This patch makes the same fix for rtl8192se.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2:
- Adjust context
- Drop change to an error path that we don't have]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit af5040da01 upstream.
trace_block_rq_complete does not take into account that request can
be partially completed, so we can get the following incorrect output
of blkparser:
C R 232 + 240 [0]
C R 240 + 232 [0]
C R 248 + 224 [0]
C R 256 + 216 [0]
but should be:
C R 232 + 8 [0]
C R 240 + 8 [0]
C R 248 + 8 [0]
C R 256 + 8 [0]
Also, the whole output summary statistics of completed requests and
final throughput will be incorrect.
This patch takes into account real completion size of the request and
fixes wrong completion accounting.
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
[bwh: Backported to 3.2: drop change in blk-mq.c]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d8eb6c653e upstream.
commit 511f3c5 (usb: gadget: udc-core: fix a regression during gadget driver
unbinding) introduced a crash when DEBUG is enabled.
The debug trace in the atmel_usba_stop function made the assumption that the
driver pointer passed in parameter was not NULL, but since the commit above,
such assumption was no longer always true.
This commit now uses the driver pointer stored in udc which fixes this
issue.
[ balbi@ti.com : improved commit log a bit ]
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1e43f2326 upstream.
The UVC specification uses alternate setting selection to notify devices
of stream start/stop. This breaks when using bulk-based devices, as the
video streaming interface has a single alternate setting in that case,
making video stream start and video stream stop events to appear
identical to the device. Bulk-based devices are thus not well supported
by UVC.
The webcam built in the Asus Zenbook UX302LA ignores the set interface
request and will keep the video stream enabled when the driver tries to
stop it. If USB autosuspend is enabled the device will then be suspended
and will crash, requiring a cold reboot.
USB trace capture showed that Windows sends a CLEAR_FEATURE(HALT)
request to the bulk endpoint when stopping the stream instead of
selecting alternate setting 0. The camera then behaves correctly, and
thus seems to require that behaviour.
Replace selection of alternate setting 0 with clearing of the endpoint
halt feature at video stream stop for bulk-based devices. Let's refrain
from blaming Microsoft this time, as it's not clear whether this
Windows-specific but USB-compliant behaviour was specifically developed
to handle bulkd-based UVC devices, or if the camera just took advantage
of it.
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 723abd87f6 upstream.
The 'active' sysfs attribute should refer to the currently active tty
devices the console is running on, not the currently active console. The
console structure doesn't refer to any device in sysfs, only the tty the
console is running on has. So we need to print out the tty names in
'active', not the console names.
There is one special-case, which is tty0. If the console is directed to
it, we want 'tty0' to show up in the file, so user-space knows that the
messages get forwarded to the active VT. The ->device() callback would
resolve tty0, though. Hence, treat it special and don't call into the VT
layer to resolve it (plymouth is known to depend on it).
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Werner Fink <werner@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: no TTY_DRIVER_UNNUMBERED_NODE case in tty_line_name()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06f9b6e596 upstream.
Around DWC USB3 2.30a release another bit has been added to the
Device-Specific Event (DEVT) Event Information (EvtInfo) bitfield.
Because of that, what used to be 8 bits long, has become 9 bits long.
Per dwc3 2.30a+ spec in the Device-Specific Event (DEVT), the field of
Event Information Bits(EvtInfo) uses [24:16] bits, and it has 9 bits
not 8 bits. And the following reserved field uses [31:25] bits not
[31:24] bits, and it has 7 bits.
So in dwc3_event_devt, the bit mask should be:
event_info [24:16] 9 bits
reserved31_25 [31:25] 7 bits
This patch makes sure that newer core releases will work fine with
Linux and that we will decode the event information properly on new
core releases.
[ balbi@ti.com : improve commit log a bit ]
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f76a1cbed1 upstream.
Commit 3e6c6f630a ("Delay creation of
khcvd thread") moved the call of hvc_init from being a device_initcall
into hvc_alloc, and used a non-null hvc_driver as indication of whether
hvc_init had already been called.
The problem with this is that hvc_driver is only assigned a value
at the bottom of hvc_init, and so there is a window where multiple
hvc_alloc calls can be in progress at the same time and hence try
and call hvc_init multiple times. Previously the use of device_init
guaranteed that hvc_init was only called once.
This manifests itself as sporadic instances of two hvc_init calls
racing each other, and with the loser of the race getting -EBUSY
from tty_register_driver() and hence that virtual console fails:
Couldn't register hvc console driver
virtio-ports vport0p1: error -16 allocating hvc for port
Here we add an atomic_t to guarantee we'll never run hvc_init twice.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 3e6c6f630a ("Delay creation of khcvd thread")
Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
Tested-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b0df6827b upstream.
The functions for data copying copyarea_foreward_8bpp and
copyarea_backward_8bpp are buggy, they produce screen corruption.
This patch fixes the functions and moves the logic to one function
"copyarea_8bpp". For simplicity, the function only handles copying that
is aligned on 8 pixes. If we copy an unaligned area, generic function
cfb_copyarea is used.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43751a1b8e upstream.
This patch fixes the hardware cursor on mach64 when font width is not a
multiple of 8 pixels.
If you load such a font, the cursor is expanded to the next 8-byte
boundary and a part of the next character after the cursor is not
visible.
For example, when you load a font with 12-pixel width, the cursor width
is 16 pixels and when the cursor is displayed, 4 pixels of the next
character are not visible.
The reason is this: atyfb_cursor is called with proper parameters to
load an image that is 12-pixel wide. However, the number is aligned on
the next 8-pixel boundary on the line
"unsigned int width = (cursor->image.width + 7) >> 3;" and the whole
function acts as it is was loading a 16-pixel image.
This patch fixes it so that the value written to the framebuffer is
padded with 0xaaaa (the transparent pattern) when the image size it not
a multiple of 8 pixels. The transparent pattern causes that the cursor
will not interfere with the next character.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c29dd8696d upstream.
This patch fixes mach64 to use unaligned access to the font bitmap.
This fixes unaligned access warning on sparc64 when 14x8 font is loaded.
On x86(64), unaligned access is handled in hardware, so both functions
le32_to_cpup and get_unaligned_le32 perform the same operation.
On RISC machines, unaligned access is not handled in hardware, so we
better use get_unaligned_le32 to avoid the unaligned trap and warning.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00a9d699bc upstream.
The function cfb_copyarea is buggy when the copy operation is not aligned on
long boundary (4 bytes on 32-bit machines, 8 bytes on 64-bit machines).
How to reproduce:
- use x86-64 machine
- use a framebuffer driver without acceleration (for example uvesafb)
- set the framebuffer to 8-bit depth
(for example fbset -a 1024x768-60 -depth 8)
- load a font with character width that is not a multiple of 8 pixels
note: the console-tools package cannot load a font that has
width different from 8 pixels. You need to install the packages
"kbd" and "console-terminus" and use the program "setfont" to
set font width (for example: setfont Uni2-Terminus20x10)
- move some text left and right on the bash command line and you get a
screen corruption
To expose more bugs, put this line to the end of uvesafb_init_info:
info->flags |= FBINFO_HWACCEL_COPYAREA | FBINFO_READS_FAST;
- Now framebuffer console will use cfb_copyarea for console scrolling.
You get a screen corruption when console is scrolled.
This patch is a rewrite of cfb_copyarea. It fixes the bugs, with this
patch, console scrolling in 8-bit depth with a font width that is not a
multiple of 8 pixels works fine.
The cfb_copyarea code was very buggy and it looks like it was written
and never tried with non-8-pixel font.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a772d47366 upstream.
When X11 is running and the user switches back to console, the card
modifies the content of registers M_MACCESS and M_PITCH in periodic
intervals.
This patch fixes it by restoring the content of these registers before
issuing any accelerator command.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6ccb9803e upstream.
CPU_32v6 currently selects CPU_USE_DOMAINS if CPU_V6 and MMU. This is
because ARM 1136 r0pX CPUs lack the v6k extensions, and therefore do
not have hardware thread registers. The lack of these registers requires
the kernel to update the vectors page at each context switch in order to
write a new TLS pointer. This write must be done via the userspace
mapping, since aliasing caches can lead to expensive flushing when using
kmap. Finally, this requires the vectors page to be mapped r/w for
kernel and r/o for user, which has implications for things like put_user
which must trigger CoW appropriately when targetting user pages.
The upshot of all this is that a v6/v7 kernel makes use of domains to
segregate kernel and user memory accesses. This has the nasty
side-effect of making device mappings executable, which has been
observed to cause subtle bugs on recent cores (e.g. Cortex-A15
performing a speculative instruction fetch from the GIC and acking an
interrupt in the process).
This patch solves this problem by removing the remaining domain support
from ARMv6. A new memory type is added specifically for the vectors page
which allows that page (and only that page) to be mapped as user r/o,
kernel r/w. All other user r/o pages are mapped also as kernel r/o.
Patch co-developed with Russell King.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2:
- Adjust filename, context
- Drop condition on CONFIG_ARM_LPAE]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 26ffd0d43b upstream.
PROT_NONE mappings apply the page protection attributes defined by _P000
which translate to PAGE_NONE for ARM. These attributes specify an XN,
RDONLY pte that is inaccessible to userspace. However, on kernels
configured without support for domains, such a pte *is* accessible to
the kernel and can be read via get_user, allowing tasks to read
PROT_NONE pages via syscalls such as read/write over a pipe.
This patch introduces a new software pte flag, L_PTE_NONE, that is set
to identify faulting, present entries.
Signed-off-by: Will Deacon <will.deacon@arm.com>
[bwh: Backported to 3.2 as dependency of commit b6ccb9803e
('ARM: 7954/1: mm: remove remaining domain support from ARMv6'):
- Drop 3-level changes
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6b355b33a6 upstream.
Previous logic,
if (avail > 8) {
store slave;
return;
}
send data; clear;
The logic error is, if there isn't space send the buffer and clear,
but the slave wasn't added to the now empty buffer loosing that slave
id. It also should have been "if (avail >= 8)" because when it is 8,
there is space.
Instead, if there isn't space send and clear the buffer, then there is
always space for the slave id.
Signed-off-by: David Fries <David@Fries.net>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c88507fbad upstream.
DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1535bd8adb ]
When checking a system call return code for an error,
linux_sparc_syscall was sign-extending the lower 32-bit value and
comparing it to -ERESTART_RESTARTBLOCK. lseek can return valid return
codes whose lower 32-bits alone would indicate a failure (such as 4G-1).
Use the whole 64-bit value to check for errors. Only the 32-bit path
should sign extend the lower 32-bit value.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Allen Pais <allen.pais@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 4f6500fff5 ]
In arch/sparc/Kernel/Makefile, we see:
obj-$(CONFIG_SPARC64) += jump_label.o
However, the Kconfig selects HAVE_ARCH_JUMP_LABEL unconditionally
for all SPARC. This in turn leads to the following failure when
doing allmodconfig coverage builds:
kernel/built-in.o: In function `__jump_label_update':
jump_label.c:(.text+0x8560c): undefined reference to `arch_jump_label_transform'
kernel/built-in.o: In function `arch_jump_label_transform_static':
(.text+0x85cf4): undefined reference to `arch_jump_label_transform'
make: *** [vmlinux] Error 1
Change HAVE_ARCH_JUMP_LABEL to be conditional on SPARC64 so that it
matches the Makefile.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 16932237f2 ]
This reverts commit 145e1c0023.
This commit broke the behavior of __copy_from_user_inatomic when
it is only partially successful. Instead of returning the number
of bytes not copied, it now returns 1. This translates to the
wrong value being returned by iov_iter_copy_from_user_atomic.
xfstests generic/246 and LTP writev01 both fail on btrfs and nfs
because of this.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 557fc5873e ]
The SIMBA APB Bridges lacks the 'ranges' of-property describing the
PCI I/O and memory areas located beneath the bridge. Faking this
information has been performed by reading range registers in the
APB bridge, and calculating the corresponding areas.
In commit 01f94c4a6c
("Fix sabre pci controllers with new probing scheme.") a bug was
introduced into this calculation, causing the PCI memory areas
to be calculated incorrectly: The shift size was set to be
identical for I/O and MEM ranges, which is incorrect.
This patch set the shift size of the MEM range back to the
value used before 01f94c4a6c.
Signed-off-by: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7563487cbf ]
There are three buffer overflows addressed in this patch.
1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer. I have made the destination
buffer 64 characters and I'm changed the sprintf() to a snprintf().
2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
character buffer so we have 54 characters. The ->eazlist[] is 11
characters long. I have modified the code to return if the source
buffer is too long.
3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string then can be up to 79 characters. I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.
Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[]. (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters). For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8b7b932434 ]
nla_strcmp compares the string length plus one, so it's implicitly
including the nul-termination in the comparison.
int nla_strcmp(const struct nlattr *nla, const char *str)
{
int len = strlen(str) + 1;
...
d = memcmp(nla_data(nla), str, len);
However, if NLA_STRING is used, userspace can send us a string without
the nul-termination. This is a problem since the string
comparison will not match as the last byte may be not the
nul-termination.
Fix this by skipping the comparison of the nul-termination if the
attribute data is nul-terminated. Suggested by Thomas Graf.
Cc: Florian Westphal <fw@strlen.de>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 43a43b6040 ]
After commit c15b1ccadb ("ipv6: move DAD and addrconf_verify
processing to workqueue") some counters are now updated in process context
and thus need to disable bh before doing so, otherwise deadlocks can
happen on 32-bit archs. Fabio Estevam noticed this while while mounting
a NFS volume on an ARM board.
As a compensation for missing this I looked after the other *_STATS_BH
and found three other calls which need updating:
1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling)
2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ...
(only in case of icmp protocol with raw sockets in error handling)
3) ping6_v6_sendmsg (error handling)
Fixes: c15b1ccadb ("ipv6: move DAD and addrconf_verify processing to workqueue")
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 0576eddf24 ]
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a39ee449f9 ]
vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.
The code in question was introduced in commit
8dd014adfe
vhost-net: mergeable buffers support
CVE-2014-0055
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d8316f3991 ]
When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.
This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.
Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.
Fix this up by detecting this overrun and doing packet drop
immediately.
CVE-2014-0077
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e367c2d03d ]
In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as
transport),the ipsec header need to be added in the first fragment, so the mtu
will decrease to reserve space for it, then the second fragment come, the mtu
should be turn back, as the commit 0c1833797a
said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use
*mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway
equal with the first fragment's. and cannot turn back.
when I test through ping6 -c1 -s5000 $ip (mtu=1280):
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)
which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)
so delete the min() when change back the mtu.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Fixes: 75a493e60a ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ecab67015e ]
tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
age needs to be added to the condition.
Age calculation in ipv6_create_tempaddr is different from the one
in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
This can cause age in ipv6_create_tempaddr to be less than the one
in addrconf_verify and therefore unnecessary temporary address to
be generated.
Use age calculation as in addrconf_modify to avoid this.
Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit dbb490b965 ]
When copying in a struct msghdr from the user, if the user has set the
msg_namelen parameter to a negative value it gets clamped to a valid
size due to a comparison between signed and unsigned values.
Ensure the syscall errors when the user passes in a negative value.
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit dd38743b4c ]
With TX VLAN offload enabled the source MAC address for frames sent using the
VLAN interface is currently set to the address of the real interface. This is
wrong since the VLAN interface may be configured with a different address.
The bug was introduced in commit 2205369a31
("vlan: Fix header ops passthru when doing TX VLAN offload.").
This patch sets the source address before calling the create function of the
real interface.
Signed-off-by: Peter Boström <peter.bostrom@netrounds.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit de14439167 ]
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b0 ("net: fix multithreaded signal handling in
unix recv routines").
To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.
Fixes: b3ca9b02b0 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 6565b9eeef ]
MLD queries are supposed to have an IPv6 link-local source address
according to RFC2710, section 4 and RFC3810, section 5.1.14. This patch
adds a sanity check to ignore such broken MLD queries.
Without this check, such malformed MLD queries can result in a
denial of service: The queries are ignored by any MLD listener
therefore they will not respond with an MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit c485658bae ]
While working on ec0223ec48 ("net: sctp: fix sctp_sf_do_5_1D_ce to
verify if we/peer is AUTH capable"), we noticed that there's a skb
memory leakage in the error path.
Running the same reproducer as in ec0223ec48 and by unconditionally
jumping to the error label (to simulate an error condition) in
sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about
the unfreed chunk->auth_chunk skb clone:
Unreferenced object 0xffff8800b8f3a000 (size 256):
comm "softirq", pid 0, jiffies 4294769856 (age 110.757s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X.........
backtrace:
[<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0
[<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210
[<ffffffff81566929>] skb_clone+0x49/0xb0
[<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp]
[<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp]
[<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp]
[<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210
[<ffffffff815a64af>] nf_reinject+0xbf/0x180
[<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue]
[<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink]
[<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0
[<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink]
[<ffffffff815a2bd8>] netlink_unicast+0x168/0x250
[<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0
[<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0
[<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380
What happens is that commit bbd0d59809 clones the skb containing
the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case
that an endpoint requires COOKIE-ECHO chunks to be authenticated:
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
------------------ AUTH; COOKIE-ECHO ---------------->
<-------------------- COOKIE-ACK ---------------------
When we enter sctp_sf_do_5_1D_ce() and before we actually get to
the point where we process (and subsequently free) a non-NULL
chunk->auth_chunk, we could hit the "goto nomem_init" path from
an error condition and thus leave the cloned skb around w/o
freeing it.
The fix is to centrally free such clones in sctp_chunk_destroy()
handler that is invoked from sctp_chunk_free() after all refs have
dropped; and also move both kfree_skb(chunk->auth_chunk) there,
so that chunk->auth_chunk is either NULL (since sctp_chunkify()
allocs new chunks through kmem_cache_zalloc()) or non-NULL with
a valid skb pointer. chunk->skb and chunk->auth_chunk are the
only skbs in the sctp_chunk structure that need to be handeled.
While at it, we should use consume_skb() for both. It is the same
as dev_kfree_skb() but more appropriately named as we are not
a device but a protocol. Also, this effectively replaces the
kfree_skb() from both invocations into consume_skb(). Functions
are the same only that kfree_skb() assumes that the frame was
being dropped after a failure (e.g. for tools like drop monitor),
usage of consume_skb() seems more appropriate in function
sctp_chunk_destroy() though.
Fixes: bbd0d59809 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8d7f6690ce upstream.
The kernel currently crashes with a low-address-protection exception
if a user space process executes an instruction that tries to use the
linkage stack. Set the base-ASTE origin and the subspace-ASTE origin
of the dispatchable-unit-control-table to point to a dummy ASTE.
Set up control register 15 to point to an empty linkage stack with no
room left.
A user space process with a linkage stack instruction will still crash
but with a different exception which is correctly translated to a
segmentation fault instead of a kernel oops.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d81de8e86 upstream.
It's possible for userland to pass down an iovec via writev() that has a
bogus user pointer in it. If that happens and we're doing an uncached
write, then we can end up getting less bytes than we expect from the
call to iov_iter_copy_from_user. This is CVE-2014-0069
cifs_iovec_write isn't set up to handle that situation however. It'll
blindly keep chugging through the page array and not filling those pages
with anything useful. Worse yet, we'll later end up with a negative
number in wdata->tailsz, which will confuse the sending routines and
cause an oops at the very least.
Fix this by having the copy phase of cifs_iovec_write stop copying data
in this situation and send the last write as a short one. At the same
time, we want to avoid sending a zero-length write to the server, so
break out of the loop and set rc to -EFAULT if that happens. This also
allows us to handle the case where no address in the iovec is valid.
[Note: Marking this for stable on v3.4+ kernels, but kernels as old as
v2.6.38 may have a similar problem and may need similar fix]
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
[bwh: Backported to 3.2:
- Adjust context
- s/nr_pages/npages/
- s/wdata->pages/pages/
- In case of an error with no data copied, we must kunmap() page 0,
but in neither case should we free anything else]
Thanks to Raphael Geissert for an independent backport that showed some
bugs in my first version.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 989c6b34f6 upstream.
It is possible for __direct_map to be called on invalid root_hpa
(-1), two examples:
1) try_async_pf -> can_do_async_pf
-> vmx_interrupt_allowed -> nested_vmx_vmexit
2) vmx_handle_exit -> vmx_interrupt_allowed -> nested_vmx_vmexit
Then to load_vmcs12_host_state and kvm_mmu_reset_context.
Check for this possibility, let fault exception be regenerated.
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=924916
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d43ff4cd79 upstream.
The struct driver_info ax88178_info is assigned the function
asix_rx_fixup_common as it's rx_fixup callback. This means that
FLAG_MULTI_PACKET must be set as this function is cloning the
data and calling usbnet_skb_return. Not setting this flag leads
to usbnet_skb_return beeing called a second time from within
the rx_process function in the usbnet module.
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b5b6f5413 upstream.
ASIX AX88772B started to pack data even more tightly. Packets and the ASIX packet
header may now cross URB boundaries. To handle this we have to introduce
some state between individual calls to asix_rx_fixup().
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Emil: backported to 3.2:
- dropped changes to drivers/net/usb/ax88172a.c
- Introduced some static function declarations as the functions
are not used outside of asix.c (sparse is complaining about it) ]
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a9e0aca4b3 upstream.
asix_rx_fixup() is complex, and does some unnecessary memory copies (at
least on x86 where NET_IP_ALIGN is 0)
Also, it tends to provide skbs with a big truesize (4096+256 with
MTU=1500) to upper stack, so incoming trafic consume a lot of memory and
I noticed early packet drops because we hit socket rcvbuf too fast.
Switch to a different strategy, using copybreak so that we provide nice
skbs to upper stack (including the NET_SKB_PAD to avoid future head
reallocations in some paths)
With this patch, I no longer see packets drops or tcp collapses on
various tcp workload with a AX88772 adapter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Aurelien Jacobs <aurel@gnuage.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Trond Wuellner <trond@chromium.org>
Cc: Grant Grundler <grundler@chromium.org>
Cc: Paul Stewart <pstew@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Emil: Backported to 3.2: fixed small conflict ]
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f8ce239dfc upstream.
builddeb generates a control file that says the linux-headers package
can only be built for the build system primary architecture. This
breaks cross-building configurations. We should use $debarch for this
instead.
Since $debarch is not yet set when generating the control file, set
Architecture: any and use control file variables to fill in the
description.
Fixes: cd8d60a20a ('kbuild: create linux-headers package in deb-pkg')
Reported-and-tested-by: "Niew, Sh." <shniew@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michal Marek <mmarek@suse.cz>
commit c5e318f67e upstream.
These commands will mysteriously fail:
$ make ARCH=arm versatile_defconfig
[...]
$ make ARCH=arm deb-pkg
[...]
make[1]: *** [deb-pkg] Error 1
make: *** [deb-pkg] Error 2
The Debian architecture selection for these kernel architectures does
'grep FOO=y $KCONFIG_CONFIG && echo bar', and after 'set -e' this
aborts the script if grep does not find the given config symbol.
Fixes: 10f26fa642 ('build, deb-pkg: select userland architecture based on UTS_MACHINE')
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michal Marek <mmarek@suse.cz>
commit fe6cc55f3a upstream.
[ use zero netdev_feature mask to avoid backport of
netif_skb_dev_features function ]
Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.
Given:
Host <mtu1500> R1 <mtu1200> R2
Host sends tcp stream which is routed via R1 and R2. R1 performs GRO.
In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.
When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.
This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.
For ipv6, we send out pkt too big error for gso if the individual
segments are too big.
For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.
Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.
However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there. I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.
Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.
This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small. Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway. Also its not like this could not be improved later
once the dust settles.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit de960aa9ab upstream.
[ no skb_gso_seglen helper in 3.4, leave tbf alone ]
This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by upcoming ip forwarding path patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
In older kernels (before v3.10) ipc_rcu_hdr->refcount was non-atomic int.
There was possuble double-free bug: do_msgsnd() calls ipc_rcu_putref() under
msq->q_perm->lock and RCU, while freequeue() calls it while it holds only
'rw_mutex', so there is no sinchronization between them. Two function
decrements '2' non-atomically, they both can get '0' as result.
do_msgsnd() freequeue()
msq = msg_lock_check(ns, msqid);
...
ipc_rcu_getref(msq);
msg_unlock(msq);
schedule();
(caller locks spinlock)
expunge_all(msq, -EIDRM);
ss_wakeup(&msq->q_senders, 1);
msg_rmid(ns, msq);
msg_unlock(msq);
ipc_lock_by_ptr(&msq->q_perm);
ipc_rcu_putref(msq); ipc_rcu_putref(msq);
< both may get get --(...)->refcount == 0 >
This patch locks ipc_lock and RCU around ipc_rcu_putref in freequeue.
( RCU protects memory for spin_unlock() )
Similar bugs might be in other users of ipc_rcu_putref().
In the mainline this has been fixed in v3.10 indirectly in commmit
6062a8dc05
("ipc,sem: fine grained locking for semtimedop") by Rik van Riel.
That commit optimized locking and converted refcount into atomic.
I'm not sure that anybody should care about this bug: it's very-very unlikely
and no longer exists in actual mainline. I've found this just by looking into
the code, probably this never happens in real life.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b22f5126a2 upstream.
Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...
struct dccp_hdr _dh, *dh;
...
skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.
Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.
Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 00a1a053eb upstream.
Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This is part of commit ca2beaf84d ('staging: speakup: Prefix
externally-visible symbols') upstream. It is required as preparation
for commit 00a1a053eb ('ext4: atomically set inode->i_flags in
ext4_set_inode_flags()').
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 421e08c41f upstream.
The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface and it occurs the reported ranges are wrong.
Of course, it would be too easy to have only one range for the whole
series, each touchpad seems to be calibrated in a different way.
We can not use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).
So as a temporary solution (until RMI4 land into upstream), start a new
list of quirks with the min/max manually set.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3cdeb713dc upstream.
Andreas reported that after 1f42db786b ("PCI: Enable INTx if BIOS left
them disabled"), pciehp surprise removal stopped working.
This happens because pci_reenable_device() on the hotplug bridge (used in
the pciehp_configure_device() path) clears the Interrupt Disable bit, which
apparently breaks the bridge's MSI hotplug event reporting.
Previously we cleared the Interrupt Disable bit in do_pci_enable_device(),
which is used by both pci_enable_device() and pci_reenable_device(). But
we use pci_reenable_device() after the driver may have enabled MSI or
MSI-X, and we *set* Interrupt Disable as part of enabling MSI/MSI-X.
This patch clears Interrupt Disable only when MSI/MSI-X has not been
enabled.
Fixes: 1f42db786b PCI: Enable INTx if BIOS left them disabled
Link: https://bugzilla.kernel.org/show_bug.cgi?id=71691
Reported-and-tested-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3735d524da upstream.
If the machine is booted without any cpu_idle driver set
(b/c disable_cpuidle() has been called) we should follow
other users of cpu_idle API and check the return value
for NULL before using it.
Reported-and-tested-by: Mark van Dijk <mark@internecto.net>
Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ec0223ec48 ]
RFC4895 introduced AUTH chunks for SCTP; during the SCTP
handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
being optional though):
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
-------------------- COOKIE-ECHO -------------------->
<-------------------- COOKIE-ACK ---------------------
A special case is when an endpoint requires COOKIE-ECHO
chunks to be authenticated:
---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
<------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
------------------ AUTH; COOKIE-ECHO ---------------->
<-------------------- COOKIE-ACK ---------------------
RFC4895, section 6.3. Receiving Authenticated Chunks says:
The receiver MUST use the HMAC algorithm indicated in
the HMAC Identifier field. If this algorithm was not
specified by the receiver in the HMAC-ALGO parameter in
the INIT or INIT-ACK chunk during association setup, the
AUTH chunk and all the chunks after it MUST be discarded
and an ERROR chunk SHOULD be sent with the error cause
defined in Section 4.1. [...] If no endpoint pair shared
key has been configured for that Shared Key Identifier,
all authenticated chunks MUST be silently discarded. [...]
When an endpoint requires COOKIE-ECHO chunks to be
authenticated, some special procedures have to be followed
because the reception of a COOKIE-ECHO chunk might result
in the creation of an SCTP association. If a packet arrives
containing an AUTH chunk as a first chunk, a COOKIE-ECHO
chunk as the second chunk, and possibly more chunks after
them, and the receiver does not have an STCB for that
packet, then authentication is based on the contents of
the COOKIE-ECHO chunk. In this situation, the receiver MUST
authenticate the chunks in the packet by using the RANDOM
parameters, CHUNKS parameters and HMAC_ALGO parameters
obtained from the COOKIE-ECHO chunk, and possibly a local
shared secret as inputs to the authentication procedure
specified in Section 6.3. If authentication fails, then
the packet is discarded. If the authentication is successful,
the COOKIE-ECHO and all the chunks after the COOKIE-ECHO
MUST be processed. If the receiver has an STCB, it MUST
process the AUTH chunk as described above using the STCB
from the existing association to authenticate the
COOKIE-ECHO chunk and all the chunks after it. [...]
Commit bbd0d59809 introduced the possibility to receive
and verification of AUTH chunk, including the edge case for
authenticated COOKIE-ECHO. On reception of COOKIE-ECHO,
the function sctp_sf_do_5_1D_ce() handles processing,
unpacks and creates a new association if it passed sanity
checks and also tests for authentication chunks being
present. After a new association has been processed, it
invokes sctp_process_init() on the new association and
walks through the parameter list it received from the INIT
chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO
and SCTP_PARAM_CHUNKS, and copies them into asoc->peer
meta data (peer_random, peer_hmacs, peer_chunks) in case
sysctl -w net.sctp.auth_enable=1 is set. If in INIT's
SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set,
peer_random != NULL and peer_hmacs != NULL the peer is to be
assumed asoc->peer.auth_capable=1, in any other case
asoc->peer.auth_capable=0.
Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is
available, we set up a fake auth chunk and pass that on to
sctp_sf_authenticate(), which at latest in
sctp_auth_calculate_hmac() reliably dereferences a NULL pointer
at position 0..0008 when setting up the crypto key in
crypto_hash_setkey() by using asoc->asoc_shared_key that is
NULL as condition key_id == asoc->active_key_id is true if
the AUTH chunk was injected correctly from remote. This
happens no matter what net.sctp.auth_enable sysctl says.
The fix is to check for net->sctp.auth_enable and for
asoc->peer.auth_capable before doing any operations like
sctp_sf_authenticate() as no key is activated in
sctp_auth_asoc_init_active_key() for each case.
Now as RFC4895 section 6.3 states that if the used HMAC-ALGO
passed from the INIT chunk was not used in the AUTH chunk, we
SHOULD send an error; however in this case it would be better
to just silently discard such a maliciously prepared handshake
as we didn't even receive a parameter at all. Also, as our
endpoint has no shared key configured, section 6.3 says that
MUST silently discard, which we are doing from now onwards.
Before calling sctp_sf_pdiscard(), we need not only to free
the association, but also the chunk->auth_chunk skb, as
commit bbd0d59809 created a skb clone in that case.
I have tested this locally by using netfilter's nfqueue and
re-injecting packets into the local stack after maliciously
modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
and the SCTP packet containing the COOKIE_ECHO (injecting
AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.
Fixes: bbd0d59809 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d7b95315cc ]
Redefine the RXD_ERR_MASK to include only relevant error bits. This fixes
a customer reported issue of randomly dropping packets on the 5719.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5a581b367b upstream.
According to the C standard 3.4.3p3, overflow of a signed integer results
in undefined behavior. This commit therefore changes the definitions
of time_after(), time_after_eq(), time_after64(), and time_after_eq64()
to avoid this undefined behavior. The trick is that the subtraction
is done using unsigned arithmetic, which according to 6.2.5p9 cannot
overflow because it is defined as modulo arithmetic. This has the added
(though admittedly quite small) benefit of shortening four lines of code
by four characters each.
Note that the C standard considers the cast from unsigned to
signed to be implementation-defined, see 6.3.1.3p3. However, on a
two's-complement system, an implementation that defines anything other
than a reinterpretation of the bits is free to come to me, and I will be
happy to act as a witness for its being committed to an insane asylum.
(Although I have nothing against saturating arithmetic or signals in some
cases, these things really should not be the default when compiling an
operating-system kernel.)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Kevin Easton <kevin@guarana.org>
[ paulmck: Included time_after64() and time_after_eq64(), as suggested
by Eric Dumazet, also fixed commit message.]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1f91ecc14d upstream.
When selecting the audio output destinations (headphones, FP headphones,
multichannel output), unnecessary I2S channels are digitally muted to
avoid invalid signal levels on the other outputs.
Signed-off-by: Roman Volkov <v1ron@mail.ru>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3dd77654fb upstream.
Actually CS4245 connected to the I2S channel 1 for
capture, not channel 2. Otherwise capturing and
playback does not work for CS4245.
Signed-off-by: Roman Volkov <v1ron@mail.ru>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e6355ad7b1 upstream.
snd_pcm_stop() must be called in the PCM substream lock context.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[wml: Backported to 3.4: Adjust filename]
Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ffd5939381 ]
SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit
emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of
'struct sctp_getaddrs_old' which includes a struct sockaddr pointer,
sizeof(param) check will always fail in kernel as the structure in
64bit kernel space is 4bytes larger than for user binaries compiled
in 32bit mode. Thus, applications making use of sctp_connectx() won't
be able to run under such circumstances.
Introduce a compat interface in the kernel to deal with such
situations by using a 'struct compat_sctp_getaddrs_old' structure
where user data is copied into it, and then sucessively transformed
into a 'struct sctp_getaddrs_old' structure with the help of
compat_ptr(). That fixes sctp_connectx() abi without any changes
needed in user space, and lets the SCTP test suite pass when compiled
in 32bit and run on 64bit kernels.
Fixes: f9c67811eb ("sctp: Fix regression introduced by new sctp_connectx api")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 163c8ff30d ]
aggregator_identifier is used to assign unique aggregator identifiers
to aggregators of a bond during device enslaving.
aggregator_identifier is currently a global variable that is zeroed in
bond_3ad_initialize().
This sequence will lead to duplicate aggregator identifiers for eth1 and eth3:
create bond0
change bond0 mode to 802.3ad
enslave eth0 to bond0 //eth0 gets agg id 1
enslave eth1 to bond0 //eth1 gets agg id 2
create bond1
change bond1 mode to 802.3ad
enslave eth2 to bond1 //aggregator_identifier is reset to 0
//eth2 gets agg id 1
enslave eth3 to bond0 //eth3 gets agg id 2
Fix this by making aggregator_identifier private to the bond.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit eb85569fe2 ]
This patch removes a generic hard_header_len check from the usbnet
module that is causing dropped packages under certain circumstances
for devices that send rx packets that cross urb boundaries.
One example is the AX88772B which occasionally send rx packets that
cross urb boundaries where the remaining partial packet is sent with
no hardware header. When the buffer with a partial packet is of less
number of octets than the value of hard_header_len the buffer is
discarded by the usbnet module.
With AX88772B this can be reproduced by using ping with a packet
size between 1965-1976.
The bug has been reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=29082
This patch introduces the following changes:
- Removes the generic hard_header_len check in the rx_complete
function in the usbnet module.
- Introduces a ETH_HLEN check for skbs that are not cloned from
within a rx_fixup callback.
- For safety a hard_header_len check is added to each rx_fixup
callback function that could be affected by this change.
These extra checks could possibly be removed by someone
who has the hardware to test.
- Removes a call to dev_kfree_skb_any() and instead utilizes the
dev->done list to queue skbs for cleanup.
The changes place full responsibility on the rx_fixup callback
functions that clone skbs to only pass valid skbs to the
usbnet_skb_return function.
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit c6993dfd7d ]
Quoting David Vrabel -
"5780 cards cannot have jumbo frames and TSO enabled together. When
jumbo frames are enabled by setting the MTU, the TSO feature must be
cleared. This is done indirectly by calling netdev_update_features()
which will call tg3_fix_features() to actually clear the flags.
netdev_update_features() will also trigger a new netlink message for the
feature change event which will result in a call to tg3_get_stats64()
which deadlocks on the tg3 lock."
tg3_set_mtu() does not need to be under the tg3 lock since converting
the flags to use set_bit(). Move it out to after tg3_netif_stop().
Reported-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 85eae82a08 upstream.
The console_cpu_notify() function runs with interrupts disabled in the
CPU_DYING case. It therefore cannot block, for example, as will happen
when it calls console_lock(). Therefore, remove the CPU_DYING leg of
the switch statement to avoid this problem.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
fixed upstream in v3.6 by ec145babe7
get_monotonic_boottime adds three nanonsecond values stored
in longs, followed by an s64. If the long values are all
close to 1e9 the first three additions can overflow and
become negative when added to the s64. Cast the first
value to s64 so that all additions are 64 bit.
Signed-off-by: Colin Cross <ccross@android.com>
[jstultz: Fished this out of the AOSP commong.git tree. This was
fixed upstream in v3.6 by ec145babe7]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 23a8e8441a upstream.
Doing some different tests, I discovered that function graph tracing, when
filtered via the set_ftrace_filter and set_ftrace_notrace files, does
not always keep with them if another function ftrace_ops is registered
to trace functions.
The reason is that function graph just happens to trace all functions
that the function tracer enables. When there was only one user of
function tracing, the function graph tracer did not need to worry about
being called by functions that it did not want to trace. But now that there
are other users, this becomes a problem.
For example, one just needs to do the following:
# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function_graph > current_tracer
# cat trace
[..]
0) | schedule() {
------------------------------------------
0) <idle>-0 => rcu_pre-7
------------------------------------------
0) ! 2980.314 us | }
0) | schedule() {
------------------------------------------
0) rcu_pre-7 => <idle>-0
------------------------------------------
0) + 20.701 us | }
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# cat trace
[..]
1) + 20.825 us | }
1) + 21.651 us | }
1) + 30.924 us | } /* SyS_ioctl */
1) | do_page_fault() {
1) | __do_page_fault() {
1) 0.274 us | down_read_trylock();
1) 0.098 us | find_vma();
1) | handle_mm_fault() {
1) | _raw_spin_lock() {
1) 0.102 us | preempt_count_add();
1) 0.097 us | do_raw_spin_lock();
1) 2.173 us | }
1) | do_wp_page() {
1) 0.079 us | vm_normal_page();
1) 0.086 us | reuse_swap_page();
1) 0.076 us | page_move_anon_rmap();
1) | unlock_page() {
1) 0.082 us | page_waitqueue();
1) 0.086 us | __wake_up_bit();
1) 1.801 us | }
1) 0.075 us | ptep_set_access_flags();
1) | _raw_spin_unlock() {
1) 0.098 us | do_raw_spin_unlock();
1) 0.105 us | preempt_count_sub();
1) 1.884 us | }
1) 9.149 us | }
1) + 13.083 us | }
1) 0.146 us | up_read();
When the stack tracer was enabled, it enabled all functions to be traced, which
now the function graph tracer also traces. This is a side effect that should
not occur.
To fix this a test is added when the function tracing is changed, as well as when
the graph tracer is enabled, to see if anything other than the ftrace global_ops
function tracer is enabled. If so, then the graph tracer calls a test trampoline
that will look at the function that is being traced and compare it with the
filters defined by the global_ops.
As an optimization, if there's no other function tracers registered, or if
the only registered function tracers also use the global ops, the function
graph infrastructure will call the registered function graph callback directly
and not go through the test trampoline.
Fixes: d2d45c7a03 "tracing: Have stack_tracer use a separate list of functions"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 31abdab9c1 upstream.
For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
in hpfs_dir_lseek() - there's a lot of methods that grab the former with
the caller already holding the latter, so it must take i_mutex first.
For another, locking the damn thing, carefully validating the offset,
then dropping locks and assigning the offset is obviously racy.
Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
won't modify the sucker on B-tree surgeries.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2cbe5c76fc upstream.
Previously, hpfs scanned all bitmaps each time the user asked for free
space using statfs. This patch changes it so that hpfs scans the
bitmaps only once, remembes the free space and on next invocation of
statfs it returns the value instantly.
New versions of wine are hammering on the statfs syscall very heavily,
making some games unplayable when they're stored on hpfs, with load
times in minutes.
This should be backported to the stable kernels because it fixes
user-visible problem (excessive level load times in wine).
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ kamal: backport to 3.8 (no hpfs_prefetch_bitmap) ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dc1dc2f8a5 upstream.
When booting a multi-platform m68k kernel on a non-Mac with "console=ttyS0"
on the kernel command line, it crashes with:
Unable to handle kernel NULL pointer dereference at virtual address (null)
Oops: 00000000
PC: [<0013ad28>] __pmz_startup+0x32/0x2a0
...
Call Trace: [<002c5d3e>] pmz_console_setup+0x64/0xe4
The normal tty driver doesn't crash, because init_pmz() checks
pmz_ports_count again after calling pmz_probe().
In the serial console initialization path, pmz_console_init() doesn't do
this, causing the driver to crash later.
Add a check for pmz_ports_count to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3873d064b8 upstream.
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
shown below. Fix this warning by instead using sector_div() which is provided
by the kernel.h header file.
fs/nfs/blocklayout/extents.c: In function ‘normalize’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fd2bdfcca upstream.
pcmuio_detach() is called by the comedi core even if pcmuio_attach()
returned an error, so `dev->private` might be `NULL`. Check for that
before dereferencing it.
Also, as pointed out by Dan Carpenter, there is no need to check the
pointer passed to `kfree()` is non-NULL, so remove that check.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[Part of commit f6b316bcd8 upstream.
Split from original patch subject: "staging: comedi: ssv_dnp: use
comedi_dio_update_state()"]
Also, fix a bug where the state of the channels is returned in data[0].
The comedi core expects it to be returned in data[1].
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 847d7970de upstream.
For systems with multiple servers and routed fabric, all
northbridges get assigned to the first server. Fix this by also
using the node reported from the PCI bus. For single-fabric
systems, the northbriges are on PCI bus 0 by definition, which
are on NUMA node 0 by definition, so this is invarient on most
systems.
Tested on fam10h and fam15h single and multi-fabric systems and
candidate for stable.
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 596f3142d2 upstream.
We always disable cr8 intercept in its handler, but only re-enable it
if handling KVM_REQ_EVENT, so there can be a window where we do not
intercept cr8 writes, which allows an interrupt to disrupt a higher
priority task.
Fix this by disabling intercepts in the same function that re-enables
them when needed. This fixes BSOD in Windows 2008.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d25f06ea46 upstream.
vmxnet3's netpoll driver is incorrectly coded. It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine. As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently. The result is data corruption causing panics such as this
one recently observed:
PID: 1371 TASK: ffff88023762caa0 CPU: 1 COMMAND: "rs:main Q:Reg"
#0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
#1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
#2 [ffff88023abd58b0] oops_end at ffffffff8152b570
#3 [ffff88023abd58e0] die at ffffffff81010e0b
#4 [ffff88023abd5910] do_trap at ffffffff8152add4
#5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
#6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
[exception RIP: vmxnet3_rq_rx_complete+1968]
RIP: ffffffffa00f1e80 RSP: ffff88023abd5ac8 RFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff88023b5dcee0 RCX: 00000000000000c0
RDX: 0000000000000000 RSI: 00000000000005f2 RDI: ffff88023b5dcee0
RBP: ffff88023abd5b48 R8: 0000000000000000 R9: ffff88023a3b6048
R10: 0000000000000000 R11: 0000000000000002 R12: ffff8802398d4cd8
R13: ffff88023af35140 R14: ffff88023b60c890 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
#8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
#9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7
The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames
Tested by myself, successfully.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c59053a23d upstream.
In the first place, the loop 'for' in the macro 'for_each_isci_host'
(drivers/scsi/isci/host.h:314) is incorrect, because it accesses
the 3rd element of 2 element array. After the 2nd iteration it executes
the instruction:
ihost = to_pci_info(pdev)->hosts[2]
(while the size of the 'hosts' array equals 2) and reads an
out of range element.
In the second place, this loop is incorrectly optimized by GCC v4.8
(see http://marc.info/?l=linux-kernel&m=138998871911336&w=2).
As a result, on platforms with two SCU controllers,
the loop is executed more times than it can be (for i=0,1 and 2).
It causes kernel panic during entering the S3 state
and the following oops after 'rmmod isci':
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0
Oops: 0000 [#1] SMP
RIP: 0010:[<ffffffff8131360b>] [<ffffffff8131360b>] __list_add+0x1b/0xc0
Call Trace:
[<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0
[<ffffffff81661c3f>] mutex_lock+0x1f/0x30
[<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas]
[<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas]
[<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci]
[<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci]
[<ffffffff813391cb>] pci_device_remove+0x3b/0xb0
[<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0
[<ffffffff813fc8f8>] driver_detach+0xa8/0xb0
[<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120
[<ffffffff813fcf2c>] driver_unregister+0x2c/0x50
[<ffffffff813381f3>] pci_unregister_driver+0x23/0x80
[<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci]
[<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0
[<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0
[<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b
The loop has been corrected.
This patch fixes kernel panic during entering the S3 state
and the above oops.
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ddfadd7736 upstream.
Remove an erroneous BUG_ON() in the case of a hard reset timeout. The
reset timeout handler puts the port into the "awaiting link-up" state.
The timeout causes the device to be disconnected and we need to be in
the awaiting link-up state to re-connect the port. The BUG_ON() made
the incorrect assumption that resets never timeout and we always
complete the reset in the "resetting" state.
Testing this patch also uncovered that libata continues to attempt to
reset the port long after the driver has torn down the context. Once
the driver has committed to abandoning the link it must indicate to
libata that recovery ends by returning -ENODEV from
->lldd_I_T_nexus_reset().
Acked-by: Lukasz Dorau <lukasz.dorau@intel.com>
Reported-by: David Milburn <dmilburn@redhat.com>
Reported-by: Xun Ni <xun.ni@intel.com>
Tested-by: Xun Ni <xun.ni@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d86db25e53 upstream.
The DELAY_INIT quirk only reduces the frequency of enumeration failures
with the Logitech HD Pro C920 and C930e webcams, but does not quite
eliminate them. We have found that adding a delay of 100ms between the
first and second Get Configuration request makes the device enumerate
perfectly reliable even after several weeks of extensive testing. The
reasons for that are anyone's guess, but since the DELAY_INIT quirk
already delays enumeration by a whole second, wating for another 10th of
that isn't really a big deal for the one other device that uses it, and
it will resolve the problems with these webcams.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e0429362ab upstream.
We've encountered a rare issue when enumerating two Logitech webcams
after a reboot that doesn't power cycle the USB ports. They are spewing
random data (possibly some leftover UVC buffers) on the second
(full-sized) Get Configuration request of the enumeration phase. Since
the data is random this can potentially cause all kinds of odd behavior,
and since it occasionally happens multiple times (after the kernel
issues another reset due to the garbled configuration descriptor), it is
not always recoverable. Set the USB_DELAY_INIT quirk that seems to work
around the issue.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a5b2cf5b1a upstream.
The 64bit relocation code places a few symbols in the text segment.
These symbols are only 4 byte aligned where they need to be 8 byte
aligned. Add an explicit alignment.
Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0a13404dd3 upstream.
The unix socket code is using the result of csum_partial to
hash into a lookup table:
unix_hash_fold(csum_partial(sunaddr, len, 0));
csum_partial is only guaranteed to produce something that can be
folded into a checksum, as its prototype explains:
* returns a 32-bit number suitable for feeding into itself
* or csum_tcpudp_magic
The 32bit value should not be used directly.
Depending on the alignment, the ppc64 csum_partial will return
different 32bit partial checksums that will fold into the same
16bit checksum.
This difference causes the following testcase (courtesy of
Gustavo) to sometimes fail:
#include <sys/socket.h>
#include <stdio.h>
int main()
{
int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
int i = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
struct sockaddr addr;
addr.sa_family = AF_LOCAL;
bind(fd, &addr, 2);
listen(fd, 128);
struct sockaddr_storage ss;
socklen_t sslen = (socklen_t)sizeof(ss);
getsockname(fd, (struct sockaddr*)&ss, &sslen);
fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
perror(NULL);
return 1;
}
printf("OK\n");
return 0;
}
As suggested by davem, fix this by using csum_fold to fold the
partial 32bit checksum into a 16bit checksum before using it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c99b1861c2 upstream.
While preparing association request, intersection of device's HT
capability information and corresponding fields advertised by AP
is used.
This patch fixes an error while copying this field from AP's
beacon.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 864a6040f3 upstream.
Avoid leaking data by sending uninitialized memory and setting an
invalid (non-zero) fragment number (the sequence number is ignored
anyway) by setting the seq_ctrl field to zero.
Fixes: 3f52b7e328 ("mac80211: mesh power save basics")
Fixes: ce662b44ce ("mac80211: send (QoS) Null if no buffered frames")
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: Drop change to mps_qos_null_get()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e805ca8b0a upstream.
Logitech C500 (046d:0807) needs the same workaround like other
Logitech Webcams.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 15c34a7606 upstream.
Global quota files are accessed from different nodes. Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.
Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 45ab2813d4 upstream.
If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.
Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.
Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org
Fixes: 6d723736e4 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d96e43e8fc upstream.
This patch adds the missing netif_napi_del() to the flexcan_remove() function.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7e9e148af0 upstream.
If flexcan_chip_start() in flexcan_open() fails, the interrupt is not freed,
this patch adds the missing cleanup.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5be93bdda6 upstream.
When shutting down the CAN interface (ifconfig canX down) during high CAN bus
loads, the CAN core might hang and freeze the whole CPU.
This patch fixes the shutdown sequence by first disabling the CAN core then
disabling all interrupts.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f5295bd8ea upstream.
In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.
This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.
This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e3703f8cdf upstream.
Drew Richardson reported that he could make the kernel go *boom* when hotplugging
while having perf events active.
It turned out that when you have a group event, the code in
__perf_event_exit_context() fails to remove the group siblings from
the context.
We then proceed with destroying and freeing the event, and when you
re-plug the CPU and try and add another event to that CPU, things go
*boom* because you've still got dead entries there.
Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/n/tip-k6v5wundvusvcseqj1si0oz0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 26e61e8939 upstream.
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.
This is I think the relevant bit:
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable
> pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
> pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800)
So we add the insn:p event (fd[23]).
At this point we should have:
n_events = 2, n_added = 1, n_txn = 1
> pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
> pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)
We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.
group_sched_in()
pmu->start_txn() /* nop - BP pmu */
event_sched_in()
event->pmu->add()
So here we should end up with:
0: n_events = 3, n_added = 2, n_txn = 2
4: n_events = 4, n_added = 3, n_txn = 3
But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.
Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.
But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.
However, since we try and schedule 4 it means the 0 event must have
succeeded! Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:
event_sched_out()
event->pmu->del()
on 0 and the BP event.
Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:
n_event = 2, n_added = 2, n_txn = 2
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable
> pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: {
> pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null))
> pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
> pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: {
> pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800)
> pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: }
> pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0
So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.
Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 791c9e0292 upstream.
dequeue_entity() is called when p->on_rq and sets se->on_rq = 0
which appears to guarentee that the !se->on_rq condition is met.
If the task has done set_current_state(TASK_INTERRUPTIBLE) without
schedule() the second condition will be met and vruntime will be
incorrectly adjusted twice.
In certain cases this can result in the task's vruntime never increasing
past the vruntime of other tasks on the CFS' run queue, starving them of
CPU time.
This patch changes switched_from_fair() to use !p->on_rq instead of
!se->on_rq.
I'm able to cause a task with a priority of 120 to starve all other
tasks with the same priority on an ARM platform running 3.2.51-rt72
PREEMPT RT by writing one character at time to a serial tty (16550 UART)
in a tight loop. I'm also able to verify making this change corrects the
problem on that platform and kernel version.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c685689fd2 upstream.
We hit one rare case below:
T1 calling disable_irq(), but hanging at synchronize_irq()
always;
The corresponding irq thread is in sleeping state;
And all CPUs are in idle state;
After analysis, we found there is one possible scenerio which
causes T1 is waiting there forever:
CPU0 CPU1
synchronize_irq()
wait_event()
spin_lock()
atomic_dec_and_test(&threads_active)
insert the __wait into queue
spin_unlock()
if(waitqueue_active)
atomic_read(&threads_active)
wake_up()
Here after inserted the __wait into queue on CPU0, and before
test if queue is empty on CPU1, there is no barrier, it maybe
cause it is not visible for CPU1 immediately, although CPU0 has
updated the queue list.
It is similar for CPU0 atomic_read() threads_active also.
So we'd need one smp_mb() before waitqueue_active.that, but removing
the waitqueue_active() check solves it as wel l and it makes
things simple and clear.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Xiaoming Wang <xiaoming.wang@intel.com>
Link: http://lkml.kernel.org/r/1393212590-32543-1-git-send-email-chuansheng.liu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 3.2: The corresponding check is in irq_thread()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 143582c684 upstream.
Only the first packet is currently handled correctly, but then
all others are assumed to have failed which is problematic. Fix
this, marking them all successful instead (since if they're not
then the firmware will have transmitted them as single frames.)
This fixes the lost packet reporting.
Also do a tiny variable scoping cleanup.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[Add the dvm part]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[bwh: Backported to 3.2:
- Drop the mvm part
- Adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b3619b288b upstream.
There is a typo in the Limiter2 Release Rate control, a wrong enum for
Limiter1 is assigned. It must point to Limiter2.
Spotted by a compile warning:
In file included from sound/soc/codecs/sta32x.c:34:0:
sound/soc/codecs/sta32x.c:223:29: warning: ‘sta32x_limiter2_release_rate_enum’ defined but not used [-Wunused-variable]
static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
^
include/sound/soc.h:275:18: note: in definition of macro ‘SOC_ENUM_DOUBLE_DECL’
struct soc_enum name = SOC_ENUM_DOUBLE(xreg, xshift_l, xshift_r, \
^
sound/soc/codecs/sta32x.c:223:8: note: in expansion of macro ‘SOC_ENUM_SINGLE_DECL’
static SOC_ENUM_SINGLE_DECL(sta32x_limiter2_release_rate_enum,
^
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1227f3c10 upstream.
ehci_irq() and ehci_hrtimer_func() can deadlock on ehci->lock when
threadirqs option is used. To prevent the deadlock use
spin_lock_irqsave() in ehci_irq().
This change can be reverted when hrtimer callbacks become threaded.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6dbd46c849 upstream.
Hello,
the following patch adds an entry for the PID of a Cressi Leonardo
diving computer interface to kernel 3.13.0.
It is detected as FT232RL.
Works with subsurface.
Signed-off-by: Joerg Dorchain <joerg@dorchain.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3ca416452 upstream.
acpi_processor_set_throttling() uses set_cpus_allowed_ptr() to make
sure that the (struct acpi_processor)->acpi_processor_set_throttling()
callback will run on the right CPU. However, the function may be
called from a worker thread already bound to a different CPU in which
case that won't work.
Make acpi_processor_set_throttling() use work_on_cpu() as appropriate
instead of abusing set_cpus_allowed_ptr().
Reported-and-tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da87ca4d4c upstream.
Since commit 7787380336 "net_dma: mark broken" we no longer pin dma
engines active for the network-receive-offload use case. As a result
the ->free_chan_resources() that occurs after the driver self test no
longer has a NET_DMA induced ->alloc_chan_resources() to back it up. A
late firing irq can lead to ksoftirqd spinning indefinitely due to the
tasklet_disable() performed by ->free_chan_resources(). Only
->alloc_chan_resources() can clear this condition in affected kernels.
This problem has been present since commit 3e037454bc "I/OAT: Add
support for MSI and MSI-X" in 2.6.24, but is now exposed. Given the
NET_DMA use case is deprecated we can revisit moving the driver to use
threaded irqs. For now, just tear down the irq and tasklet properly by:
1/ Disable the irq from triggering the tasklet
2/ Disable the irq from re-arming
3/ Flush inflight interrupts
4/ Flush the timer
5/ Flush inflight tasklets
References:
https://lkml.org/lkml/2014/1/27/282https://lkml.org/lkml/2014/2/19/672
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Mike Galbraith <bitbucket@online.de>
Reported-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Tested-by: Mike Galbraith <bitbucket@online.de>
Tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[bwh: Backported to 3.2:
- Adjust context
- As there is no ioatdma_device::irq_mode member, check
pci_dev::msix_enabled instead]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Dan Williams <dan.j.williams@intel.com>
commit 75135da0d6 upstream.
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0f5eeed0f upstream.
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1362f4ea20 upstream.
Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:
CPU1 CPU2
dqput()
spin_lock(&dq_list_lock);
if (atomic_read(&dquot->dq_count) > 1) {
- not taken
if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
spin_unlock(&dq_list_lock);
->release_dquot(dquot);
if (atomic_read(&dquot->dq_count) > 1)
- not taken
dquot_scan_active()
spin_lock(&dq_list_lock);
if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
- not taken
atomic_inc(&dquot->dq_count);
spin_unlock(&dq_list_lock);
- proceeds to release dquot
ret = fn(dquot, priv);
- called for inactive dquot
Fix the problem by making sure possible ->release_dquot() is finished by
the time we call the callback and new calls to it will notice reference
dquot_scan_active() has taken and bail out.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b3050248c1 upstream.
The minimum CCA power threshold values have to be adjusted
for existing cards to be in compliance with new regulations.
Newer cards will make use of the values obtained from EEPROM,
support for this was added earlier. To make sure that cards
that are already in use and don't have proper values in EEPROM,
do not violate regulations, use the initvals instead.
Reported-by: Jeang Daniel <dyjeong@qca.qualcomm.com>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9085a64229 upstream.
When writing policy via /sys/fs/selinux/policy I wrote the type and class
of filename trans rules in CPU endian instead of little endian. On
x86_64 this works just fine, but it means that on big endian arch's like
ppc64 and s390 userspace reads the policy and converts it from
le32_to_cpu. So the values are all screwed up. Write the values in le
format like it should have been to start.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1d147bfa64 upstream.
There is a race between the TX path and the STA wakeup: while
a station is sleeping, mac80211 buffers frames until it wakes
up, then the frames are transmitted. However, the RX and TX
path are concurrent, so the packet indicating wakeup can be
processed while a packet is being transmitted.
This can lead to a situation where the buffered frames list
is emptied on the one side, while a frame is being added on
the other side, as the station is still seen as sleeping in
the TX path.
As a result, the newly added frame will not be send anytime
soon. It might be sent much later (and out of order) when the
station goes to sleep and wakes up the next time.
Additionally, it can lead to the crash below.
Fix all this by synchronising both paths with a new lock.
Both path are not fastpath since they handle PS situations.
In a later patch we'll remove the extra skb queue locks to
reduce locking overhead.
BUG: unable to handle kernel
NULL pointer dereference at 000000b0
IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1
EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000)
iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9
Stack:
e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0
ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210
ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002
Call Trace:
[<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211]
[<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211]
[<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211]
[<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211]
[<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211]
[<c149ef70>] dev_hard_start_xmit+0x450/0x950
[<c14b9aa9>] sch_direct_xmit+0xa9/0x250
[<c14b9c9b>] __qdisc_run+0x4b/0x150
[<c149f732>] dev_queue_xmit+0x2c2/0xca0
Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
[reword commit log, use a separate lock]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd8ba20597 upstream.
Some devices have duplicate entries in there brightness levels table, ie
on my Dell Latitude E6430 the table looks like this:
[ 3.686060] acpi backlight index 0, val 80
[ 3.686095] acpi backlight index 1, val 50
[ 3.686122] acpi backlight index 2, val 5
[ 3.686147] acpi backlight index 3, val 5
[ 3.686172] acpi backlight index 4, val 5
[ 3.686197] acpi backlight index 5, val 5
[ 3.686223] acpi backlight index 6, val 5
[ 3.686248] acpi backlight index 7, val 5
[ 3.686273] acpi backlight index 8, val 6
[ 3.686332] acpi backlight index 9, val 7
[ 3.686356] acpi backlight index 10, val 8
[ 3.686380] acpi backlight index 11, val 9
etc.
Notice that brightness values 0-5 are all mapped to 5. This means that
if userspace writes any value between 0 and 5 to the brightness sysfs attribute
and then reads it, it will always return 0, which is somewhat unexpected.
This is a problem for ie gnome-settings-daemon, which uses read-modify-write
logic when the users presses the brightness up or down keys. This is done
this way to take brightness changes from other sources into account.
On this specific laptop what happens once the brightness has been set to 0,
is that gsd reads 0, adds 5, writes 5, and on the next brightness up key press
again reads 0, so things get stuck at the lowest brightness setting.
Filtering out the duplicate table entries, makes any write to brightness
read back as the written value as one would expect, fixing this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 025c3fa925 upstream.
Preset EQ enum of sta32x codec driver declares too many number of
items and it may lead to the access over the actual array size.
Use SOC_ENUM_SINGLE_DECL() helper and it's automatically fixed.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 500a91571f upstream.
When trying to set the minimum temperature, the driver was erroneously
writing the maximum temperature into the chip.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 532de3fc72 upstream.
Currently, there's nothing preventing cgroup_enable_task_cg_lists()
from missing set PF_EXITING and race against cgroup_exit(). Depending
on the timing, cgroup_exit() may finish with the task still linked on
css_set leading to list corruption. Fix it by grabbing siglock in
cgroup_enable_task_cg_lists() so that PF_EXITING is guaranteed to be
visible.
This whole on-demand cg_list optimization is extremely fragile and has
ample possibility to lead to bugs which can cause things like
once-a-year oops during boot. I'm wondering whether the better
approach would be just adding "cgroup_disable=all" handling which
disables the whole cgroup rather than tempting fate with this
on-demand craziness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5bdfff96c6 upstream.
When a kworker should die, the kworkre is notified through WORKER_DIE
flag instead of kthread_should_stop(). This, IIRC, is primarily to
keep the test synchronized inside worker_pool lock. WORKER_DIE is
first set while holding pool->lock, the lock is dropped and
kthread_stop() is called.
Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes and exits
and frees the target task before or during kthread_stop().
Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.
tj: Improved patch description and comment. Moved pinning above
WORKER_DIE for better signify what it's protecting.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e8d6d85ad upstream.
High-speed USB connections revert back to full-speed signalling when
the device goes into suspend. This takes several milliseconds, and
during that time it's not possible to tell reliably whether the device
has been disconnected.
On some platforms, the Wake-On-Disconnect circuitry gets confused
during this intermediate state. It generates a false wakeup signal,
which can prevent the controller from going to sleep.
To avoid this problem, this patch adds a 5-ms delay to the
ehci_bus_suspend() routine if any ports have to switch over to
full-speed signalling. (Actually, the delay was already present for
devices using a particular kind of PHY power management; the patch
merely causes the delay to be used more widely.)
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Peter Chen <Peter.Chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- s/has_tdi_phy_lpm/has_hostpc/
- Always re-lock ehci->lock after the sleep]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 67809f85d3 upstream.
Samsung's pci-e SSDs with device ID 0x1600 which are found on some
macbooks time out on NCQ commands. Blacklist NCQ on the device so
that the affected machines can at least boot.
Original-patch-by: Levente Kurusa <levex@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60731
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8d80390cfc upstream.
For avr32 cross compiler, do not define '__linux__' internally, so it
will cause issue with allmodconfig.
The related error:
CC [M] fs/coda/psdev.o
In file included from include/linux/coda.h:64,
from fs/coda/psdev.c:45:
include/uapi/linux/coda.h:221: error: expected specifier-qualifier-list before 'u_quad_t'
The related toolchain version (which only download, not re-compile):
[root@gchen linux-next]# /upstream/toolchain/download/avr32-gnu-toolchain-linux_x86/bin/avr32-gcc -v
Using built-in specs.
Target: avr32
Configured with: /data2/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/src/gcc/configure --target=avr32 --host=i686-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-languages=c,c++ --disable-nls --disable-libssp --disable-libstdcxx-pch --with-dwarf2 --enable-version-specific-runtime-libs --disable-shared --enable-doc --with-mpfr-lib=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/lib --with-mpfr-include=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86/include --with-gmp=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --with-mpc=/home/toolsbuild/jenkins-knuth/workspace/avr32-gnu-toolchain/avr32-gnu-toolchain-linux_x86 --enable-__cxa_atexit --disable-shared --with-newlib --with-pkgversion=AVR_32_bit_GNU_Toolchain_3.4.2_435 --with-bugurl=http://www
.atmel.com/avr
Thread model: single
gcc version 4.4.7 (AVR_32_bit_GNU_Toolchain_3.4.2_435)
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5745d6a41a upstream.
Causing this:
In file included from arch/avr32/boards/mimc200/fram.c:13:
include/linux/miscdevice.h:51: error: field 'list' has incomplete type
include/linux/miscdevice.h:55: error: expected specifier-qualifier-list before 'mode_t'
arch/avr32/boards/mimc200/fram.c:42: error: 'THIS_MODULE' undeclared here (not in a function)
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 624aef494f upstream.
When the driver tries to access Function Unit 10, the KEF X300A
speakers' firmware apparently locks up, making even PCM streaming
impossible. Work around this by ignoring this FU.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e9baa9d9d5 upstream.
It appears that in the DMA40 driver the DMA tasklet will very
often dereference memory for a descriptor just free:d from the
DMA40 slab. Nothing happens because no other part of the driver
has yet had a chance to claim this memory, but it's really
nasty to dereference free:d memory, so let's check the flag
before the descriptor is free and store it in a bool variable.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cb6ef42e51 upstream.
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b6213e413a upstream.
This patch fixes regression caused by commit a16dad7763 "MIPS: Fix
potencial corruption". That commit fixes one corruption scenario in
cost of adding another one, which actually start to cause crashes
on Yeeloong laptop when rtl8187 driver is used.
For correct DMA read operation on machines without DMA coherence, kernel
have to invalidate cache, such it will refill later with new data that
device wrote to memory, when that data is needed to process. We can only
invalidate full cache line. Hence when cache line includes both dma
buffer and some other data (written in cache, but not yet in main
memory), the other data can not hit memory due to invalidation. That
happen on rtl8187 where struct rtl8187_priv fields are located just
before and after small buffers that are passed to USB layer and DMA
is performed on them.
To fix the problem we align buffers and reserve space after them to make
them match cache line.
This patch does not resolve all possible MIPS problems entirely, for
that we have to assure that we always map cache aligned buffers for DMA,
what can be complex or even not possible. But patch fixes visible and
reproducible regression and seems other possible corruptions do not
happen in practice, since Yeeloong laptop works stable without rtl8187
driver.
Bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=54391
Reported-by: Petr Pisar <petr.pisar@atlas.cz>
Bisected-by: Tom Li <biergaizi2009@gmail.com>
Reported-and-tested-by: Tom Li <biergaizi2009@gmail.com>
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Acked-by: Larry Finger <Larry.Finger@lwfinger.next>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a16dad7763 upstream.
Normally r4k_dma_cache_inv should only ever be called with cacheline
aligned addresses. If however, it isn't there is the theoretical
possibility of data corruption. There is no correct way of handling this
and anyway, it should only happen if the DMA API is used incorrectly
so drop
There is a different corruption scenario with these CACHE instructions
removed but again there is no way of handling this correctly and it can
be triggered only through incorrect use of the DMA API.
So just get rid of the complexity.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: James Rodriguez <jamesr@juniper.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f78bccd79b upstream.
rtl8192ce is disabling for too long the local interrupts during hw initiatialisation when performing scans
The observable symptoms in dmesg can be:
- underruns from ALSA playback
- clock freezes (tstamps do not change for several dmesg entries until irqs are finaly reenabled):
[ 250.817669] rtlwifi:rtl_op_config():<0-0-0> 0x100
[ 250.817685] rtl8192ce:_rtl92ce_phy_set_rf_power_state():<0-1-0> IPS Set eRf nic enable
[ 250.817732] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.817796] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.817910] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818024] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818139] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818253] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818367] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:18051d59:11
[ 250.818472] rtl8192ce:_rtl92ce_init_mac():<0-1-0> reg0xec:98053f15:10
[ 250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[ 250.818472] rtl8192c_common:rtl92c_download_fw():<0-1-0> Firmware Version(49), Signature(0x88c1),Size(32)
[ 250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> PairwiseEncAlgorithm = 0 GroupEncAlgorithm = 0
[ 250.818472] rtl8192ce:rtl92ce_enable_hw_security_config():<0-1-0> The SECR-value cc
[ 250.818472] rtl8192c_common:rtl92c_dm_check_txpower_tracking_thermal_meter():<0-1-0> Schedule TxPowerTracking direct call!!
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> rtl92c_dm_txpower_tracking_callback_thermalmeter
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial pathA ele_d reg0xc80 = 0x40000000, ofdm_index=0xc
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Initial reg0xa24 = 0x90e1317, cck_index=0xc, ch14 0
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> Readback Thermal Meter = 0xe pre thermal meter 0xf eeprom_thermalmeter 0xf delta 0x1 delta_lck 0x0 delta_iqk 0x0
[ 250.818472] rtl8192c_common:rtl92c_dm_txpower_tracking_callback_thermalmeter():<0-1-0> <===
[ 250.818472] rtl8192c_common:rtl92c_dm_initialize_txpower_tracking_thermalmeter():<0-1-0> pMgntInfo->txpower_tracking = 1
[ 250.818472] rtl8192ce:rtl92ce_led_control():<0-1-0> ledaction 3
[ 250.818472] rtl8192ce:rtl92ce_sw_led_on():<0-1-0> LedAddr:4E ledpin=1
[ 250.818472] rtlwifi:rtl_ips_nic_on():<0-1-0> before spin_unlock_irqrestore
[ 251.154656] PCM: Lost interrupts? [Q]-0 (stream=0, delta=15903, new_hw_ptr=293408, old_hw_ptr=277505)
The exact code flow that causes that is:
1. wpa_supplicant send a start_scan request to the nl80211 driver
2. mac80211 module call rtl_op_config with IEEE80211_CONF_CHANGE_IDLE
3. rtl_ips_nic_on is called which disable local irqs
4. rtl92c_phy_set_rf_power_state() is called
5. rtl_ps_enable_nic() is called and hw_init()is executed and then the interrupts on the device are enabled
A good solution could be to refactor the code to avoid calling rtl92ce_hw_init() with the irqs disabled
but a quick and dirty solution that has proven to work is
to reenable the irqs during the function rtl92ce_hw_init().
I think that it is safe doing so since the device interrupt will only be enabled after the init function succeed.
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2ec197db1a upstream.
If an NFS client attempts to get a lock (using NLM) and the lock is
not available, the server will remember the request and when the lock
becomes available it will send a GRANT request to the client to
provide the lock.
If the client already held an adjacent lock, the GRANT callback will
report the union of the existing and new locks, which can confuse the
client.
This happens because __posix_lock_file (called by vfs_lock_file)
updates the passed-in file_lock structure when adjacent or
over-lapping locks are found.
To avoid this problem we take a copy of the two fields that can
be changed (fl_start and fl_end) before the call and restore them
afterwards.
An alternate would be to allocate a 'struct file_lock', initialise it,
use locks_copy_lock() to take a copy, then locks_release_private()
after the vfs_lock_file() call. But that is a lot more work.
Reported-by: Olaf Kirch <okir@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
--
v1 had a couple of issues (large on-stack struct and didn't really work properly).
This version is much better tested.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 04eada25d1 upstream.
Give more slack to sink devices before retrying on native aux
defer. AFAICT the 100 us timeout was not based on the DP spec.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 789b5e0315 upstream.
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Interestingly, the raid5 code can actually prevent double initialization and
hence can use the following simplified form of callback registration:
register_cpu_notifier(&foobar_cpu_notifier);
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
put_online_cpus();
A hotplug operation that occurs between registering the notifier and calling
get_online_cpus(), won't disrupt anything, because the code takes care to
perform the memory allocations only once.
So reorganize the code in raid5 this way to fix the deadlock with callback
registration.
Cc: linux-raid@vger.kernel.org
Fixes: 36d1c6476b
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[Srivatsa: Fixed the unregister_cpu_notifier() deadlock, added the
free_scratch_buffer() helper to condense code further and wrote the changelog.]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c8123f8c9c upstream.
When mkfs issues a full device discard and the device only
supports discards of a smallish size, we can loop in
blkdev_issue_discard() for a long time. If preempt isn't enabled,
this can turn into a softlock situation and the kernel will
start complaining.
Add an explicit cond_resched() at the end of the loop to avoid
that.
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3635c7e2d5 upstream.
Interface #5 of 19d2:1270 is a net interface which has been submitted to the
qmi_wwan driver so consequently remove it from the option driver.
Signed-off-by: Raymond Wanyoike <raymond.wanyoike@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d651aa1d68 upstream.
Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
that page use a 27 bit delta against that timestamp in order to save on
bits written to the ring buffer. If the time between events is larger than
what the 27 bits can hold, a "time extend" event is added to hold the
entire 64 bit timestamp again and the events after that hold a delta from
that timestamp.
As a "time extend" is always paired with an event, it is logical to just
allocate the event with the time extend, to make things a bit more efficient.
Unfortunately, when the pairing code was written, it removed the "delta = 0"
from the first commit on a page, causing the events on the page to be
slightly skewed.
Fixes: 69d1b839f7 "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 564eb714f5 upstream.
xen/gntdev.h and xen/gntalloc.h both provide userspace ABIs so they
should be installed.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[bwh: Backported to 3.2: no renaming is required]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06ea0bfe6e upstream.
When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.
Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1 (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 96c7a2ff21 upstream.
Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept. At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1. The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.
I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.
There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available. Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocate will succeed.
A server churning memory with a lot of small requests and replies like
memcached is a common case that if anything can will skew the odds
against large pages being available.
Therefore let's not give external applications a practical way to kill
linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem. Unless I am misreading the code and by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes signification). We have already tried everything
reasonable to allocate a page and the only thing left to do is wait. So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7c8746a9eb upstream.
When unlocking a spinlock, we require the following, strictly ordered
sequence of events:
<barrier> /* dmb */
<unlock>
<barrier> /* dsb */
<sev>
Whilst the code does indeed reflect this in terms of the architecture,
the final <barrier> + <sev> have been contracted into a single inline
asm without a "memory" clobber, therefore the compiler is at liberty to
reorder the unlock to the end of the above sequence. In such a case,
a waiting CPU may be woken up before the lock has been unlocked, leading
to extremely poor performance.
This patch reworks the dsb_sev() function to make use of the dsb()
macro and ensure ordering against the unlock.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: 'ishst' variant is not used here]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bae0ca2bc5 upstream.
During __v{6,7}_setup, we invalidate the TLBs since we are about to
enable the MMU on return to head.S. Unfortunately, without a subsequent
dsb instruction, the invalidation is not guaranteed to have completed by
the time we write to the sctlr, potentially exposing us to junk/stale
translations cached in the TLB.
This patch reworks the init functions so that the dsb used to ensure
completion of cache/predictor maintenance is also used to ensure
completion of the TLB invalidation.
Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 03b56329f9 upstream.
Commit afe2dab4f6 ("USB: add hex/bcd detection to usb modalias generation")
changed the routine that generates alias ranges. Before that change, only
digits 0-9 were supported; the commit tried to fix the case when the range
includes higher values than 0x9.
Unfortunately, the commit didn't fix the case when the range includes both
0x9 and 0xA, meaning that the final range must look like [x-9A-y] where
x <= 0x9 and y >= 0xA -- instead the [x-9A-x] range was produced.
Modprobe doesn't complain as it sees no difference between no-match and
bad-pattern results of fnmatch().
Fixing this simple bug to fix the aliases.
Also changing the hardcoded beginning of the range to uppercase as all the
other letters are also uppercase in the device version numbers.
Fortunately, this affects only the dvb-usb-dib0700 module, AFAIK.
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3661371701 upstream.
Backend drivers shouldn't transistion to CLOSED unless the frontend is
CLOSED. If a backend does transition to CLOSED too soon then the
frontend may not see the CLOSING state and will not properly shutdown.
So, treat an unexpected backend CLOSED state the same as CLOSING.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1e85c1ea1f upstream.
The last value written to a analog output channel is cached in the
private data of this driver for readback.
Currently, the wrong value is cached in the (*insn_write) functions.
The current code stores the data[n] value for readback afer the loop
has written all the values. At this time 'n' points past the end of
the data array.
Fix the functions by using a local variable to hold the data being
written to the analog output channel. This variable is then used
after the loop is complete to store the readback value. The current
value is retrieved before the loop in case no values are actually
written..
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Reviewed-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3ac06b9056 upstream.
3GPP TS 07.10 states in section 5.4.6.3.7:
"The length byte contains the value 2 or 3 ... depending on the break
signal." The break byte is optional and if it is sent, the length is
3. In fact the driver was not able to work with modems that send this
break byte in their modem status control message. If the modem just
sends the break byte if it is really set, then weird things might
happen.
The code for deconding the modem status to the internal linux
presentation in gsm_process_modem has already a big comment about
this 2 or 3 byte length thing and it is already able to decode the
brk, but the code calling the gsm_process_modem function in
gsm_control_modem does not encode it and hand it over the right way.
This patch fixes this.
Without this fix if the modem sends the brk byte in it's modem status
control message the driver will hang when opening a muxed channel.
Signed-off-by: Lars Poeschel <poeschel@lemonage.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5bbb2ae3d6 upstream.
bind_get() checks the device number it is called with. It uses
MAX_RAW_MINORS for the upper bound. But MAX_RAW_MINORS is set at compile
time while the actual number of raw devices can be set at runtime. This
means the test can either be too strict or too lenient. And if the test
ends up being too lenient bind_get() might try to access memory beyond
what was allocated for "raw_devices".
So check against the runtime value (max_raw_minors) in this function.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 269f979467 upstream.
When the guest attempts to connect with the host when there may already be a
connection with the host (as would be the case during the kdump/kexec path),
it is difficult to guarantee timely response from the host. Starting with
WS2012 R2, the host supports this ability to re-connect with the host
(explicitly to support kexec). Prior to responding to the guest, the host
needs to ensure that device states based on the previous connection to
the host have been properly torn down. This may introduce unbounded delays.
To deal with this issue, don't do a timed wait during the initial connect
with the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 227d53b397 upstream.
To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
During aio buffer migration, we have a possibility to see the following
call stack.
aio_migratepage [disable interrupt]
migrate_page_copy
clear_page_dirty_for_io
set_page_dirty
__set_page_dirty_buffers
__set_page_dirty
spin_lock_irq
This mean, current aio migration is a deadlockable. spin_lock_irqsave
is a safer alternative and we should use it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a85d9df1ea upstream.
During aio stress test, we observed the following lockdep warning. This
mean AIO+numa_balancing is currently deadlockable.
The problem is, aio_migratepage disable interrupt, but
__set_page_dirty_nobuffers unintentionally enable it again.
Generally, all helper function should use spin_lock_irqsave() instead of
spin_lock_irq() because they don't know caller at all.
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&ctx->completion_lock)->rlock);
<Interrupt>
lock(&(&ctx->completion_lock)->rlock);
*** DEADLOCK ***
dump_stack+0x19/0x1b
print_usage_bug+0x1f7/0x208
mark_lock+0x21d/0x2a0
mark_held_locks+0xb9/0x140
trace_hardirqs_on_caller+0x105/0x1d0
trace_hardirqs_on+0xd/0x10
_raw_spin_unlock_irq+0x2c/0x50
__set_page_dirty_nobuffers+0x8c/0xf0
migrate_page_copy+0x434/0x540
aio_migratepage+0xb1/0x140
move_to_new_page+0x7d/0x230
migrate_pages+0x5e5/0x700
migrate_misplaced_page+0xbc/0xf0
do_numa_page+0x102/0x190
handle_pte_fault+0x241/0x970
handle_mm_fault+0x265/0x370
__do_page_fault+0x172/0x5a0
do_page_fault+0x1a/0x70
page_fault+0x28/0x30
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f893ab41e4 upstream.
swapoff clear swap_info's SWP_USED flag prematurely and free its
resources after that. A concurrent swapon will reuse this swap_info
while its previous resources are not cleared completely.
These late freed resources are:
- p->percpu_cluster
- swap_cgroup_ctrl[type]
- block_device setting
- inode->i_flags &= ~S_SWAPFILE
This patch clears the SWP_USED flag after all its resources are freed,
so that swapon can reuse this swap_info by alloc_swap_info() safely.
[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6583327c4d upstream.
Commit d61931d89b, "x86: Add optimized popcnt variants" introduced
compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
options -fprofile-arcs and -O2, this flag causes gcc to generate
broken constructor code. As a result, a 64 bit x86 kernel compiled
with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
file" and runs into sproadic BUGs during boot.
The gcc people indicate that these kinds of problems are endemic when
using ad hoc calling conventions. It is therefore best to treat any
file compiled with ad hoc calling conventions as an isolated
environment and avoid things like profiling or coverage analysis,
since those subsystems assume a "normal" calling conventions.
This patch avoids the bug by excluding lib/hweight.o from coverage
profiling.
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80d767d770 upstream.
When compiling for the IA-64 ski emulator, HZ is set to 32 because the
emulation is slow and we don't want to waste too many cycles processing
timers. Alpha also has an option to set HZ to 32.
This causes integer underflow in
kernel/time/jiffies.c:
kernel/time/jiffies.c:66:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
.mult = NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
^
This patch reduces the JIFFIES_SHIFT value to avoid the overflow.
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1401241639100.23871@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 338f977f4e upstream.
The "new" fragmentation code (since my rewrite almost 5 years ago)
erroneously sets skb->len rather than using skb_trim() to adjust
the length of the first fragment after copying out all the others.
This leaves the skb tail pointer pointing to after where the data
originally ended, and thus causes the encryption MIC to be written
at that point, rather than where it belongs: immediately after the
data.
The impact of this is that if software encryption is done, then
a) encryption doesn't work for the first fragment, the connection
becomes unusable as the first fragment will never be properly
verified at the receiver, the MIC is practically guaranteed to
be wrong
b) we leak up to 8 bytes of plaintext (!) of the packet out into
the air
This is only mitigated by the fact that many devices are capable
of doing encryption in hardware, in which case this can't happen
as the tail pointer is irrelevant in that case. Additionally,
fragmentation is not used very frequently and would normally have
to be configured manually.
Fix this by using skb_trim() properly.
Fixes: 2de8e0d999 ("mac80211: rewrite fragmentation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76f24e3f39 upstream.
Adding two more IDs to the ftdi_sio usb serial driver.
It now connects Tagsys RFID readers.
There might be more IDs out there for other Tagsys models.
Signed-off-by: Ulrich Hahn <uhahn@eanco.de>
Cc: Johan Hovold <johan@hovold.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2172fa709a upstream.
Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.
Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy. In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.
Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo
Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled. Any subsequent access to foo
after doing the above will also trigger the BUG.
BUG output from Matthew Thode:
[ 473.893141] ------------[ cut here ]------------
[ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
[ 473.995314] invalid opcode: 0000 [#6] SMP
[ 474.027196] Modules linked in:
[ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
3.13.0-grsec #1
[ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
[ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[ 474.556058] Stack:
[ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[ 474.690461] Call Trace:
[ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[ 475.255884] RIP [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[ 475.296120] RSP <ffff8805c0ac3c38>
[ 475.328734] ---[ end trace f076482e9d754adc ]---
Reported-by: Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 14e2abb732 upstream.
On IBM pseries systems the device_type device-tree property of a PCIe
bridge contains the string "pciex". The of_bus_pci_match() function was
looking only for "pci" on this property, so in such cases the bus
matching code was falling back to the default bus, causing problems on
functions that should be using "assigned-addresses" for region address
translation. This patch fixes the problem by also looking for "pciex" on
the PCI bus match function.
v2: added comment
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6dd18e4684 upstream.
Commit:
e38c0a1fbc
of/address: Handle #address-cells > 2 specially
broke real time clock access on Bimini, js2x, and similar powerpc
machines using the "maple" platform. That code was indirectly relying
on the old (broken) behaviour of the translation for the hypertransport
to ISA bridge.
This fixes it by treating hypertransport as a PCI bus
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d3c56568f4 upstream.
We've seen often problems after suspend/resume on Acer Aspire One
AO725 with ALC271X codec as reported in kernel bugzilla, and it turned
out that some COEFs doesn't work and triggers the codec communication
stall.
Since these magic COEF setups are specific to ALC269VB for some PLL
configurations, the machine works even without these manual
adjustment. So, let's simply avoid applying them for ALC271X.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=52181
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: return 0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 823d12c95c upstream.
People sometimes create their own custom-configured kernels and forget
to enable CONFIG_SCSI_MULTI_LUN. This causes problems when they plug
in a USB storage device (such as a card reader) with more than one
LUN.
Fortunately, we can tell fairly easily when a storage device claims to
have more than one LUN. When that happens, this patch asks the SCSI
layer to probe all the LUNs automatically, regardless of the config
setting.
The patch also updates the Kconfig help text for usb-storage,
explaining that CONFIG_SCSI_MULTI_LUN may be necessary.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Thomas Raschbacher <lordvan@lordvan.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- slave_alloc() already has a us_data pointer]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a9c143c826 upstream.
The Cypress ATACB unusual-devs entry for the Super Top SATA bridge
causes problems. Although it was originally reported only for
bcdDevice = 0x160, its range was much larger. This resulted in a bug
report for bcdDevice 0x220, so the range was capped at 0x219. Now
Milan reports errors with bcdDevice 0x150.
Therefore this patch restricts the range to just 0x160.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Milan Svoboda <milan.svoboda@centrum.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8298383c2c upstream.
Even though we make sure PowerSave is not enabled by default
by disabling the flag, WIPHY_FLAG_PS_ON_BY_DEFAULT on init,
PS could be enabled by userspace based on various factors
like battery usage etc. Since PS in ath9k is just broken
and has been untested for years, remove support for it, but
allow a user to explicitly enable it using a module parameter.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6bca610d97 upstream.
It is a copy/paste of patch provided by Sujith for ath9k.
"Even though we make sure PowerSave is not enabled by default
by disabling the flag, WIPHY_FLAG_PS_ON_BY_DEFAULT on init,
PS could be enabled by userspace based on various factors
like battery usage etc. Since PS in ath9k is just broken
and has been untested for years, remove support for it, but
allow a user to explicitly enable it using a module parameter."
Signed-off-by: Oleksij Rempel <linux@rempel-privat.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d7736ff5be upstream.
Dumps created by kdump or zfcpdump can contain invalid memory holes when
dumping z/VM systems that have memory pressure.
For example:
# zgetdump -i /proc/vmcore.
Memory map:
0000000000000000 - 0000000000bfffff (12 MB)
0000000000e00000 - 00000000014fffff (7 MB)
000000000bd00000 - 00000000f3bfffff (3711 MB)
The memory detection function find_memory_chunks() issues tprot to
find valid memory chunks. In case of CMM it can happen that pages are
marked as unstable via set_page_unstable() in arch_free_page().
If z/VM has released that pages, tprot returns -EFAULT and indicates
a memory hole.
So fix this and switch off CMM in case of kdump or zfcpdump.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 13e1b87c98 upstream.
Fix the following build error:
drivers/media/usb/dvb-usb-v2/
mxl111sf-tuner.h:72:9: error: expected ‘;’, ‘,’ or ‘)’ before ‘struct’
struct mxl111sf_tuner_config *cfg)
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f9c47f00c upstream.
It's a bit odd to see a newer device showing mod15write; however, the
reported behavior is highly consistent and other factors which could
contribute seem to have been verified well enough. Also, both
sata_sil itself and the drive are fairly outdated at this point making
the risk of this change fairly low. It is possible, probably likely,
that other drive models in the same family have the same problem;
however, for now, let's just add the specific model which was tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: matson <lists-matsonpa@luxsci.me>
References: http://lkml.kernel.org/g/201401211912.s0LJCk7F015058@rs103.luxsci.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0ef38d70d4 upstream.
The patch 3ddc5b46a8 breaks networking on
alpha (there is a follow-up fix 5cfe8f1ba5,
but networking is still broken even with the second patch).
The patch 3ddc5b46a8 makes
csum_partial_copy_from_user check the pointer with access_ok. However,
csum_partial_copy_from_user is called also from csum_partial_copy_nocheck
and csum_partial_copy_nocheck is called on kernel pointers and it is
supposed not to check pointer validity.
This bug results in ssh session hangs if the system is loaded and bulk
data are printed to ssh terminal.
This patch fixes csum_partial_copy_nocheck to call set_fs(KERNEL_DS), so
that access_ok in csum_partial_copy_from_user accepts kernel-space
addresses.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit efb9e0f4f4 upstream.
Without the patch the kernel generates the following error.
ata11.15: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata11.15: Port Multiplier vendor mismatch '0x197b' != '0x123'
ata11.15: PMP revalidation failed (errno=-19)
ata11.15: failed to recover PMP after 5 tries, giving up
This patch helps to bypass this error and the device becomes
functional.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: <linux-ide@vger.kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 778c14affa upstream.
A 3% of system memory bonus is sometimes too excessive in comparison to
other processes.
With commit a63d83f427 ("oom: badness heuristic rewrite"), the OOM
killer tries to avoid killing privileged tasks by subtracting 3% of
overall memory (system or cgroup) from their per-task consumption. But
as a result, all root tasks that consume less than 3% of overall memory
are considered equal, and so it only takes 33+ privileged tasks pushing
the system out of memory for the OOM killer to do something stupid and
kill dhclient or other root-owned processes. For example, on a 32G
machine it can't tell the difference between the 1M agetty and the 10G
fork bomb member.
The changelog describes this 3% boost as the equivalent to the global
overcommit limit being 3% higher for privileged tasks, but this is not
the same as discounting 3% of overall memory from _every privileged task
individually_ during OOM selection.
Replace the 3% of system memory bonus with a 3% of current memory usage
bonus.
By giving root tasks a bonus that is proportional to their actual size,
they remain comparable even when relatively small. In the example
above, the OOM killer will discount the 1M agetty's 256 badness points
down to 179, and the 10G fork bomb's 262144 points down to 183500 points
and make the right choice, instead of discounting both to 0 and killing
agetty because it's first in the task list.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: existing code changes 'points' directly rather
than using 'adj' variable]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee97dc7db4 upstream.
In s390 des and 3des ctr mode there is one preallocated page
used to speed up the en/decryption. This page is not protected
against concurrent usage and thus there is a potential of data
corruption with multiple threads.
The fix introduces locking/unlocking the ctr page and a slower
fallback solution at concurrency situations.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit adc3fcf155 upstream.
In s390 des and des3_ede cbc mode the iv value is not protected
against concurrency access and modifications from another running
en/decrypt operation which is using the very same tfm struct
instance. This fix copies the iv to the local stack before
the crypto operation and stores the value back when done.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0519e9ad89 upstream.
The aes-ctr mode uses one preallocated page without any concurrency
protection. When multiple threads run aes-ctr encryption or decryption
this can lead to data corruption.
The patch introduces locking for the page and a fallback solution with
slower en/decryption performance in concurrency situations.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 36eb2caa7b upstream.
Remove the BUG_ON's that check for failure or incomplete
results of the s390 hardware crypto instructions.
Rather report the errors as -EIO to the crypto layer.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ee291e6329 upstream.
When creating network portals rapidly, such as when restoring a
configuration, LIO's code to reuse existing portals can return a false
negative if the thread hasn't run yet and set np_thread_state to
ISCSI_NP_THREAD_ACTIVE. This causes an error in the network stack
when attempting to bind to the same address/port.
This patch sets NP_THREAD_ACTIVE before the np is placed on g_np_list,
so even if the thread hasn't run yet, iscsit_get_np will return the
existing np.
Also, convert np_lock -> np_mutex + hold across adding new net portal
to g_np_list to prevent a race where two threads may attempt to create
the same network portal, resulting in one of them failing.
(nab: Add missing mutex_unlocks in iscsit_add_np failure paths)
(DanC: Fix incorrect spin_unlock -> spin_unlock_bh)
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aac5c4226e upstream.
If kvm_io_bus_register_dev() fails then it returns success but it should
return an error code.
I also did a little cleanup like removing an impossible NULL test.
Fixes: 2b3c246a68 ('KVM: Make coalesced mmio use a device per zone')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90d3e592e9 upstream.
We have a race during inode init because the BTRFS_I(inode)->location is setup
after the inode hash table lock is dropped. btrfs_find_actor uses the location
field, so our search might not find an existing inode in the hash table if we
race with the inode init code.
This commit changes things to setup the location field sooner. Also the find actor now
uses only the location objectid to match inodes. For inode hashing, we just
need a unique and stable test, it doesn't have to reflect the inode numbers we
show to userland.
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2:
- No hashval in btrfs_iget_locked()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 91b973f90c upstream.
The code in remove_cache_dir() is supposed to remove the "cache"
subdirectory from the sysfs directory for a CPU when that CPU is
being offlined. It tries to do this by calling kobject_put() on
the kobject for the subdirectory. However, the subdirectory only
gets removed once the last reference goes away, and the reference
being put here may well not be the last reference. That means
that the "cache" subdirectory may still exist when the offlining
operation has finished. If the same CPU subsequently gets onlined,
the code tries to add a new "cache" subdirectory. If the old
subdirectory has not yet been removed, we get a WARN_ON in the
sysfs code, with stack trace, and an error message printed on the
console. Further, we ultimately end up with an online cpu with no
"cache" subdirectory.
This fixes it by doing an explicit kobject_del() at the point where
we want the subdirectory to go away. kobject_del() removes the sysfs
directory even though the object still exists in memory. The object
will get freed at some point in the future. A subsequent onlining
operation can create a new sysfs directory, even if the old object
still exists in memory, without causing any problems.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 49a12877d2 upstream.
There is currently no facility in ACPI to express the hookup of voltage
regulators, the expectation is that the regulators that exist in the
system will be handled transparently by firmware if they need software
control at all. This means that if for some reason the regulator API is
enabled on such a system it should assume that any supplies that devices
need are provided by the system at all relevant times without any software
intervention.
Tell the regulator core to make this assumption by calling
regulator_has_full_constraints(). Do this as soon as we know we are using
ACPI so that the information is available to the regulator core as early
as possible. This will cause the regulator core to pretend that there is
an always on regulator supplying any supply that is requested but that has
not otherwise been mapped which is the behaviour expected on a system with
ACPI.
Should the ability to specify regulators be added in future revisions of
ACPI then once we have support for ACPI mappings in the kernel the same
assumptions will apply. It is also likely that systems will default to a
mode of operation which does not require any interpretation of these
mappings in order to be compatible with existing operating system releases
so it should remain safe to make these assumptions even if the mappings
exist but are not supported by the kernel.
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d024206133 upstream.
Currently, any user can snapshot any subvolume if the path is accessible and
thus indirectly create and keep files he does not own under his direcotries.
This is not possible with traditional directories.
In security context, a user can snapshot root filesystem and pin any
potentially buggy binaries, even if the updates are applied.
All the snapshots are visible to the administrator, so it's possible to
verify if there are suspicious snapshots.
Another more practical problem is that any user can pin the space used
by eg. root and cause ENOSPC.
Original report:
https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
[bwh: Backported to 3.2:
- Adjust context
- Use the same cleanup code for success and error cases, as done
upstream in commit ecd188159e
('switch btrfs_ioctl_snap_create_transid() to fget_light()')]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90515e7f5d upstream.
We may return early in btrfs_drop_snapshot(), we shouldn't
call btrfs_std_err() for this case, fix it.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70713fe315 upstream.
Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss().
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 032f708bc4 upstream.
The locations of SMBus register base address and enablement bit are changed
from AMD ML, which need this patch to be supported.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
[bwh: Backported to 3.2:
- Adjust context
- Aux bus support is not included]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 64e5acb09c upstream.
Use the right function to update frequency value.
If rx skb is probe response or beacon, the wrong frequency value can
cause problem that bss info can't be updated when it should be.
Fixes: 8318d78a44 ("cfg80211 API for channels/bitrates, mac80211
and driver conversion")
Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit aad560b7f6 upstream.
At IO preparation we calculate the max pages at each device and
allocate a BIO per device of that size. The calculation was wrong
on some unaligned corner cases offset/length combination and would
make prepare return with -ENOMEM. This would be bad for pnfs-objects
that would in that case IO through MDS. And fatal for exofs were it
would fail writes with EIO.
Fix it by doing the proper math, that will work in all cases. (I
ran a test with all possible offset/length combinations this time
round).
Also when reading we do not need to allocate for the parity units
since we jump over them.
Also lower the max_io_length to take into account the parity pages
so not to allocate BIOs bigger than PAGE_SIZE
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6e0ea9e6cb upstream.
The GSI QP type is compatible with and should be allowed to send data
to/from any UD QP. This was found when testing ibacm on the same node
as an SA.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 28a625cbc2 upstream.
Having this struct in module memory could Oops when if the module is
unloaded while the buffer still persists in a pipe.
Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 08336fd218 upstream.
dma_pte_free_level() has an off-by-one error when checking whether a pte
is completely covered by a range. Take for example the case of
attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first
2M superpage.
The level_size() is 0x200 and we test:
static void dma_pte_free_level(...
...
if (!(0 > 0 || 0x1ff < 0 + 0x200)) {
...
}
Clearly the 2nd test is true, which means we fail to take the branch to
clear and free the pagetable entry. As a result, we're leaking
pagetables and failing to install new pages over the range.
This was found with a PCI device assigned to a QEMU guest using vfio-pci
without a VGA device present. The first 1M of guest address space is
mapped with various combinations of 4K pages, but eventually the range
is entirely freed and replaced with a 2M contiguous mapping.
intel-iommu errors out with something like:
ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083)
In this case 5c2b8003 is the pointer to the previous leaf page that was
neither freed nor cleared and 849c00083 is the superpage entry that
we're trying to replace it with.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d45b964a22 upstream.
Needed to properly flush the read caches for fences.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2:
- Adjust context
- s/\bring\b/rdev/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2b92865e64 upstream.
turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems
that have certain security features enabled by default that make -fPIC
the default, this causes a build error:
turbostat.c: In function ‘check_cpuid’:
turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
^
GCC provides a header cpuid.h, containing a __get_cpuid function that
works with both PIC and non-PIC. (On PIC, it saves and restores ebx
around the cpuid instruction.) Use that instead.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ecd75ad514 upstream.
For some reason, some early WD drives spin up and down drives
erratically when the link is put into slumber mode which can reduce
the life expectancy of the device significantly. Unfortunately, we
don't have full list of devices and given the nature of the issue it'd
be better to err on the side of false positives than the other way
around. Let's disable LPM on all WD devices which match one of the
known problematic model prefixes and are SATA-I.
As horkage list doesn't support matching SATA capabilities, this is
implemented as two horkages - WD_BROKEN_LPM and NOLPM. The former is
set for the known prefixes and sets the latter if the matched device
is SATA-I.
Note that this isn't optimal as this disables all LPM operations and
partial link power state reportedly works fine on these; however, the
way LPM is implemented in libata makes it difficult to precisely map
libata LPM setting to specific link power state. Well, these devices
are already fairly outdated. Let's just disable whole LPM for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Nikos Barkas <levelwol@gmail.com>
Reported-and-tested-by: Ioannis Barkas <risc4all@yahoo.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=57211
[bwh: Backported to 3.2:
- Adjust context
- Use literal 76 instead of ATA_ID_SATA_CAPABILITY]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit da6139e49c upstream.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
When a cpu is downed on a system, the irqs on the cpu are assigned to
other cpus. It is possible, however, that when a cpu is downed there
aren't enough free vectors on the remaining cpus to account for the
vectors from the cpu that is being downed.
This results in an interesting "overflow" condition where irqs are
"assigned" to a CPU but are not handled.
For example, when downing cpus on a 1-64 logical processor system:
<snip>
[ 232.021745] smpboot: CPU 61 is now offline
[ 238.480275] smpboot: CPU 62 is now offline
[ 245.991080] ------------[ cut here ]------------
[ 245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
[ 246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
[ 246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
[ 246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
[ 246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 246.057371] 0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
[ 246.065728] ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
[ 246.074073] 0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
[ 246.082430] Call Trace:
[ 246.085174] <IRQ> [<ffffffff8164fbf6>] dump_stack+0x46/0x58
[ 246.091633] [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
[ 246.098352] [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
[ 246.104786] [<ffffffff815710d6>] dev_watchdog+0x246/0x250
[ 246.110923] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.119097] [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
[ 246.125224] [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
[ 246.132137] [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[ 246.140308] [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
[ 246.146933] [<ffffffff81059a80>] __do_softirq+0xe0/0x220
[ 246.152976] [<ffffffff8165fedc>] call_softirq+0x1c/0x30
[ 246.158920] [<ffffffff810045f5>] do_softirq+0x55/0x90
[ 246.164670] [<ffffffff81059d35>] irq_exit+0xa5/0xb0
[ 246.170227] [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
[ 246.177324] [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
[ 246.184041] <EOI> [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
[ 246.191559] [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
[ 246.198374] [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
[ 246.204900] [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
[ 246.210846] [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
[ 246.217371] [<ffffffff81646b47>] rest_init+0x77/0x80
[ 246.223028] [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
[ 246.229165] [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
[ 246.235787] [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
[ 246.242990] [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
[ 246.249610] ---[ end trace fb74fdef54d79039 ]---
[ 246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
[ 246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
Last login: Mon Nov 11 08:35:14 from 10.18.17.119
[root@(none) ~]# [ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[ 246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[ 249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
(last lines keep repeating. ixgbe driver is dead until module reload.)
If the downed cpu has more vectors than are free on the remaining cpus on the
system, it is possible that some vectors are "orphaned" even though they are
assigned to a cpu. In this case, since the ixgbe driver had a watchdog, the
watchdog fired and notified that something was wrong.
This patch adds a function, check_vectors(), to compare the number of vectors
on the CPU going down and compares it to the number of vectors available on
the system. If there aren't enough vectors for the CPU to go down, an
error is returned and propogated back to userspace.
v2: Do not need to look at percpu irqs
v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
Priority Mode
v4: Additional changes suggested by Gong Chen.
v5/v6/v7/v8: Updated comment text
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9f97e4b128 upstream.
Before a write starts we set a bit in the write-intent bitmap.
When the write completes we clear that bit if the write was successful
to all devices. However if the write wasn't fully successful we
should not clear the bit. If the faulty drive is subsequently
re-added, the fact that the bit is still set ensure that we will
re-write the data that is missing.
This logic is mediated by the STRIPE_DEGRADED flag - we only clear the
bitmap bit when this flag is not set.
Currently we correctly set the flag if a write starts when some
devices are failed or missing. But we do *not* set the flag if some
device failed during the write attempt.
This is wrong and can result in clearing the bit inappropriately.
So: set the flag when a write fails.
This bug has been present since bitmaps were introduces, so the fix is
suitable for any -stable kernel.
Reported-by: Ethan Wilson <ethan.wilson@shiftmail.org>
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9ed96e87c5 upstream.
Limit PIT timer frequency similarly to the limit applied by
LAPIC timer.
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- s/ps->period/pt->period/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2995fa78e4 upstream.
This reverts commit be35f48610 ("dm: wait until embedded kobject is
released before destroying a device") and provides an improved fix.
The kobject release code that calls the completion must be placed in a
non-module file, otherwise there is a module unload race (if the process
calling dm_kobject_release is preempted and the DM module unloaded after
the completion is triggered, but before dm_kobject_release returns).
To fix this race, this patch moves the completion code to dm-builtin.c
which is always compiled directly into the kernel if BLK_DEV_DM is
selected.
The patch introduces a new dm_kobject_holder structure, its purpose is
to keep the completion and kobject in one place, so that it can be
accessed from non-module code without the need to export the layout of
struct mapped_device to that code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Remove paranoid check of container_of() result in dm_get_from_kobject(),
which would now be incorrect
- Include <linux/export.h> in dm-builtin.c, included indirectly upstream]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit feffe09f51 upstream.
According to Freescale imx28 Errata, "ENGR119653 USB: ARM to USB
register error issue", All USB register write operations must
use the ARM SWP instruction. So, we implement a special ehci_write
for imx28.
Discussion for it at below:
http://marc.info/?l=linux-usb&m=137996395529294&w=2
Without this patcheset, imx28 works unstable at high AHB bus loading.
If the bus loading is not high, the imx28 usb can work well at the most
of time. There is a IC errata for this problem, usually, we consider
IC errata is a problem not a new feature, and this workaround is needed
for that, so we need to add them to stable tree 3.11+.
Cc: robert.hodaszi@digi.com
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78b19bae08 upstream.
Don't check for -NFS4ERR_NOTSUPP, it's already been mapped to -ENOTSUPP
by nfs4_stat_to_errno.
This allows the client to mount v4.1 servers that don't support
SECINFO_NO_NAME by falling back to the "guess and check" method of
nfs4_find_root_sec.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4c35ed241 upstream.
The synchronization needed after ftrace_ops are unregistered must happen
after the callback is disabled from becing called by functions.
The current location happens after the function is being removed from the
internal lists, but not after the function callbacks were disabled, leaving
the functions susceptible of being called after their callbacks are freed.
This affects perf and any externel users of function tracing (LTTng and
SystemTap).
Fixes: cdbe61bfe7 "ftrace: Allow dynamically allocated function tracers"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: drop change for control ops]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
commit 7614c3dc74 upstream.
The function tracer uses preempt_disable/enable_notrace() for
synchronization between reading registered ftrace_ops and unregistering
them.
Most of the ftrace_ops are global permanent structures that do not
require this synchronization. That is, ops may be added and removed from
the hlist but are never freed, and wont hurt if a synchronization is
missed.
But this is not true for dynamically created ftrace_ops or control_ops,
which are used by the perf function tracing.
The problem here is that the function tracer can be used to trace
kernel/user context switches as well as going to and from idle.
Basically, it can be used to trace blind spots of the RCU subsystem.
This means that even though preempt_disable() is done, a
synchronize_sched() will ignore CPUs that haven't made it out of user
space or idle. These can include functions that are being traced just
before entering or exiting the kernel sections.
To implement the RCU synchronization, instead of using
synchronize_sched() the use of schedule_on_each_cpu() is performed. This
means that when a dynamically allocated ftrace_ops, or a control ops is
being unregistered, all CPUs must be touched and execute a ftrace_sync()
stub function via the work queues. This will rip CPUs out from idle or
in dynamic tick mode. This only happens when a user disables perf
function tracing or other dynamically allocated function tracers, but it
allows us to continue to debug RCU and context tracking with function
tracing.
Link: http://lkml.kernel.org/r/1369785676.15552.55.camel@gandalf.local.home
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: drop change for control ops]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66b512eda7 upstream.
With some SDIO devices, timeout errors can happen when reading data.
To solve this issue, the DMA transfer has to be activated before sending
the command to the device. This order is incorrect in PDC mode. So we
have to take care if we are using DMA or PDC to know when to send the
MMC command.
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 80ab8eae70 upstream.
The PCI devices with DMA masks smaller than 32bit should enable
CONFIG_ZONE_DMA. Since the recent change of page allocator, page
allocations via dma_alloc_coherent() with the limited DMA mask bits
may fail more frequently, ended up with no available buffers, when
CONFIG_ZONE_DMA isn't enabled. With CONFIG_ZONE_DMA, the system has
much more chance to obtain such pages.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68221
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 370169516e upstream.
It's never allocated on systems without an ATOMBIOS or COMBIOS ROM.
Should fix an oops I encountered while resetting the GPU after a lockup
on my PowerBook with an RV350.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 12c91a5c2d upstream.
When extending a low level space map we should update nr_blocks at
the start so the new space is used for the index entries.
Otherwise extend can fail, e.g.: sm_metadata_extend call sequence
that fails:
-> sm_ll_extend
-> dm_tm_new_block -> dm_sm_new_block -> sm_bootstrap_new_block
=> returns -ENOSPC because smm->begin == smm->ll.nr_blocks
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit be35f48610 upstream.
There may be other parts of the kernel holding a reference on the dm
kobject. We must wait until all references are dropped before
deallocating the mapped_device structure.
The dm_kobject_release method signals that all references are dropped
via completion. But dm_kobject_release doesn't free the kobject (which
is embedded in the mapped_device structure).
This is the sequence of operations:
* when destroying a DM device, call kobject_put from dm_sysfs_exit
* wait until all users stop using the kobject, when it happens the
release method is called
* the release method signals the completion and should return without
delay
* the dm device removal code that waits on the completion continues
* the dm device removal code drops the dm_mod reference the device had
* the dm device removal code frees the mapped_device structure that
contains the kobject
Using kobject this way should avoid the module unload race that was
mentioned at the beginning of this thread:
https://lkml.org/lkml/2014/1/4/83
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3685f19e07 upstream.
Tegra chips have 4 or 5 identical UART modules embedded. UARTs C..E have
their MODEM-control signals tied off to a static state. However UARTs A
and B can optionally route those signals to/from package pins, depending
on the exact pinmux configuration.
When these signals are not routed to package pins, false interrupts may
trigger either temporarily, or permanently, all while not showing up in
the IIR; it will read as NO_INT. This will eventually lead to the UART
IRQ being disabled due to unhandled interrupts. When this happens, the
kernel may print e.g.:
irq 68: nobody cared (try booting with the "irqpoll" option)
In order to prevent this, enable UART_BUG_NOMSR. This prevents
UART_IER_MSI from being enabled, which prevents the false interrupts
from triggering.
In practice, this is not needed under any of the following conditions:
* On Tegra chips after Tegra30, since the HW bug has apparently been
fixed.
* On UARTs C..E since their MODEM control signals are tied to the correct
static state which doesn't trigger the issue.
* On UARTs A..B if the MODEM control signals are routed out to package
pins, since they will then carry valid signals.
However, we ignore these exceptions for now, since they are only relevant
if a board actually hooks up more than a 4-wire UART, and no currently
supported board does this. If we ever support a board that does, we can
refine the algorithm that enables UART_BUG_NOMSR to take those exceptions
into account, and/or read a flag from DT/... that indicates that the
board has hooked up and pinmux'd more than a 4-wire UART.
Reported-by: Olof Johansson <olof@lixom.net> # autotester
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust filename
- s/port->/up->port./]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c1f15196ac upstream.
Genuine FTDI chips support only CS7/8. A previous fix in commit
8704211f65 ("USB: ftdi_sio: fixed handling of unsupported CSIZE
setting") enforced this limitation and reported it back to userspace.
However, certain types of smartcard readers depend on specific
driver behaviour that requests 0 data bits (not 5) to change into a
different operating mode if CS5 has been set.
This patch reenables this behaviour for all FTDI devices.
Tagged to be added to stable, because it affects a lot of users of
embedded systems which rely on these readers to work properly.
Reported-by: Heinrich Siebmanns <H.Siebmanns@t-online.de>
Tested-by: Heinrich Siebmanns <H.Siebmanns@t-online.de>
Signed-off-by: Colin Leitner <colin.leitner@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: s/ddev/\&port->dev/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d195178297 upstream.
The hw i2c engines are disabled by default as the
current implementation is still experimental. Print
a warning when users enable it so that it's obvious
when the option is enabled.
v2: check for non-0 rather than 1
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8ed8146028 upstream.
Hello.
I got below leak with linux-3.10.0-54.0.1.el7.x86_64 .
[ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
Below is a patch, but I don't know whether we need special handing for undoing
ebitmap_set_bit() call.
----------
>>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Mon, 6 Jan 2014 16:30:21 +0900
Subject: [PATCH] SELinux: Fix memory leak upon loading policy
Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
check return value from hashtab_insert() in filename_trans_read(). It leaks
memory if hashtab_insert() returns error.
unreferenced object 0xffff88005c9160d0 (size 8):
comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
hex dump (first 8 bytes):
57 0b 00 00 6b 6b 6b a5 W...kkk.
backtrace:
[<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360
[<ffffffff812aec5d>] policydb_read+0xd1d/0xf70
[<ffffffff812b345c>] security_load_policy+0x6c/0x500
[<ffffffff812a623c>] sel_write_load+0xac/0x750
[<ffffffff811eb680>] vfs_write+0xc0/0x1f0
[<ffffffff811ec08c>] SyS_write+0x4c/0xa0
[<ffffffff81690419>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
However, we should not return EEXIST error to the caller, or the systemd will
show below message and the boot sequence freezes.
systemd[1]: Failed to load SELinux policy. Freezing.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 770bd4bf2e upstream.
The lack of comma leads to the wrong channel for an SPDIF channel.
Unfortunately this wasn't caught by compiler because it's still a
valid expression.
Reported-by: Alexander Aristov <aristov.alexander@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 440ebadeae upstream.
Fix ring-indicator (RI) status-bit definition, which was defined as CTS,
effectively preventing RI-changes from being detected while reporting
false RI status.
This bug predates git.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 623c826337 upstream.
Some PL2303 devices are known to lose bytes if you change serial
settings even to the same values as before. Avoid this by comparing the
encoded settings with the previsouly used ones before configuring the
device.
The common case was fixed by commit bf5e5834bf ("pl2303: Fix mode
switching regression"), but this problem was still possible to trigger,
for instance, by using the TCSETS2-interface to repeatedly request
115201 baud, which gets mapped to 115200 and thus always triggers a
settings update.
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context; use dbg() instead of dev_dbg()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8afb1474db upstream.
/sys/kernel/slab/:t-0000048 # cat cpu_slabs
231 N0=16 N1=215
/sys/kernel/slab/:t-0000048 # cat slabs
145 N0=36 N1=109
See, the number of slabs is smaller than that of cpu slabs.
The bug was introduced by commit 49e2258586
("slub: per cpu cache for partial pages").
We should use page->pages instead of page->pobjects when calculating
the number of cpu partial slabs. This also fixes the mapping of slabs
and nodes.
As there's no variable storing the number of total/active objects in
cpu partial slabs, and we don't have user interfaces requiring those
statistics, I just add WARN_ON for those cases.
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a7f84f03f6 upstream.
Current code check boot service region with kernel text region by:
start+size >= __pa_symbol(_text)
The end of the above region should be start + size - 1 instead.
I see this problem in ovmf + Fedora 19 grub boot:
text start: 1000000 md start: 800000 md size: 800000
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[bwh: Backported to 3.2: s/__pa_symbol/virt_to_phys/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5ac64ba12a upstream.
As the dvb-frontend kthread can be called anytime, it can race
with some get status ioctl. So, it seems better to avoid one to
race with the other while reading a 32 bits register.
I can't see any other reason for having a mutex there at I2C, except
to provide such kind of protection, as the I2C core already has a
mutex to protect I2C transfers.
Note: instead of this approach, it could eventually remove the dib8000
specific mutex for it, and either group the 4 ops into one xfer or
to manually control the I2C mutex. The main advantage of the current
approach is that the changes are smaller and more puntual.
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Patrick Boettcher <pboettcher@kernellabs.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dcaf9aed99 upstream.
Bfa driver crash is observed while pushing the firmware on to chinook
quad port card due to uninitialized bfi_image_ct2 access which gets
initialized only for CT2 ASIC based cards after request_firmware().
For quard port chinook (CT2 ASIC based), bfi_image_ct2 is not getting
initialized as there is no check for chinook PCI device ID before
request_firmware and instead bfi_image_cb is initialized as it is the
default case for card type check.
This patch includes changes to read the right firmware for quad port chinook.
Signed-off-by: Vijaya Mohan Guvva <vmohan@brocade.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f248dae13 upstream.
byBBPreEDIndex value is initially 0, this means that from
cold BBvUpdatePreEDThreshold is never set.
This means that sensitivity may be in an ambiguous state,
failing to scan any wireless points or at least distant ones.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d6a484520c upstream.
In commit 85747f ("PATCH] parport: add NetMOS 9805 support") Max added
the PCI ID for NetMOS 9805 based on a Debian bug report from 2k4 which
was at the v2.4.26 time frame. The patch made into 2.6.14.
Shortly before that patch akpm merged commit 296d3c783b ("[PATCH] Support
NetMOS based PCI cards providing serial and parallel ports") which made
into v2.6.9-rc1.
Now we have two different entries for the same PCI id.
I have here the NetMos 9805 which claims to support SPP/EPP/ECP mode.
This patch takes Max's entry for titan_1284p1 (base != -1 specifies the
ioport for ECP mode) and replaces akpm's entry for netmos_9805 which
specified -1 (=none). Both share the same PCI-ID (my card has subsystem
0x1000 / 0x0020 so it should match PCI_ANY).
While here I also drop the entry for titan_1284p2 which is the same as
netmos_9815.
Cc: Maximilian Attems <maks@stro.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5c6c26813a upstream.
Due to difficulty in arriving at the proper security label for
TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
while/before they are undergoing XFRM transforms instead of waiting
until afterwards so that we can determine the correct security label.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Signed-off-by: Paul Moore <pmoore@redhat.com>
[bwh: Backported to 3.2:
s/selinux_peerlbl_enabled()/netlbl_enabled() || selinux_xfrm_enabled()/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 57d2aa00dc upstream.
The issue below was found in 2.6.34-rt rather than mainline rt
kernel, but the issue still exists upstream as well.
So please let me describe how it was noticed on 2.6.34-rt:
On this version, each softirq has its own thread, it means there
is at least one RT FIFO task per cpu. The priority of these
tasks is set to 49 by default. If user launches an RT FIFO task
with priority lower than 49 of softirq RT tasks, it's possible
there are two RT FIFO tasks enqueued one cpu runqueue at one
moment. By current strategy of balancing RT tasks, when it comes
to RT tasks, we really need to put them off to a CPU that they
can run on as soon as possible. Even if it means a bit of cache
line flushing, we want RT tasks to be run with the least latency.
When the user RT FIFO task which just launched before is
running, the sched timer tick of the current cpu happens. In this
tick period, the timeout value of the user RT task will be
updated once. Subsequently, we try to wake up one softirq RT
task on its local cpu. As the priority of current user RT task
is lower than the softirq RT task, the current task will be
preempted by the higher priority softirq RT task. Before
preemption, we check to see if current can readily move to a
different cpu. If so, we will reschedule to allow the RT push logic
to try to move current somewhere else. Whenever the woken
softirq RT task runs, it first tries to migrate the user FIFO RT
task over to a cpu that is running a task of lesser priority. If
migration is done, it will send a reschedule request to the found
cpu by IPI interrupt. Once the target cpu responds the IPI
interrupt, it will pick the migrated user RT task to preempt its
current task. When the user RT task is running on the new cpu,
the sched timer tick of the cpu fires. So it will tick the user
RT task again. This also means the RT task timeout value will be
updated again. As the migration may be done in one tick period,
it means the user RT task timeout value will be updated twice
within one tick.
If we set a limit on the amount of cpu time for the user RT task
by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
upon reaching the soft limit.
But exactly when the SIGXCPU signal should be sent depends on the
RT task timeout value. In fact the timeout mechanism of sending
the SIGXCPU signal assumes the RT task timeout is increased once
every tick.
However, currently the timeout value may be added twice per
tick. So it results in the SIGXCPU signal being sent earlier
than expected.
To solve this issue, we prevent the timeout value from increasing
twice within one tick time by remembering the jiffies value of
last updating the timeout. As long as the RT task's jiffies is
different with the global jiffies value, we allow its timeout to
be updated.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan <lizefan@huawei.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4c96ae319 upstream.
migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.
Instead unthrottle rt runqueues before migrating tasks.
Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ lizf: backported to 3.4: adjust context ]
Signed-off-by: Li Zefan <lizefan@huawei.com>
[bwh: Backported to 3.2:
- Adjust filenames
- unthrottle_offline_cfs_rqs() is already static, but defined in sched.c
after including sched_fair.c, so add forward declaration
- unthrottle_offline_cfs_rqs() also needs to be defined for all CONFIG_SMP
configurations now]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 454c79999f upstream.
task_tick_rt() has an optimization to only reschedule SCHED_RR tasks
if they were the only element on their rq. However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup. In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice. If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.
Modify task_tick_rt() to check that the task is the only task on its
rq, and that the each of the scheduling entities of its ancestors
is also the only entity on its rq.
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27c73ae759 upstream.
Commit 7cb2ef56e6 ("mm: fix aio performance regression for database
caused by THP") can cause dereference of a dangling pointer if
split_huge_page runs during PageHuge() if there are updates to the
tail_page->private field.
Also it is repeating compound_head twice for hugetlbfs and it is running
compound_head+compound_trans_head for THP when a single one is needed in
both cases.
The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.
A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).
By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:
echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
BUG: Bad page state in process bash pfn:59a01
page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0
page flags: 0x1c00000000008000(tail)
Modules linked in:
CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x55/0x76
bad_page+0xd5/0x130
free_pages_prepare+0x213/0x280
__free_pages+0x36/0x80
update_and_free_page+0xc1/0xd0
free_pool_huge_page+0xc2/0xe0
set_max_huge_pages.part.58+0x14c/0x220
nr_hugepages_store_common.isra.60+0xd0/0xf0
nr_hugepages_store+0x13/0x20
kobj_attr_store+0xf/0x20
sysfs_write_file+0x189/0x1e0
vfs_write+0xc5/0x1f0
SyS_write+0x55/0xb0
system_call_fastpath+0x16/0x1b
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Khalid Aziz: Backported to 3.4]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7cb2ef56e6 upstream.
I am working with a tool that simulates oracle database I/O workload.
This tool (orion to be specific -
<http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
does aio into these pages from flash disks using various common block
sizes used by database. I am looking at performance with two of the most
common block sizes - 1M and 64K. aio performance with these two block
sizes plunged after Transparent HugePages was introduced in the kernel.
Here are performance numbers:
pre-THP 2.6.39 3.11-rc5
1M read 8384 MB/s 5629 MB/s 6501 MB/s
64K read 7867 MB/s 4576 MB/s 4251 MB/s
I have narrowed the performance impact down to the overheads introduced by
THP in __get_page_tail() and put_compound_page() routines. perf top shows
>40% of cycles being spent in these two routines. Every time direct I/O
to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
the pages and calls put_page() when I/O completes to put the reference
away. THP introduced significant amount of locking overhead to get_page()
and put_page() when dealing with compound pages because hugepages can be
split underneath get_page() and put_page(). It added this overhead
irrespective of whether it is dealing with hugetlbfs pages or transparent
hugepages. This resulted in 20%-45% drop in aio performance when using
hugetlbfs pages.
Since hugetlbfs pages can not be split, there is no reason to go through
all the locking overhead for these pages from what I can see. I added
code to __get_page_tail() and put_compound_page() to bypass all the
locking code when working with hugetlbfs pages. This improved performance
significantly. Performance numbers with this patch:
pre-THP 3.11-rc5 3.11-rc5 + Patch
1M read 8384 MB/s 6501 MB/s 8371 MB/s
64K read 7867 MB/s 4251 MB/s 6510 MB/s
Performance with 64K read is still lower than what it was before THP, but
still a 53% improvement. It does mean there is more work to be done but I
will take a 53% improvement for now.
Please take a look at the following patch and let me know if it looks
reasonable.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bee09ed91c upstream.
On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:
Enabling non-boot CPUs ...
smpboot: Booting Node 0 Processor 1 APIC 0x1
[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
perf: IBS APIC setup failed on cpu #1
process: Switch to broadcast mode on CPU1
CPU1 is up
...
ACPI: Waking up from system sleep state S3
Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.
The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.
This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.
Marking it as stable to let distros pick up this fix.
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 70f2fe3a26 upstream.
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.
The problem shows itself with the following kernel log message:
nilfs_sufile_do_cancel_free: segment 6533 must be clean
Usually a few hours later the file system gets corrupted:
NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)
The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.
This is what happens:
1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
including the newly allocated segment, which will contain active
data and can be allocated at a later time
10. A few hours later another segment construction allocates the
segment and causes file system corruption
This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3f9aec7610 upstream.
When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.
Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e8b8491585 upstream.
commit e875ecea26
md/raid10 record bad blocks as needed during recovery.
added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.
So move the freeing of r10bio down where it is safe.
Fixes: e875ecea26
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b50c259e25 upstream.
If we discover a bad block when reading we split the request and
potentially read some of it from a different device.
The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
seen by comparison with raid1.c
This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.
Fixes: 856e08e237
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1cc03eb932 upstream.
commit 5d8c71f9e5
md: raid5 crash during degradation
Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.
commit 14a75d3e07
md/raid5: preferentially read from replacement device if possible.
Fixed the same bug if a more effective way, so we can now revert
the original commit.
Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Fixes: 5d8c71f9e5
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3dc91d4338 upstream.
While running stress tests on adding and deleting ftrace instances I hit
this bug:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_inode_permission+0x85/0x160
PGD 63681067 PUD 7ddbe067 PMD 0
Oops: 0000 [#1] PREEMPT
CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
Call Trace:
security_inode_permission+0x1c/0x30
__inode_permission+0x41/0xa0
inode_permission+0x18/0x50
link_path_walk+0x66/0x920
path_openat+0xa6/0x6c0
do_filp_open+0x43/0xa0
do_sys_open+0x146/0x240
SyS_open+0x1e/0x20
system_call_fastpath+0x16/0x1b
Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
RIP selinux_inode_permission+0x85/0x160
CR2: 0000000000000020
Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.
in selinux_inode_permission():
isec = inode->i_security;
rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);
Note, the crash came from stressing the deletion and reading of debugfs
files. I was not able to recreate this via normal files. But I'm not
sure they are safe. It may just be that the race window is much harder
to hit.
What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().
The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct. Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here. (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).
Note, this is a hack, but it fixes the problem at hand. A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback. But that is a major job to do, and requires a
lot of work. For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.
Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 29c350bf28 upstream.
The array was missing the final entry for the undefined instruction
exception handler; this commit adds it.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e098f5cbe9 upstream.
This patch adds support for the PCI ID provided by the Marvell 88SE9170
SATA controller.
Signed-off-by: Simon Guinot <sguinot@lacie.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
This was added as part of commit 3d567e0e29 ('tg3: Set 10_100_ONLY
flag for additional 10/100 Mbps devices') upstream and is needed by
the following patch to ahci.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fcce9a35f8 upstream.
A third possible PCI ID, as personally observed, and found in the
pci.ids list.
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 90ff5d688e upstream.
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel. If it's not valid, we die but
with a nice oops message.
Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative. Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.
This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.
Kudos to Paulus for finding this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4ff859fe1d upstream.
The clockevents code was being told that the footbridge clock event
device ticks at 16x the rate which it actually does. This leads to
timekeeping problems since it allows the clocksource to wrap before
the kernel notices. Fix this by using the correct clock.
Fixes: 4e8d76373c ("ARM: footbridge: convert to clockevents/clocksource")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: fold in the relevant parts of commit 838a2ae80a
('ARM: use clockevents_config_and_register() where possible')]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c0c1439541 upstream.
selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
but task_struct->alloc_lock doesn't pin ->parent or ->ptrace,
this looks confusing and triggers the "suspicious RCU usage"
warning because ptrace_parent() does rcu_dereference_check().
And in theory this is wrong, spin_lock()->preempt_disable()
doesn't necessarily imply rcu_read_lock() we need to access
the ->parent.
Reported-by: Evan McNabb <emcnabb@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46d01d6322 upstream.
Fix a broken networking check. Return an error if peer recv fails. If
secmark is active and the packet recv succeeds the peer recv error is
ignored.
Signed-off-by: Chad Hanson <chanson@trustedcs.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f5a44db5d2 upstream.
The missing casts can cause the high 64-bits of the physical blocks to
be lost. Set up new macros which allows us to make sure the right
thing happen, even if at some point we end up supporting larger
logical block numbers.
Thanks to the Emese Revfy and the PaX security team for reporting this
issue.
Reported-by: PaX Team <pageexec@freemail.hu>
Reported-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2:
- Adjust context
- Drop inapplicable change to ext4_ext_rm_leaf()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4263c86dca upstream.
Certain dm962x revisions contain an bug, where if a USB bulk transfer retry
(E.G. if bulk crc mismatch) happens right after a transfer with odd or
maxpacket length, the internal tx hardware fifo gets out of sync causing
the interface to stop working.
Work around it by adding up to 3 bytes of padding to ensure this situation
cannot trigger.
This workaround also means we never pass multiple-of-maxpacket size skb's
to usbnet, so the length adjustment to handle usbnet's padding of those can
be removed.
Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 407900cfb5 upstream.
dm9620/dm9621a require room for 4 byte padding even in dm9601 (3 byte
header) mode.
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7787380336 upstream.
net_dma can cause data to be copied to a stale mapping if a
copy-on-write fault occurs during dma. The application sees missing
data.
The following trace is triggered by modifying the kernel to WARN if it
ever triggers copy-on-write on a page that is undergoing dma:
WARNING: CPU: 24 PID: 2529 at lib/dma-debug.c:485 debug_dma_assert_idle+0xd2/0x120()
ioatdma 0000:00:04.0: DMA-API: cpu touching an active dma mapped page [pfn=0x16bcd9]
Modules linked in: iTCO_wdt iTCO_vendor_support ioatdma lpc_ich pcspkr dca
CPU: 24 PID: 2529 Comm: linbug Tainted: G W 3.13.0-rc1+ #353
00000000000001e5 ffff88016f45f688 ffffffff81751041 ffff88017ab0ef70
ffff88016f45f6d8 ffff88016f45f6c8 ffffffff8104ed9c ffffffff810f3646
ffff8801768f4840 0000000000000282 ffff88016f6cca10 00007fa2bb699349
Call Trace:
[<ffffffff81751041>] dump_stack+0x46/0x58
[<ffffffff8104ed9c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff810f3646>] ? ftrace_pid_func+0x26/0x30
[<ffffffff8104ee86>] warn_slowpath_fmt+0x46/0x50
[<ffffffff8139c062>] debug_dma_assert_idle+0xd2/0x120
[<ffffffff81154a40>] do_wp_page+0xd0/0x790
[<ffffffff811582ac>] handle_mm_fault+0x51c/0xde0
[<ffffffff813830b9>] ? copy_user_enhanced_fast_string+0x9/0x20
[<ffffffff8175fc2c>] __do_page_fault+0x19c/0x530
[<ffffffff8175c196>] ? _raw_spin_lock_bh+0x16/0x40
[<ffffffff810f3539>] ? trace_clock_local+0x9/0x10
[<ffffffff810fa1f4>] ? rb_reserve_next_event+0x64/0x310
[<ffffffffa0014c00>] ? ioat2_dma_prep_memcpy_lock+0x60/0x130 [ioatdma]
[<ffffffff8175ffce>] do_page_fault+0xe/0x10
[<ffffffff8175c862>] page_fault+0x22/0x30
[<ffffffff81643991>] ? __kfree_skb+0x51/0xd0
[<ffffffff813830b9>] ? copy_user_enhanced_fast_string+0x9/0x20
[<ffffffff81388ea2>] ? memcpy_toiovec+0x52/0xa0
[<ffffffff8164770f>] skb_copy_datagram_iovec+0x5f/0x2a0
[<ffffffff8169d0f4>] tcp_rcv_established+0x674/0x7f0
[<ffffffff816a68c5>] tcp_v4_do_rcv+0x2e5/0x4a0
[..]
---[ end trace e30e3b01191b7617 ]---
Mapped at:
[<ffffffff8139c169>] debug_dma_map_page+0xb9/0x160
[<ffffffff8142bf47>] dma_async_memcpy_pg_to_pg+0x127/0x210
[<ffffffff8142cce9>] dma_memcpy_pg_to_iovec+0x119/0x1f0
[<ffffffff81669d3c>] dma_skb_copy_datagram_iovec+0x11c/0x2b0
[<ffffffff8169d1ca>] tcp_rcv_established+0x74a/0x7f0:
...the problem is that the receive path falls back to cpu-copy in
several locations and this trace is just one of the areas. A few
options were considered to fix this:
1/ sync all dma whenever a cpu copy branch is taken
2/ modify the page fault handler to hold off while dma is in-flight
Option 1 adds yet more cpu overhead to an "offload" that struggles to compete
with cpu-copy. Option 2 adds checks for behavior that is already documented as
broken when using get_user_pages(). At a minimum a debug mode is warranted to
catch and flag these violations of the dma-api vs get_user_pages().
Thanks to David for his reproducer.
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: David Whipple <whipple@securedatainnovations.ch>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f447ef4a56 upstream.
If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc because cpupower-set passes a NULL optarg to the atoi
call. This is because the getopt_long structure currently has all of
the options as having an optional_argument when they really have a
required argument. We change the structure to use required_argument to
match the short options and it resolves the issue.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 73f0b56a1f upstream.
This patch adds a driver workaround for a HW issue.
A race condition in the HW results in missing interrupts,
which can be avoided by a read/write with the ISR register.
All chips in the AR9002 series are affected by this bug - AR9003
and above do not have this problem.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a885b3ccc7 upstream.
The GMCH_CTRL register (or MGCC in the spec) is at a different address
on Sandybridge, and the address to which we currently write to is
undefined. These stray writes appear to upset (hard hang) my Ivybridge
machine whilst it is in UEFI mode.
Note that the register is still marked as locked RO on Sandybridge, so
vgaarb is still dysfunctional.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2: add definition of SNB_GMCH_CTRL in i915_reg.h]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ed697e1aaf upstream.
When the process is sleeping at the SNDRV_PCM_STATE_PAUSED
state from the wait_for_avail function, the sleep process will be woken by
timeout(10 seconds). Even if the sleep process wake up by timeout, by this
patch, the process will continue with sleep and wait for the other state.
Signed-off-by: JongHo Kim <furmuwon@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 757dfcaa41 upstream.
This patch touches the RT group scheduling case.
Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's
priority, while rt_rq passed to them may be not the top-level rt_rq.
This is wrong, because changing of priority on a child level does not
guarantee that the priority is the highest all over the rq. So, this
leak makes RT balancing unusable.
The short example: the task having the highest priority among all rq's
RT tasks (no one other task has the same priority) are waking on a
throttle rt_rq. The rq's cpupri is set to the task's priority
equivalent, but real rq->rt.highest_prio.curr is less.
The patch below fixes the problem.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
CC: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d386735588 upstream.
VMAs covering a bo but that didn't start at the same address space offset as
the bo they were mapping were incorrectly generating SEGFAULT errors in
the fault handler.
Reported-by: Joseph Dolinak <kanilo2@yahoo.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
[bwh: Backported to 3.2: drm_vma_node_start() is open-coded;
vma_pages() was open-coded]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c4602c1c81 upstream.
Ftrace currently initializes only the online CPUs. This implementation has
two problems:
- If we online a CPU after we enable the function profile, and then run the
test, we will lose the trace information on that CPU.
Steps to reproduce:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# cd <debugfs>/tracing/
# echo <some function name> >> set_ftrace_filter
# echo 1 > function_profile_enabled
# echo 1 > /sys/devices/system/cpu/cpu1/online
# run test
- If we offline a CPU before we enable the function profile, we will not clear
the trace information when we enable the function profile. It will trouble
the users.
Steps to reproduce:
# cd <debugfs>/tracing/
# echo <some function name> >> set_ftrace_filter
# echo 1 > function_profile_enabled
# run test
# cat trace_stat/function*
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > function_profile_enabled
# echo 1 > function_profile_enabled
# cat trace_stat/function*
# run test
# cat trace_stat/function*
So it is better that we initialize the ftrace profiler for each possible cpu
every time we enable the function profile instead of just the online ones.
Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit bd02cd2549 upstream.
Evan Huus found (by fuzzing in wireshark) that the radiotap
iterator code can access beyond the length of the buffer if
the first bitmap claims an extension but then there's no
data at all. Fix this.
Reported-by: Evan Huus <eapache@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4cc629b7a2 upstream.
We should be writing bits here but instead we're writing the
numbers that correspond to the bits we want to write. Fix it by
wrapping the numbers in the BIT() macro. This fixes gpios acting
as interrupts.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 693e0cb052 upstream.
While enabling these machines, we found we would sometimes lose an
interrupt if we change hardware volume during playback, and that
disabling msi fixed this issue. (Losing the interrupt caused underruns
and crackling audio, as the one second timeout is usually bigger than
the period size.)
The machines were all machines from HP, running AMD Hudson controller,
and Realtek ALC282 codec.
BugLink: https://bugs.launchpad.net/bugs/1260225
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8333f0fe13 upstream.
Some RS690 boards with 64MB of sideport memory show up as
having 128MB sideport + 256MB of UMA. In this case,
just skip the sideport memory and use UMA. This fixes
rendering corruption and should improve performance.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=35457
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4454b66cb6 upstream.
This patch changes special case handling for ISCSI_OP_SCSI_CMD
where an initiator sends a zero length Expected Data Transfer
Length (EDTL), but still sets the WRITE and/or READ flag bits
when no payload transfer is requested.
Many, many moons ago two special cases where added for an ancient
version of ESX that has long since been fixed, so instead of adding
a new special case for the reported bug with a Broadcom 57800 NIC,
go ahead and always strip off the incorrect WRITE + READ flag bits.
Also, avoid sending a reject here, as RFC-3720 does mandate this
case be handled without protocol error.
Reported-by: Witold Bazakbal <865perl@wp.pl>
Tested-by: Witold Bazakbal <865perl@wp.pl>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6962d914f3 upstream.
We've got regression reports that my previous fix for spurious wakeups
after S5 on HP Haswell machines leads to the automatic reboot at
shutdown on some machines. It turned out that the fix for one side
triggers another BIOS bug in other side. So, it's exclusive.
Since the original S5 wakeups have been confirmed only on HP machines,
it'd be safer to apply it only to limited machines. As a wild guess,
limiting to machines with HP PCI SSID should suffice.
This patch should be backported to kernels as old as 3.12, that
contain the commit 638298dc66 "xhci: Fix
spurious wakeups after S5 on Haswell".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: <dashing.meng@gmail.com>
Reported-by: Niklas Schnelle <niklas@komani.de>
Reported-by: Giorgos <ganastasiouGR@gmail.com>
Reported-by: <art1@vhex.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9105bb149b upstream.
That thing should be del_timer_sync(); consider what happens
if ext4_put_super() call of del_timer() happens to come just as it's
getting run on another CPU. Since that timer reschedules itself
to run next day, you are pretty much guaranteed that you'll end up
with kfree'd scheduled timer, with usual fun consequences. AFAICS,
that's -stable fodder all way back to 2010... [the second del_timer_sync()
is almost certainly not needed, but it doesn't hurt either]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit df4e7ac0bb upstream.
ext2_quota_write() doesn't properly setup bh it passes to
ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
ext2_get_blocks() (or we could actually ask for mapping arbitrary number
of blocks depending on whatever value was on stack).
Fix ext2_quota_write() to properly fill in number of blocks to map.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5946d08937 upstream.
A corrupted ext4 may have out of order leaf extents, i.e.
extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
^^^^ overlap with previous extent
Reading such extent could hit BUG_ON() in ext4_es_cache_extent().
BUG_ON(end < lblk);
The problem is that __read_extent_tree_block() tries to cache holes as
well but assumes 'lblk' is greater than 'prev' and passes underflowed
length to ext4_es_cache_extent(). Fix it by checking for overlapping
extents in ext4_valid_extent_entries().
I hit this when fuzz testing ext4, and am able to reproduce it by
modifying the on-disk extent by hand.
Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
make sure the value is not overflow.
Ran xfstests on patched ext4 and no regression.
Cc: Lukáš Czerner <lczerner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae1495b12d upstream.
While it's true that errors can only happen if there is a bug in
jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
the kernel or remount the file system read-only in order to avoid
further data loss. The ext4_journal_abort_handle() function doesn't
do any of this, and while it's likely that this call (since it doesn't
adjust refcounts) will likely result in the file system eventually
deadlocking since the current transaction will never be able to close,
it's much cleaner to call let ext4's error handling system deal with
this situation.
There's a separate bug here which is that if certain jbd2 errors
errors occur and file system is mounted errors=continue, the file
system will probably eventually end grind to a halt as described
above. But things have been this way in a long time, and usually when
we have these sorts of errors it's pretty much a disaster --- and
that's why the jbd2 layer aggressively retries memory allocations,
which is the most likely cause of these jbd2 errors.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
[bwh: Backported to 3.2: drop logging of missing transaction debug data]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 87809942d3 upstream.
We've received multiple reports in Fedora via (BZ 907193)
that the Seagate Momentus SpinPoint M8 errors out when enabling AA:
[ 2.555905] ata2.00: failed to enable AA (error_mask=0x1)
[ 2.568482] ata2.00: failed to enable AA (error_mask=0x1)
Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk.
Reported-by: Nicholas <arealityfarbetween@googlemail.com>
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Tested-by: Nicholas <arealityfarbetween@googlemail.com>
Acked-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 84ed8a9905 upstream.
E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m:
ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined!
For "lib-y", if no symbols in a compilation unit are referenced by other
units, the compilation unit will not be included in vmlinux. This
breaks modules that do reference those symbols.
Use "obj-y" instead to fix this.
http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/
This doesn't fix all cases. There are others, e.g. udivsi3.
This is also not limited to sh, many architectures handle this in the
same way.
A simple solution is to unconditionally include all helper functions.
A more complex solution is to make the choice of "lib-y" or "obj-y" depend
on CONFIG_MODULES:
obj-$(CONFIG_MODULES) += ...
lib-y($CONFIG_MODULES) += ...
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eb1b8af33c upstream.
Aborted requests usually get cleared when the reply is received.
If MDS crashes, no reply will be received. So we need to cleanup
aborted requests when re-sending requests.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2afc745f3e upstream.
This patch fixes the problem that get_unmapped_area() can return illegal
address and result in failing mmap(2) etc.
In case that the address higher than PAGE_SIZE is set to
/proc/sys/vm/mmap_min_addr, the address lower than mmap_min_addr can be
returned by get_unmapped_area(), even if you do not pass any virtual
address hint (i.e. the second argument).
This is because the current get_unmapped_area() code does not take into
account mmap_min_addr.
This leads to two actual problems as follows:
1. mmap(2) can fail with EPERM on the process without CAP_SYS_RAWIO,
although any illegal parameter is not passed.
2. The bottom-up search path after the top-down search might not work in
arch_get_unmapped_area_topdown().
Note: The first and third chunk of my patch, which changes "len" check,
are for more precise check using mmap_min_addr, and not for solving the
above problem.
[How to reproduce]
--- test.c -------------------------------------------------
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/errno.h>
int main(int argc, char *argv[])
{
void *ret = NULL, *last_map;
size_t pagesize = sysconf(_SC_PAGESIZE);
do {
last_map = ret;
ret = mmap(0, pagesize, PROT_NONE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
// printf("ret=%p\n", ret);
} while (ret != MAP_FAILED);
if (errno != ENOMEM) {
printf("ERR: unexpected errno: %d (last map=%p)\n",
errno, last_map);
}
return 0;
}
---------------------------------------------------------------
$ gcc -m32 -o test test.c
$ sudo sysctl -w vm.mmap_min_addr=65536
vm.mmap_min_addr = 65536
$ ./test (run as non-priviledge user)
ERR: unexpected errno: 1 (last map=0x10000)
Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
Signed-off-by: Kiyoshi Owada <owada.kiyoshi@jp.panasonic.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
As we do not have vm_unmapped_area(), make arch_get_unmapped_area_topdown()
calculate the lower limit for the new area's end address and then compare
addresses with this instead of with len. In the process, fix an off-by-one
error which could result in returning 0 if mm->mmap_base == len.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fda4e2e855 upstream.
In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
potential to corrupt kernel memory if userspace provides an address that
is at the end of a page. This patches concerts those functions to use
kvm_write_guest_cached and kvm_read_guest_cached. It also checks the
vapic_address specified by userspace during ioctl processing and returns
an error to userspace if the address is not a valid GPA.
This is generally not guest triggerable, because the required write is
done by firmware that runs before the guest. Also, it only affects AMD
processors and oldish Intel that do not have the FlexPriority feature
(unless you disable FlexPriority, of course; then newer processors are
also affected).
Fixes: b93463aa59 ('KVM: Accelerated apic support')
Reported-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[dannf: backported to Debian's 3.2]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 657eb17d87 upstream.
Pick the MAC address of the first virtual interface as the new hardware MAC
address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579.
Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c19ce0ab53 upstream.
Use proper cpp defined(...) constructs to avoid this:
arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo':
arch/ia64/kernel/machine_kexec.c:160:8: warning: "CONFIG_PGTABLE_4" is not defined
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0283f7a100 upstream.
At some point, Measurement Computing / ComputerBoards redesigned the
PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip.
This meant they had to put their hardware registers in the PCI BAR 2
region instead of PCI BAR 1. Unfortunately, they kept the same PCI
device ID for the new design. This means the driver recognizes the
newer cards, but doesn't work (and is likely to screw up the local
configuration registers of the PLX chip) because it's using the wrong
region.
Since all the supported boards have the DIO registers in the PCI BAR 2
region except for older PCI-DIO48H boards which have an empty PCI BAR 2
region and the DIO registers in PCI BAR 1, determine which PCI BAR
region to use based on whether the PCI BAR 2 region is empty or not.
This change makes the `dioregs_badrindex` member of `struct
pcidio_board` redundant. The `pcicontroler_badrindex` member is also
unused, so remove both members.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: kernel-team@lists.ubuntu.com
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a49ecbcd7b upstream.
After a successful hugetlb page migration by soft offline, the source
page will either be freed into hugepage_freelists or buddy(over-commit
page). If page is in buddy, page_hstate(page) will be NULL. It will
hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0
PGD c23762067 PUD c24be2067 PMD 0
Oops: 0000 [#1] SMP
So check PageHuge(page) after call migrate_pages() successfully.
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wujg: backport to 3.4:
- adjust context
- s/num_poisoned_pages/mce_bad_pages/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b0cc6020e1 upstream.
Currently, we enable ARI in a device's upstream bridge if the bridge and
the device support it. But we never disable ARI, even if the device is
removed and replaced with a device that doesn't support ARI.
This means that if we hot-remove an ARI device and replace it with a
non-ARI multi-function device, we find only function 0 of the new device
because the upstream bridge still has ARI enabled, and next_ari_fn()
only returns function 0 for the new non-ARI device.
This patch disables ARI in the upstream bridge if the device doesn't
support ARI. See the PCIe spec, r3.0, sec 6.13.
[bhelgaas: changelog, function comment]
[yijing: replace PCIe Cap accessor with legacy PCI accessor]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3948659e30 upstream.
There have been a few reports of this warning appearing recently:
XFS (dm-4): xlog_space_left: head behind tail
tail_cycle = 129, tail_bytes = 20163072
GH cycle = 129, GH bytes = 20162880
The common cause appears to be lots of freeze and unfreeze cycles,
and the output from the warnings indicates that we are leaking
around 8 bytes of log space per freeze/unfreeze cycle.
When we freeze the filesystem, we write an unmount record and that
uses xlog_write directly - a special type of transaction,
effectively. What it doesn't do, however, is correctly account for
the log space it uses. The unmount record writes an 8 byte structure
with a special magic number into the log, and the space this
consumes is not accounted for in the log ticket tracking the
operation. Hence we leak 8 bytes every unmount record that is
written.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 95f4a45de1 ]
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.
This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.
Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.
Fixes: f0ad0860d0 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Based upon upstream commit 70315d22d3 ]
Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and
FIN_WAIT2 connections are represented by inet_timewait_sock (not just
TIME_WAIT). Thus:
(a) We need to iterate through the time_wait buckets if the user wants
either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state
fin-wait-2" would not return any sockets, even if there were some in
FIN_WAIT2.)
(b) We need to check tw_substate to see if the user wants to dump
sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a
given connection is in. (Before fixing this, "ss -nemoi state
time-wait" would actually return sockets in state FIN_WAIT2.)
An analogous fix is in v3.13: 70315d22d3
("inet_diag: fix inet_diag_dump_icsk() to use correct state for
timewait sockets") but that patch is quite different because 3.13 code
is very different in this area due to the unification of TCP hash
tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1.
I tested that this applies cleanly between v3.3 and v3.12, and tested
that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2
and earlier (though it makes semantic sense), and semantically is not
the right fix for 3.13 and beyond (as mentioned above).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 95e92fd40c ]
bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y:
WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with
different size [device address=0x00000000da2b389e] [map size=1490 bytes]
[unmap size=66 bytes]
The reason is that bnx2x splits a TSO BD into two BDs (headers + data)
using one DMA mapping for both, but it uses only the length of the first
BD when unmapping.
This patch fixes the bug by unmapping the whole length of the two BDs.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit fe0d692bbc ]
br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.
br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held . This can happen quite easily when any of the bridge multicast
timers expire, which try to take the same lock.
The fix here is to use spin_lock_bh(), preventing other softirqs from
executing on this CPU.
Steps to reproduce:
1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.
# brctl addbr br0
# brctl addif br0 eth1 eth2 eth3 eth4
# brctl setmcqi br0 2
# brctl setmcquerier br0 1
# while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done
Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 4d231b76ee ]
While commit 30a584d944 fixes datagram interface in LLC, a use
after free bug has been introduced for SOCK_STREAM sockets that do
not make use of MSG_PEEK.
The flow is as follow ...
if (!(flags & MSG_PEEK)) {
...
sk_eat_skb(sk, skb, false);
...
}
...
if (used + offset < skb->len)
continue;
... where sk_eat_skb() calls __kfree_skb(). Therefore, cache
original length and work on skb_len to check partial reads.
Fixes: 30a584d944 ("[LLX]: SOCK_DGRAM interface fixes")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 2205369a31 ]
When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.
But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.
This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.
Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.
To facilitate this add a dev_rebuild_header(). There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP). So we need a helper function just like
dev_hard_header() to avoid crashes.
Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.
With lots of help from Florian Westphal.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f81152e350 ]
recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen.
After commit f3d3342602
(net: rework recvmsg handler msg_name and msg_namelen logic), we now
always take the else branch due to namelen being initialized to 0.
Digging in netdev-vger-cvs git repo shows that msg_namelen was
initialized with a fixed-size since at least 1995, so the else branch
was never taken.
Compile tested only.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 8e3fbf8704 ]
The yam_ioctl() code fails to initialise the cmd field
of the struct yamdrv_ioctl_cfg. Add an explicit memset(0)
before filling the structure to avoid the 4-byte info leak.
Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e9db5c21d3 ]
The local variable 'bi' comes from userspace. If userspace passed a
large number to 'bi.data.calibrate', there would be an integer overflow
in the following line:
s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
Signed-off-by: Wenliang Fan <fanwlexca@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b1aac815c0 ]
Jakub reported while working with nlmon netlink sniffer that parts of
the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab]
memory through this. At least, in udp_dump_one(), we allocate a skb in ...
rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
... and then pass that to inet_sk_diag_fill() that puts the whole struct
inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
r->id.idiag_dst[0] and leave the rest untouched:
r->id.idiag_src[0] = inet->inet_rcv_saddr;
r->id.idiag_dst[0] = inet->inet_daddr;
struct inet_diag_msg embeds struct inet_diag_sockid that is correctly /
fully filled out in IPv6 case, but for IPv4 not.
So just zero them out by using plain memset (for this little amount of
bytes it's probably not worth the extra check for idiag_family == AF_INET).
Similarly, fix also other places where we fill that out.
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 37ab4fa784 ]
This is similar to the set_peek_off patch where calling bind while the
socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
spew after a while.
This is also the last place that did a straightforward mutex_lock(), so
there shouldn't be any more of these patches.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 388d333557 ]
The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120)
uninitialized. From power on reset this register may have garbage in it. The
Register Base Address register defines the device local address of a
register. The data pointed to by this location is read or written using
the Register Data register (PCI config offset 128). When REG_BASE_ADDR has
garbage any read or write of Register Data Register (PCI 128) will cause the
PCI bus to lock up. The TCO watchdog will fire and bring down the system.
Signed-off-by: Nat Gurumoorthy <natg@google.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d323e92cc3 ]
maxattr in genl_family should be used to save the max attribute
type, but not the max command type. Drop monitor doesn't support
any attributes, so we should leave it as zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit a3300ef4bb ]
Brett Ciphery reported that new ipv6 addresses failed to get installed
because the addrconf generated dsts where counted against the dst gc
limit. We don't need to count those routes like we currently don't count
administratively added routes.
Because the max_addresses check enforces a limit on unbounded address
generation first in case someone plays with router advertisments, we
are still safe here.
Reported-by: Brett Ciphery <brett.ciphery@windriver.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 18fc25c94e ]
After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.
For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
The commit 6094628bfd
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
&& rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
= 8192
c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
while (ret) {
tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
[tmp = 65536 - 57632 = 7904]
conn->c_xmit_data_off += tmp;
[c_xmit_data_off = 57632 + 7904 = 65536]
ret -= tmp;
[ret = 8240 - 7904 = 336]
if (conn->c_xmit_data_off == sg->length) {
conn->c_xmit_data_off = 0;
sg++;
conn->c_xmit_sg++;
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
[c_xmit_sg = 1, rm->data.op_nents = 1]
What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.
Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 28e24c62ab ]
Few network drivers really supports frag_list : virtual drivers.
Some drivers wrongly advertise NETIF_F_FRAGLIST feature.
If skb with a frag_list is given to them, packet on the wire will be
corrupt.
Remove this flag, as core networking stack will make sure to
provide packets that can be sent without corruption.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Anirudha Sarangi <anirudh@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c876006962 upstream.
Current MMC driver doesn't handle generic error (bit19 of device
status) in write sequence. As a result, write data gets lost when
generic error occurs. For example, a generic error when updating a
filesystem management information causes a loss of write data and
corrupts the filesystem. In the worst case, the system will never
boot.
This patch includes the following functionality:
1. To enable error checking for the response of CMD12 and CMD13
in write command sequence
2. To retry write sequence when a generic error occurs
Messages are added for v2 to show what occurs.
[Backported to 3.4-stable]
Signed-off-by: KOBAYASHI Yoshitake <yoshitake.kobayashi@toshiba.co.jp>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8a56d7761d upstream.
Commit 8c4f3c3fa9 "ftrace: Check module functions being traced on reload"
fixed module loading and unloading with respect to function tracing, but
it missed the function graph tracer. If you perform the following
# cd /sys/kernel/debug/tracing
# echo function_graph > current_tracer
# modprobe nfsd
# echo nop > current_tracer
You'll get the following oops message:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
Call Trace:
[<ffffffff814fe193>] dump_stack+0x4f/0x7c
[<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
[<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
[<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
[<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
[<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
[<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
[<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
[<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
[<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
[<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
[<ffffffff81122aed>] vfs_write+0xab/0xf6
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff81122dbd>] SyS_write+0x59/0x82
[<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
[<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
---[ end trace 940358030751eafb ]---
The above mentioned commit didn't go far enough. Well, it covered the
function tracer by adding checks in __register_ftrace_function(). The
problem is that the function graph tracer circumvents that (for a slight
efficiency gain when function graph trace is running with a function
tracer. The gain was not worth this).
The problem came with ftrace_startup() which should always be called after
__register_ftrace_function(), if you want this bug to be completely fixed.
Anyway, this solution moves __register_ftrace_function() inside of
ftrace_startup() and removes the need to call them both.
Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Fixes: ed926f9b35 ("ftrace: Use counters to enable functions to trace")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8c4f3c3fa9 upstream.
There's been a nasty bug that would show up and not give much info.
The bug displayed the following warning:
WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
[<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
[<ffffffff811401cc>] ? kfree+0x2c/0x110
[<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
[<ffffffff81149f1e>] __fput+0xae/0x220
[<ffffffff8114a09e>] ____fput+0xe/0x10
[<ffffffff8105fa22>] task_work_run+0x72/0x90
[<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
[<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff815c0f88>] int_signal+0x12/0x17
---[ end trace 793179526ee09b2c ]---
It was finally narrowed down to unloading a module that was being traced.
It was actually more than that. When functions are being traced, there's
a table of all functions that have a ref count of the number of active
tracers attached to that function. When a function trace callback is
registered to a function, the function's record ref count is incremented.
When it is unregistered, the function's record ref count is decremented.
If an inconsistency is detected (ref count goes below zero) the above
warning is shown and the function tracing is permanently disabled until
reboot.
The ftrace callback ops holds a hash of functions that it filters on
(and/or filters off). If the hash is empty, the default means to filter
all functions (for the filter_hash) or to disable no functions (for the
notrace_hash).
When a module is unloaded, it frees the function records that represent
the module functions. These records exist on their own pages, that is
function records for one module will not exist on the same page as
function records for other modules or even the core kernel.
Now when a module unloads, the records that represents its functions are
freed. When the module is loaded again, the records are recreated with
a default ref count of zero (unless there's a callback that traces all
functions, then they will also be traced, and the ref count will be
incremented).
The problem is that if an ftrace callback hash includes functions of the
module being unloaded, those hash entries will not be removed. If the
module is reloaded in the same location, the hash entries still point
to the functions of the module but the module's ref counts do not reflect
that.
With the help of Steve and Joern, we found a reproducer:
Using uinput module and uinput_release function.
cd /sys/kernel/debug/tracing
modprobe uinput
echo uinput_release > set_ftrace_filter
echo function > current_tracer
rmmod uinput
modprobe uinput
# check /proc/modules to see if loaded in same addr, otherwise try again
echo nop > current_tracer
[BOOM]
The above loads the uinput module, which creates a table of functions that
can be traced within the module.
We add uinput_release to the filter_hash to trace just that function.
Enable function tracincg, which increments the ref count of the record
associated to uinput_release.
Remove uinput, which frees the records including the one that represents
uinput_release.
Load the uinput module again (and make sure it's at the same address).
This recreates the function records all with a ref count of zero,
including uinput_release.
Disable function tracing, which will decrement the ref count for uinput_release
which is now zero because of the module removal and reload, and we have
a mismatch (below zero ref count).
The solution is to check all currently tracing ftrace callbacks to see if any
are tracing any of the module's functions when a module is loaded (it already does
that with callbacks that trace all functions). If a callback happens to have
a module function being traced, it increments that records ref count and starts
tracing that function.
There may be a strange side effect with this, where tracing module functions
on unload and then reloading a new module may have that new module's functions
being traced. This may be something that confuses the user, but it's not
a big deal. Another approach is to disable all callback hashes on module unload,
but this leaves some ftrace callbacks that may not be registered, but can
still have hashes tracing the module's function where ftrace doesn't know about
it. That situation can cause the same bug. This solution solves that case too.
Another benefit of this solution, is it is possible to trace a module's
function on unload and load.
Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
Reported-by: Jörn Engel <joern@logfs.org>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Steve Hodgson <steve@purestorage.com>
Tested-by: Steve Hodgson <steve@purestorage.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 06a51d9307 upstream.
There are two types of hashes in the ftrace_ops; one type
is the filter_hash and the other is the notrace_hash. Either
one may be null, meaning it has no elements. But when elements
are added, the hash is allocated.
Throughout the code, a check needs to be made to see if a hash
exists or the hash has elements, but the check if the hash exists
is usually missing causing the possible "NULL pointer dereference bug".
Add a helper routine called "ftrace_hash_empty()" that returns
true if the hash doesn't exist or its count is zero. As they mean
the same thing.
Last-bug-reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c842e97552 upstream.
When disabling the "notrace" records, that means we want to trace them.
If the notrace_hash is zero, it means that we want to trace all
records. But to disable a zero notrace_hash means nothing.
The check for the notrace_hash count was incorrect with:
if (hash && !hash->count)
return
With the correct comment above it that states that we do nothing
if the notrace_hash has zero count. But !hash also means that
the notrace hash has zero count. I think this was done to
protect against dereferencing NULL. But if !hash is true, then
we go through the following loop without doing a single thing.
Fix it to:
if (!hash || !hash->count)
return;
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f09234385 upstream.
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP
skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
is evil (less than 5).
This issue were introduced by commit ec5efe7946
(rps: support IPIP encapsulation).
Cc: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: the affected code is in __skb_get_rxhash()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 31978b5cc6 upstream.
If we allocate less than sizeof(struct attrlist) then we end up
corrupting memory or doing a ZERO_PTR_SIZE dereference.
This can only be triggered with CAP_SYS_ADMIN.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 071c529eb6)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4789b8e6b upstream.
It appears that driver runs into a problem here if fibsize is too small
because we allocate user_srbcmd with fibsize size only but later we
access it until user_srbcmd->sg.count to copy it over to srbcmd.
It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this
structure already includes one sg element and this is not needed for
commands without data. So, we would recommend to add the following
(instead of test for fibsize == 0).
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com>
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a497e47d4a upstream.
If we do a zero size allocation then it will oops. Also we can't be
sure the user passes us a NUL terminated string so I've added a
terminator.
This code can only be triggered by root.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8404663f81 upstream.
The {get,put}_user macros don't perform range checking on the provided
__user address when !CPU_HAS_DOMAINS.
This patch reworks the out-of-line assembly accessors to check the user
address against a specified limit, returning -EFAULT if is is out of
range.
[will: changed get_user register allocation to match put_user]
[rmk: fixed building on older ARM architectures]
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: TUSER() was called T()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e40f193f5b upstream.
The iommu integration into memory slots expects memory slots to be
added or removed and doesn't handle the move case. We can unmap
slots from the iommu after we mark them invalid and map them before
installing the final memslot array. Also re-order the kmemdup vs
map so we don't leave iommu mappings if we get ENOMEM.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 12d6e7538e upstream.
PPC must flush all translations before the new memory slot
is visible.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 714b33d151 upstream.
Stephan Mueller reported to me recently a error in random number generation in
the ansi cprng. If several small requests are made that are less than the
instances block size, the remainder for loop code doesn't increment
rand_data_valid in the last iteration, meaning that the last bytes in the
rand_data buffer gets reused on the subsequent smaller-than-a-block request for
random data.
The fix is pretty easy, just re-code the for loop to make sure that
rand_data_valid gets incremented appropriately
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stephan Mueller <stephan.mueller@atsec.com>
CC: Stephan Mueller <stephan.mueller@atsec.com>
CC: Petr Matousek <pmatouse@redhat.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8821f5dc18 upstream.
When working on report indexes, always validate that they are in bounds.
Without this, a HID device could report a malicious feature report that
could trick the driver into a heap overflow:
[ 634.885003] usb 1-1: New USB device found, idVendor=0596, idProduct=0500
...
[ 676.469629] BUG kmalloc-192 (Tainted: G W ): Redzone overwritten
Note that we need to change the indexes from s8 to s16 as they can
be between -1 and 255.
CVE-2013-2897
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
[bwh: Backported to 3.2: mt_device::{cc,cc_value,inputmode}_index do not
exist and the corresponding indices do not need to be validated.
mt_device::maxcontact_report_id does not exist either. So all we need
to do is to widen mt_device::inputmode.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 3868204d6b ]
commit a553e4a631 ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but acctually
this doesn't work at all for two reasons(The orignal transformed packet has
bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)
- After transpormation, IPv4 header total length needs update,
because encrypted payload's length is NOT same as that of plain text.
- After transformation, IPv4 checksum needs re-caculate because of payload
has been changed.
With this patch, armmed pktgen with below cofiguration, Wireshark is able to
decrypted ESP packet generated by pktgen without any IPv4 checksum error or
auth value error.
pgset "flag IPSEC"
pgset "flows 1"
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 7f88c6b23a ]
IPv6 stats are 64 bits and thus are protected with a seqlock. By not
disabling bottom-half we could deadlock here if we don't disable bh and
a softirq reentrantly updates the same mib.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f1d8cba61c ]
In commit c9e9042994 ("ipv4: fix possible seqlock deadlock") I left
another places where IP_INC_STATS_BH() were improperly used.
udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
process context, not from softirq context.
This was detected by lockdep seqlock support.
Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit e40526cb20 ]
Salam reported a use after free bug in PF_PACKET that occurs when
we're sending out frames on a socket bound device and suddenly the
net device is being unregistered. It appears that commit 827d9780
introduced a possible race condition between {t,}packet_snd() and
packet_notifier(). In the case of a bound socket, packet_notifier()
can drop the last reference to the net_device and {t,}packet_snd()
might end up suddenly sending a packet over a freed net_device.
To avoid reverting 827d9780 and thus introducing a performance
regression compared to the current state of things, we decided to
hold a cached RCU protected pointer to the net device and maintain
it on write side via bind spin_lock protected register_prot_hook()
and __unregister_prot_hook() calls.
In {t,}packet_snd() path, we access this pointer under rcu_read_lock
through packet_cached_dev_get() that holds reference to the device
to prevent it from being freed through packet_notifier() while
we're in send path. This is okay to do as dev_put()/dev_hold() are
per-cpu counters, so this should not be a performance issue. Also,
the code simplifies a bit as we don't need need_rls_dev anymore.
Fixes: 827d978037 ("af-packet: Use existing netdev reference for bound sockets.")
Reported-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f873042093 ]
When the following commands are executed:
brctl addbr br0
ifconfig br0 hw ether <addr>
rmmod bridge
The calltrace will occur:
[ 563.312114] device eth1 left promiscuous mode
[ 563.312188] br0: port 1(eth1) entered disabled state
[ 563.468190] kmem_cache_destroy bridge_fdb_cache: Slab cache still has objects
[ 563.468197] CPU: 6 PID: 6982 Comm: rmmod Tainted: G O 3.12.0-0.7-default+ #9
[ 563.468199] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 563.468200] 0000000000000880 ffff88010f111e98 ffffffff814d1c92 ffff88010f111eb8
[ 563.468204] ffffffff81148efd ffff88010f111eb8 0000000000000000 ffff88010f111ec8
[ 563.468206] ffffffffa062a270 ffff88010f111ed8 ffffffffa063ac76 ffff88010f111f78
[ 563.468209] Call Trace:
[ 563.468218] [<ffffffff814d1c92>] dump_stack+0x6a/0x78
[ 563.468234] [<ffffffff81148efd>] kmem_cache_destroy+0xfd/0x100
[ 563.468242] [<ffffffffa062a270>] br_fdb_fini+0x10/0x20 [bridge]
[ 563.468247] [<ffffffffa063ac76>] br_deinit+0x4e/0x50 [bridge]
[ 563.468254] [<ffffffff810c7dc9>] SyS_delete_module+0x199/0x2b0
[ 563.468259] [<ffffffff814e0922>] system_call_fastpath+0x16/0x1b
[ 570.377958] Bridge firewalling registered
--------------------------- cut here -------------------------------
The reason is that when the bridge dev's address is changed, the
br_fdb_change_mac_address() will add new address in fdb, but when
the bridge was removed, the address entry in the fdb did not free,
the bridge_fdb_cache still has objects when destroy the cache, Fix
this by flushing the bridge address entry when removing the bridge.
v2: according to the Toshiaki Makita and Vlad's suggestion, I only
delete the vlan0 entry, it still have a leak here if the vlan id
is other number, so I need to call fdb_delete_by_port(br, NULL, 1)
to flush all entries whose dst is NULL for the bridge.
Suggested-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Suggested-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit d2615bf450 ]
The following commit:
b6c40d68ff
net: only invoke dev->change_rx_flags when device is UP
tried to fix a problem with VLAN devices and promiscuouse flag setting.
The issue was that VLAN device was setting a flag on an interface that
was down, thus resulting in bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.
A later commit:
deede2fabe
vlan: Don't propagate flag changes on down interfaces
fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.
The problem we have now is that if we have create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices. A simple examle of the scenario is the
following:
eth0----> bond0 ----> bridge0 ---> vlan50
If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge. As a
result, packets with vlan50 will be dropped by the interface.
The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation. As a result we can remove the generic solution
introduced in b6c40d68ff and leave
it to the individual devices to decide whether they will block
flag propagation or not.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b5de4a22f1 ]
init_card() calls dev_get_by_name() to get a network deceive. But it
doesn't decrease network device reference count after the device is
used.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit db31c55a6f ]
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
original code that would lead to memory corruption in the kernel if you
had audit configured. If you didn't have audit configured it was
harmless.
There are some programs such as beta versions of Ruby which use too
large of a buffer and returning an error code breaks them. We should
clamp the ->msg_namelen value instead.
Fixes: 1661bf364a ("net: heap overflow in __audit_sockaddr()")
Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Eric Wong <normalperson@yhbt.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 85fbaa7503 ]
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.
As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.
This broke traceroute and such.
Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 68c6beb373 ]
In that case it is probable that kernel code overwrote part of the
stack. So we should bail out loudly here.
The BUG_ON may be removed in future if we are sure all protocols are
conformant.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f3d3342602 ]
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.
This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.
Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.
Also document these changes in include/linux/net.h as suggested by David
Miller.
Changes since RFC:
Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.
With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".
This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.
Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.
Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit bceaa90240 ]
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.
If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit c9e9042994 ]
ip4_datagram_connect() being called from process context,
it should use IP_INC_STATS() instead of IP_INC_STATS_BH()
otherwise we can deadlock on 32bit arches, or get corruptions of
SNMP counters.
Fixes: 584bdf8cbd ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1ca1a4cf59 ]
In af3e095a1f, Erik Jacobsen fixed one type of unaligned access
bug for ia64 by converting a 64-bit write to use put_unaligned().
Unfortunately, since gcc will convert a short memset() to a series
of appropriately-aligned stores, the problem is now visible again
on tilegx, where the memset that zeros out proc_event is converted
to three 64-bit stores, causing an unaligned access panic.
A better fix for the original problem is to ensure that proc_event
is aligned to 8 bytes here. We can do that relatively easily by
arranging to start the struct cn_msg aligned to 8 bytes and then
offset by 4 bytes. Doing so means that the immediately following
proc_event structure is then correctly aligned to 8 bytes.
The result is that the memset() stores are now aligned, and as an
added benefit, we can remove the put_unaligned() calls in the code.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f9a23c8448 ]
These strings come from a copy_from_user() and there is no way to be
sure they are NUL terminated.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit b869ccfab1 ]
This patch fixes two race conditions between bond_store_updelay/downdelay
and bond_store_miimon which could lead to division by zero as miimon can
be set to 0 while either updelay/downdelay are being set and thus miss the
zero check in the beginning, the zero div happens because updelay/downdelay
are stored as new_value / bond->params.miimon. Use rtnl to synchronize with
miimon setting.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 1188f05497 ]
If priority/traffic class field in IPv6 header is set (seen when
using ssh), the uncompression sets the TC and Flow fields incorrectly.
Example:
This is IPv6 header of a sent packet. Note the priority/TC (=1) in
the first byte.
00000000: 61 00 00 00 00 2c 06 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: 02 1e ab ff fe 4c 52 57
This gets compressed like this in the sending side
00000000: 72 31 04 06 02 1e ab ff fe 4c 52 57 ec c2 00 16
00000010: aa 2d fe 92 86 4e be c6 ....
In the receiving end, the packet gets uncompressed to this
IPv6 header
00000000: 60 06 06 02 00 2a 1e 40 fe 80 00 00 00 00 00 00
00000010: 02 02 72 ff fe c6 42 10 fe 80 00 00 00 00 00 00
00000020: ab ff fe 4c 52 57 ec c2
First four bytes are set incorrectly and we have also lost
two bytes from destination address.
The fix is to switch the case values in switch statement
when checking the TC field.
Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit ec9f1d15db ]
Currently the ARP monitoring is not supported with 802.3ad, and it's
prohibited to use it via the module params.
However we still can set it afterwards via sysfs, cause we only check for
*LB modes there.
To fix this - add a check for 802.3ad mode in bonding_store_arp_interval.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 51c37a70aa ]
For properly initialising the Tausworthe generator [1], we have
a strict seeding requirement, that is, s1 > 1, s2 > 7, s3 > 15.
Commit 697f8d0348 ("random32: seeding improvement") introduced
a __seed() function that imposes boundary checks proposed by the
errata paper [2] to properly ensure above conditions.
However, we're off by one, as the function is implemented as:
"return (x < m) ? x + m : x;", and called with __seed(X, 1),
__seed(X, 7), __seed(X, 15). Thus, an unwanted seed of 1, 7, 15
would be possible, whereas the lower boundary should actually
be of at least 2, 8, 16, just as GSL does. Fix this, as otherwise
an initialization with an unwanted seed could have the effect
that Tausworthe's PRNG properties cannot not be ensured.
Note that this PRNG is *not* used for cryptography in the kernel.
[1] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme.ps
[2] http://www.iro.umontreal.ca/~lecuyer/myftp/papers/tausme2.ps
Joint work with Hannes Frederic Sowa.
Fixes: 697f8d0348 ("random32: seeding improvement")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit f104a567e6 ]
As the rfc 4191 said, the Router Preference and Lifetime values in a
::/0 Route Information Option should override the preference and lifetime
values in the Router Advertisement header. But when the kernel deals with
a ::/0 Route Information Option, the rt6_get_route_info() always return
NULL, that means that overriding will not happen, because those default
routers were added without flag RTF_ROUTEINFO in rt6_add_dflt_router().
In order to deal with that condition, we should call rt6_get_dflt_router
when the prefix length is 0.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[ Upstream commit 13eb2ab2d3 ]
When trying to delete a table >= 256 using iproute2 the local table
will be deleted.
The table id is specified as a netlink attribute when it needs more then
8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0).
Preconditions to matching the table id in the rule delete code
doesn't seem to take the "table id in netlink attribute" into condition
so the frh_get_table helper function never gets to do its job when
matching against current rule.
Use the helper function twice instead of peaking at the table value directly.
Originally reported at: http://bugs.debian.org/724783
Reported-by: Nicolas HICHER <nhicher@avencall.com>
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f9f9ffc237 upstream.
throttle_cfs_rq() doesn't check to make sure that period_timer is running,
and while update_curr/assign_cfs_runtime does, a concurrently running
period_timer on another cpu could cancel itself between this cpu's
update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
in the tg to restart the timer, this causes the cfs_rq to be stranded
forever.
Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
inactive.
(Also add some sched_debug lines for cfs_bandwidth.)
Tested: make a run/sleep task in a cgroup, loop switching the cgroup
between 1ms/100ms quota and unlimited, checking for timer_active=0 and
throttled=1 as a failure. With the throttle_cfs_rq() change commented out
this fails, with the full patch it passes.
Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filenames]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 446b802437 upstream.
In selinux_ip_postroute() we perform access checks based on the
packet's security label. For locally generated traffic we get the
packet's security label from the associated socket; this works in all
cases except for TCP SYN-ACK packets. In the case of SYN-ACK packet's
the correct security label is stored in the connection's request_sock,
not the server's socket. Unfortunately, at the point in time when
selinux_ip_postroute() is called we can't query the request_sock
directly, we need to recreate the label using the same logic that
originally labeled the associated request_sock.
See the inline comments for more explanation.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4718006827 upstream.
In selinux_ip_output() we always label packets based on the parent
socket. While this approach works in almost all cases, it doesn't
work in the case of TCP SYN-ACK packets when the correct label is not
the label of the parent socket, but rather the label of the larval
socket represented by the request_sock struct.
Unfortunately, since the request_sock isn't queued on the parent
socket until *after* the SYN-ACK packet is sent, we can't lookup the
request_sock to determine the correct label for the packet; at this
point in time the best we can do is simply pass/NF_ACCEPT the packet.
It must be said that simply passing the packet without any explicit
labeling action, while far from ideal, is not terrible as the SYN-ACK
packet will inherit any IP option based labeling from the initial
connection request so the label *should* be correct and all our
access controls remain in place so we shouldn't have to worry about
information leaks.
Reported-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Tested-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b963a22e6d upstream.
Under guest controllable circumstances apic_get_tmcct will execute a
divide by zero and cause a crash. If the guest cpuid support
tsc deadline timers and performs the following sequence of requests
the host will crash.
- Set the mode to periodic
- Set the TMICT to 0
- Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline)
- Set the TMICT to non-zero.
Then the lapic_timer.period will be 0, but the TMICT will not be. If the
guest then reads from the TMCCT then the host will perform a divide by 0.
This patch ensures that if the lapic_timer.period is 0, then the division
does not occur.
Reported-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[bwh: Backported to 3.2: s/kvm_apic_get_reg/apic_get_reg/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 338c7dbadd upstream.
In multiple functions the vcpu_id is used as an offset into a bitfield. Ag
malicious user could specify a vcpu_id greater than 255 in order to set or
clear bits in kernel memory. This could be used to elevate priveges in the
kernel. This patch verifies that the vcpu_id provided is less than 255.
The api documentation already specifies that the vcpu_id must be less than
max_vcpus, but this is currently not checked.
Reported-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f12d5bfceb upstream.
The hugepage code had the exact same bug that regular pages had in
commit 7485d0d375 ("futexes: Remove rw parameter from
get_futex_key()").
The regular page case was fixed by commit 9ea71503a8 ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0: "thp: update futex compound knowledge") case
remained broken.
Found by Dave Jones and his trinity tool.
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3806b45ba4 upstream.
The "rpm * div" operations can overflow here, so this patch adds an
upper limit to rpm to prevent that. Jean Delvare helped me with this
patch.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Roger Lucas <vt8231@hiddenengine.co.uk>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 33a7ab91d5 upstream.
The W83L786NG stores the fan speed on 4 bits while the sysfs interface
uses a 0-255 range. Thus the driver should scale the user input down
to map it to the device range, and scale up the value read from the
device before presenting it to the user. The reserved register nibble
should be left unchanged.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit cf7559bc05 upstream.
The wrong mask is used, which causes some fan speed control modes
(pwmX_enable) to be incorrectly reported, and some modes to be
impossible to set.
[JD: add subject and description.]
Signed-off-by: Brian Carnes <bmcarnes@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ff88b4724f upstream.
Erratum 71 of PXA270M Processor Family Specification Update
(April 19, 2010) explains that watchdog reset time is just
8us insead of 10ms in EMTS.
If SDRAM is not reset, it causes memory bus congestion and
the device hangs. We put SDRAM in selfresh mode before watchdog
reset, removing potential freezes.
Without this patch PXA270-based ICP DAS LP-8x4x hangs after up to 40
reboots. With this patch it has successfully rebooted 500 times.
Signed-off-by: Sergei Ianovich <ynvich@gmail.com>
Tested-by: Marek Vasut <marex@denx.de>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 506cac15ac upstream.
When converting from tosa-keyboard driver to matrix keyboard, tosa keys
received extra 1 column shift. Replace that with correct values to make
keyboard work again.
Fixes: f69a6548c9 ('[ARM] pxa/tosa: make use of the matrix keypad driver')
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4cb57ab4a2 upstream.
Some module parameters in dm-bufio are read-only. These parameters
inform the user about memory consumption. They are not supposed to be
changed by the user.
However, despite being read-only, these parameters can be set on
modprobe or insmod command line, for example:
modprobe dm-bufio current_allocated_bytes=12345
The kernel doesn't expect that these variables can be non-zero at module
initialization and if the user sets them, it results in BUG.
This patch initializes the variables in the module init routine, so that
user-supplied values are ignored.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5b2d06576c upstream.
The dm_round_up function may overflow to zero. In this case,
dm_table_create() must fail rather than go on to allocate an empty array
with alloc_targets().
This fixes a possible memory corruption that could be caused by passing
too large a number in "param->target_count".
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 230c83afdd upstream.
There is a possible leak of snapshot space in case of crash.
The reason for space leaking is that chunks in the snapshot device are
allocated sequentially, but they are finished (and stored in the metadata)
out of order, depending on the order in which copying finished.
For example, supposed that the metadata contains the following records
SUPERBLOCK
METADATA (blocks 0 ... 250)
DATA 0
DATA 1
DATA 2
...
DATA 250
Now suppose that you allocate 10 new data blocks 251-260. Suppose that
copying of these blocks finish out of order (block 260 finished first
and the block 251 finished last). Now, the snapshot device looks like
this:
SUPERBLOCK
METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256)
DATA 0
DATA 1
DATA 2
...
DATA 250
DATA 251
DATA 252
DATA 253
DATA 254
DATA 255
METADATA (blocks 255, 254, 253, 252, 251)
DATA 256
DATA 257
DATA 258
DATA 259
DATA 260
Now, if the machine crashes after writing the first metadata block but
before writing the second metadata block, the space for areas DATA 250-255
is leaked, it contains no valid data and it will never be used in the
future.
This patch makes dm-snapshot complete exceptions in the same order they
were allocated, thus fixing this bug.
Note: when backporting this patch to the stable kernel, change the version
field in the following way:
* if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0}
* if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it
to {1, 10, 2}
Userspace reads the version to determine if the bug was fixed, so the
version change is needed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 932e9dec38 upstream.
When running a 32bit kernel the hda_intel driver is still reporting
a 64bit dma_mask if the HW supports it.
From sound/pci/hda/hda_intel.c:
/* allow 64bit DMA address if supported by H/W */
if ((gcap & ICH6_GCAP_64OK) && !pci_set_dma_mask(pci, DMA_BIT_MASK(64)))
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(64));
else {
pci_set_dma_mask(pci, DMA_BIT_MASK(32));
pci_set_consistent_dma_mask(pci, DMA_BIT_MASK(32));
}
which means when there is a call to dma_alloc_coherent from
snd_malloc_dev_pages a machine address bigger than 32bit can be returned.
This can be true in particular if running the 32bit kernel as a pv dom0
under the Xen Hypervisor or PAE on bare metal.
The problem is that when calling setup_bdle to program the BLE the
dma_addr_t returned from the dma_alloc_coherent is wrongly truncated
from snd_sgbuf_get_addr if running a 32bit kernel:
static inline dma_addr_t snd_sgbuf_get_addr(struct snd_dma_buffer *dmab,
size_t offset)
{
struct snd_sg_buf *sgbuf = dmab->private_data;
dma_addr_t addr = sgbuf->table[offset >> PAGE_SHIFT].addr;
addr &= PAGE_MASK;
return addr + offset % PAGE_SIZE;
}
where PAGE_MASK in a 32bit kernel is zeroing the upper 32bit af addr.
Without this patch the HW will fetch the 32bit truncated address,
which is not the one obtained from dma_alloc_coherent and will result
to a non working audio but can corrupt host memory at a random location.
The current patch apply to v3.13-rc3-74-g6c843f5
Signed-off-by: Stefano Panella <stefano.panella@citrix.com>
Reviewed-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8b3b005d67 upstream.
In checkin
5551a34e5a x86-64, build: Always pass in -mno-sse
we unconditionally added -mno-sse to the main build, to keep newer
compilers from generating SSE instructions from autovectorization.
However, this did not extend to the special environments
(arch/x86/boot, arch/x86/boot/compressed, and arch/x86/realmode/rm).
Add -mno-sse to the compiler command line for these environments, and
add -mno-mmx to all the environments as well, as we don't want a
compiler to generate MMX code either.
This patch also removes a $(cc-option) call for -m32, since we have
long since stopped supporting compilers too old for the -m32 option,
and in fact hardcode it in other places in the Makefiles.
Reported-by: Kevin B. Smith <kevin.b.smith@intel.com>
Cc: Sunil K. Pandey <sunil.k.pandey@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: H. J. Lu <hjl.tools@gmail.com>
Link: http://lkml.kernel.org/n/tip-j21wzqv790q834n7yc6g80j1@git.kernel.org
[bwh: Backported to 3.2:
- Drop changes to arch/x86/Makefile, which sets these flags earlier
- Adjust context
- Drop changes to arch/x86/realmode/rm/Makefile which doesn't exist]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 389a539058 upstream.
Now that scatterwalk_sg_chain sets the chain pointer bit the sg_page
call in scatterwalk_sg_next hits a BUG_ON when CONFIG_DEBUG_SG is
enabled. Use sg_chain_ptr instead of sg_page on a chain entry.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2d51f3cd11 upstream.
This patch adds a check for USB_STATE_NOTATTACHED to the
hub_port_warm_reset_required() workaround for ports that end up in
Compliance Mode in hub_events() when trying to decide which reset
function to use. Trying to call usb_reset_device() with a NOTATTACHED
device will just fail and leave the port broken.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4af6ef99a upstream.
According to WM8731 "PD, Rev 4.9 October 2012" datasheet, when it
works in DSP mode A, LRP = 1, while works in DSP mode B, LRP = 0.
So, fix LRP for DSP mode as the datesheet specification.
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1aeef303b5 upstream.
For MPC8572/MPC8536, the status of GPIOs defined as output
cannot be determined by reading GPDAT register, so the code
use shadow data register instead. But the code may give the
wrong status of GPIOs defined as input under some scenarios:
1. If some pins were configured as inputs and were asserted
high before booting the kernel, the shadow data has been
initialized with those pin values.
2. Some pins have been configured as output first and have
been set to the high value, then reconfigured as input.
The above cases will make the shadow data for those input
pins to be set to high. Then reading the pin status will
always return high even if the actual pin status is low.
The code should eliminate the effects of the shadow data to
the input pins, and the status of those pins should be
read directly from GPDAT.
Acked-by: Scott Wood <scottwood@freescale.com>
Acked-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a313249937 upstream.
This patch fixes the CS5 setting on the PL2303 USB-to-serial devices. CS5 has a
value of 0 and the CSIZE setting has been skipped altogether by the enclosing
if. Tested on 3.11.6 and the scope shows the correct output after the fix has
been applied.
Tagged to be added to stable, because it fixes a user visible driver bug and is
simple enough to backport easily.
Signed-off-by: Colin Leitner <colin.leitner@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Old code is cosmetically different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8704211f65 upstream.
FTDI UARTs support only 7 or 8 data bits. Until now the ftdi_sio driver would
only report this limitation for CS6 to dmesg and fail to reflect this fact to
tcgetattr.
This patch reverts the unsupported CSIZE setting and reports the fact with less
severance to dmesg for both CS5 and CS6.
To test the patch it's sufficient to call
stty -F /dev/ttyUSB0 cs5
which will succeed without the patch and report an error with the patch
applied.
As an additional fix this patch ensures that the control request will always
include a data bit size.
Signed-off-by: Colin Leitner <colin.leitner@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Old code is cosmetically different
- s/ddev/\&port->dev/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78692cc338 upstream.
This patch removes an erroneous check of CSIZE, which made it impossible to set
CS5.
Compiles clean, but couldn't test against hardware.
Signed-off-by: Colin Leitner <colin.leitner@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 711fbdfbf2 upstream.
This patch removes an erroneous check of CSIZE, which made it impossible to set
CS5.
Compiles clean, but couldn't test against hardware.
Signed-off-by: Colin Leitner <colin.leitner@gmail.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bf308d7bc upstream.
Add new supporting declarations to option.c, to support Huawei new
devices with new bInterfaceProtocol value.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f173e22ab upstream.
Interface 1 on this device isn't for option to bind to otherwise an oops
on usb_wwan with log flooding will happen when accessing the port:
tty_release: ttyUSB1: read/write wait queue active!
It doesn't seem to respond to QMI if it's added to qmi_wwan so don't add
it there - it's likely used by the card reader.
Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a1470c7bf3 upstream.
Bug report from: wenxiong@linux.vnet.ibm.com
The issue is happened in dual controller configuration. We got the
sysfs warnings when rmmod the ipr module.
enclosure_unregister() in drivers/msic/enclosure.c, call device_unregister()
for each componment deivce, device_unregister() ->device_del()->kobject_del()
->sysfs_remove_dir(). In sysfs_remove_dir(), set kobj->sd = NULL.
For each componment device,
enclosure_component_release()->enclosure_remove_links()->sysfs_remove_link()
in which checking kobj->sd again, it has been set as NULL when doing
device_unregister. So we saw all these sysfs WARNING.
Tested-by: wenxiong@linux.vnet.ibm.com
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 88bf6d62db upstream.
A return value of 1 is interpreted as an error. See pci_driver.
in local_pci_probe(). If you're wondering how this ever could
have worked, it's because it used to be the case that only return
values less than zero were interpreted as failure. But even in
the current kernel if the driver registers its various entry
points with the kernel, and then returns a value which is
interpreted as failure, those registrations aren't undone, so
the driver still mostly works. However, the driver's remove
function wouldn't be called on rmmod, and pci power management
functions wouldn't work. In the case of Smart Array, since it
has a battery backed cache (or else no cache) even if the driver
is not shut down properly as long as there is no outstanding
i/o, nothing too bad happens, which is why it took so long to
notice.
Requesting backport to stable because the change to pci-driver.c
which requires driver probe functions to return 0 occurred between
2.6.35 and 2.6.36 (the pci power management breakage) and again
between 3.7 and 3.8 (pci_dev->driver getting set to NULL in
local_pci_probe() preventing driver remove function from being
called on rmmod.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2e311fbabd upstream.
We inadvertantly discarded the scsi status for aborted commands.
For some commands (e.g. reads from tape drives) these can't be retried,
and if we discarded the scsi status, the scsi mid layer couldn't notice
anything was wrong and the error was not reported.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 43659222e7 upstream.
It's no good setting vga_base after the VGA console has been
initialised, because if we do that we get this:
Unable to handle kernel paging request at virtual address 000b8000
pgd = c0004000
[000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000
0Internal error: Oops: 5017 [#1] ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ #49
task: c03e2974 ti: c03d8000 task.ti: c03d8000
PC is at vgacon_startup+0x258/0x39c
LR is at request_resource+0x10/0x1c
pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053
sp : c03d9f68 ip : 000b8000 fp : c03d9f8c
r10: 000055aa r9 : 4401a103 r8 : ffffaa55
r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000
r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971
Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel
which is an access to the 0xb8000 without the PCI offset required to
make it work.
Fixes: cc22b4c185 ("ARM: set vga memory base at run-time")
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d3f7d56a7a upstream.
Commit 35f9c09fe (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag MSG_SENDPAGE_NOTLAST, similar to
MSG_MORE.
algif_hash, algif_skcipher, and udp used MSG_MORE from tcp_sendpages()
and need to see the new flag as identical to MSG_MORE.
This fixes sendfile() on AF_ALG.
v3: also fix udp
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Reported-and-tested-by: Shawn Landden <shawnlandden@gmail.com>
Original-patch: Richard Weinberger <richard@nod.at>
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89f4d45b27 upstream.
In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9dda2769af upstream.
Some s390 crypto algorithms incorrectly use the crypto_tfm structure to
store private data. As the tfm can be shared among multiple threads, this
can result in data corruption.
This patch fixes aes-xts by moving the xts and pcc parameter blocks from
the tfm onto the stack (48 + 96 bytes).
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 41da8b5adb upstream.
The scatterwalk_crypto_chain function invokes the scatterwalk_sg_chain
function to chain two scatterlists, but the chain pointer indication
bit is not set. When the resulting scatterlist is used, for example,
by sg_nents to count the number of scatterlist entries, a segfault occurs
because sg_nents does not follow the chain pointer to the chained scatterlist.
Update scatterwalk_sg_chain to set the chain pointer indication bit as is
done by the sg_chain function.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fc019c7122 upstream.
When performing an asynchronous ablkcipher operation the authenc
completion callback routine is invoked, but it does not locate and use
the proper IV.
The callback routine, crypto_authenc_encrypt_done, is updated to use
the same method of calculating the address of the IV as is done in
crypto_authenc_encrypt function which sets up the callback.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0fc0287c9e upstream.
Juri hit the below lockdep report:
[ 4.303391] ======================================================
[ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted
[ 4.303395] ------------------------------------------------------
[ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
[ 4.303417]
[ 4.303417] and this task is already holding:
[ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
[ 4.303431] which would create a new lock dependency:
[ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[ 4.303436]
[ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
[ 4.303922] HARDIRQ-ON-W at:
[ 4.303923] [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
[ 4.303926] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303929] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303931] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303933] SOFTIRQ-ON-W at:
[ 4.303933] [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
[ 4.303935] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303940] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303955] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303959] INITIAL USE at:
[ 4.303960] [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
[ 4.303963] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
[ 4.303966] [<ffffffff81063dd6>] kthreadd+0x86/0x180
[ 4.303969] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
[ 4.303972] }
Which reports that we take mems_allowed_seq with interrupts enabled. A
little digging found that this can only be from
cpuset_change_task_nodemask().
This is an actual deadlock because an interrupt doing an allocation will
hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
forever waiting for the write side to complete.
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mel Gorman <mgorman@suse.de>
Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Juri Lelli <juri.lelli@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ae5fbae0cc upstream.
Since commit 110dd8f19d "[SCSI] libsas: fix scr_read/write users and
update the libata documentation" we have been passing pmp=1 and is_cmd=0
to ata_tf_to_fis(). Praveen reports that eSATA attached drives do not
discover correctly. His investigation found that the BIOS was passing
pmp=0 while Linux was passing pmp=1 and failing to discover the drives.
Update libsas to follow the libata example of pulling the pmp setting
from the ata_link and correct is_cmd to be 1 since all tf's submitted
through ->qc_issue are commands. Presumably libsas lldds do not care
about is_cmd as they have sideband mechanisms to perform link
management.
http://marc.info/?l=linux-scsi&m=138179681726990
[jejb: checkpatch fix]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ac01810c9d upstream.
When the system enters suspend, it disables all interrupts in
suspend_device_irqs(), including the interrupts marked EARLY_RESUME.
On the resume side things are different. The EARLY_RESUME interrupts
are reenabled in sys_core_ops->resume and the non EARLY_RESUME
interrupts are reenabled in the normal system resume path.
When suspend_noirq() failed or suspend is aborted for any other
reason, we might omit the resume side call to sys_core_ops->resume()
and therefor the interrupts marked EARLY_RESUME are not reenabled and
stay disabled forever.
To solve this, enable all irqs unconditionally in irq_resume()
regardless whether interrupts marked EARLY_RESUMEhave been already
enabled or not.
This might try to reenable already enabled interrupts in the non
failure case, but the only affected platform is XEN and it has been
confirmed that it does not cause any side effects.
[ tglx: Massaged changelog. ]
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Acked-by-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Pavel Machek <pavel@ucw.cz>
Cc: <ian.campbell@citrix.com>
Cc: <rjw@rjwysocki.net>
Cc: <len.brown@intel.com>
Cc: <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2fea6cd303 upstream.
This patch fixes the issue that the sja1000_interrupt() function may have
returned IRQ_NONE without processing the optional pre_irq() and post_irq()
function before. Further the irq processing counter 'n' is moved to the end of
the while statement to return correct IRQ_[NONE|HANDLED] values at error
conditions.
Reported-by: Wolfgang Grandegger <wg@grandegger.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
[bwh: Backported to 3.2: s/SJA1000_IER/REG_IER/; s/SJA1000_IR/REG_IR/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 1b672224d1 upstream.
As suggested by Minchan Kim and Jerome Marchand "The code in reset_store
get the block device (bdget_disk()) but it does not put it (bdput()) when
it's done using it. The usage count is therefore incremented but never
decremented."
This patch also puts bdput() for all error cases.
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 46a51c8021 upstream.
This patch fixes the bug in reset_store caused by accessing NULL pointer.
The bdev gets its value from bdget_disk() which could fail when memory
pressure is severe and hence can return NULL because allocation of
inode in bdget could fail.
Hence, this patch introduces a check for bdev to prevent reference to a
NULL pointer in the later part of the code. It also removes unnecessary
check of bdev for fsync_bdev().
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a535d81c92 upstream.
The dwc3 UDC driver doesn't implement endpoint wedging correctly.
When an endpoint is wedged, the gadget driver should be allowed to
clear the wedge by calling usb_ep_clear_halt(). Only the host is
prevented from resetting the endpoint.
This patch fixes the implementation.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Pratyush Anand <pratyush.anand@st.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2bac51a182 upstream.
The delayed_status value is used to keep track of status response
packets on ep0. It needs to be reset or the set_config function would
still delay the answer, if the usb device got unplugged while waiting
for setup_continue to be called.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6f6485463a upstream.
Fix race in generic write implementation, which could lead to
temporarily degraded throughput.
The current generic write implementation introduced by commit
27c7acf220 ("USB: serial: reimplement generic fifo-based writes") has
always had this bug, although it's fairly hard to trigger and the
consequences are not likely to be noticed.
Specifically, a write() on one CPU while the completion handler is
running on another could result in only one of the two write urbs being
utilised to empty the remainder of the write fifo (unless there is a
second write() that doesn't race during that time).
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: deleted code is a bit different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 051a41fa4e upstream.
Multicast frames can't be transmitted as part of an aggregation
session (such a session couldn't even be set up) so don't try to
reorder them. Trying to do so would cause the reorder to stop
working correctly since multicast QoS frames (as transmitted by
the Aruba APs this was found with) would cause sequence number
confusion in the buffer.
Reported-by: Blaise Gassend <blaise@suitabletech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit ec67ad8281 upstream.
In a recent patch:
commit c13f20ac48
Author: Michael Neuling <mikey@neuling.org>
powerpc/signals: Mark VSX not saved with small contexts
We fixed an issue but an improved solution was later discussed after the patch
was merged.
Firstly, this patch doesn't handle the 64bit signals case, which could also hit
this issue (but has never been reported).
Secondly, the original patch isn't clear what MSR VSX should be set to. The
new approach below always clears the MSR VSX bit (to indicate no VSX is in the
context) and sets it only in the specific case where VSX is available (ie. when
VSX has been used and the signal context passed has space to provide the
state).
This reverts the original patch and replaces it with the improved solution. It
also adds a 64 bit version.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 2435dcb98c upstream.
The new IBM Akebono board has a PPC476GTR SoC with an AHCI compliant
SATA controller. This patch adds a compatible property for the new SoC
to the AHCI platform driver.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 76ae281f63 upstream.
A race window in configfs, it starts from one dentry is UNHASHED and end
before configfs_d_iput is called. In this window, if a lookup happen,
since the original dentry was UNHASHED, so a new dentry will be
allocated, and then in configfs_attach_attr(), sd->s_dentry will be
updated to the new dentry. Then in configfs_d_iput(),
BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
sys_open: sys_close:
... fput
dput
dentry_kill
__d_drop <--- dentry unhashed here,
but sd->dentry still point
to this dentry.
lookup_real
configfs_lookup
configfs_attach_attr---> update sd->s_dentry
to new allocated dentry here.
d_kill
configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
triggered here.
To fix it, change configfs_d_iput to not update sd->s_dentry if
sd->s_count > 2, that means there are another dentry is using the sd
beside the one that is going to be put. Use configfs_dirent_lock in
configfs_attach_attr to sync with configfs_d_iput.
With the following steps, you can reproduce the bug.
1. enable ocfs2, this will mount configfs at /sys/kernel/config and
fill configure in it.
2. run the following script.
while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 86784c6bde upstream.
In iSCSI negotiations with initiator CHAP enabled, usernames with
trailing garbage are permitted, because the string comparison only
checks the strlen of the configured username.
e.g. "usernameXXXXX" will be permitted to match "username".
Just check one more byte so the trailing null char is also matched.
Signed-off-by: Eric Seppanen <eric@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 369653e4fb upstream.
extract_param() is called with max_length set to the total size of the
output buffer. It's not safe to allow a parameter length equal to the
buffer size as the terminating null would be written one byte past the
end of the output buffer.
Signed-off-by: Eric Seppanen <eric@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c13f20ac48 upstream.
The VSX MSR bit in the user context indicates if the context contains VSX
state. Currently we set this when the process has touched VSX at any stage.
Unfortunately, if the user has not provided enough space to save the VSX state,
we can't save it but we currently still set the MSR VSX bit.
This patch changes this to clear the MSR VSX bit when the user doesn't provide
enough space. This indicates that there is no valid VSX state in the user
context.
This is needed to support get/set/make/swapcontext for applications that use
VSX but only provide a small context. For example, getcontext in glibc
provides a smaller context since the VSX registers don't need to be saved over
the glibc function call. But since the program calling getcontext may have
used VSX, the kernel currently says the VSX state is valid when it's not. If
the returned context is then used in setcontext (ie. a small context without
VSX but with MSR VSX set), the kernel will refuse the context. This situation
has been reported by the glibc community.
Based on patch from Carlos O'Donell.
Tested-by: Haren Myneni <haren@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e7cc5cf745 upstream.
The pcie_portdrv .probe() method calls pci_enable_device() once, in
pcie_port_device_register(), but the .remove() method calls
pci_disable_device() twice, in pcie_port_device_remove() and in
pcie_portdrv_remove().
That causes a "disabling already-disabled device" warning when removing a
PCIe port device. This happens all the time when removing Thunderbolt
devices, but is also easy to reproduce with, e.g.,
"echo 0000:00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
This patch removes the disable from pcie_portdrv_remove().
[bhelgaas: changelog, tag for stable]
Reported-by: David Bulkow <David.Bulkow@stratus.com>
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit c97cf606e4 upstream.
If the DELEGRETURN errors out with something like NFS4ERR_BAD_STATEID
then there is no recovery possible. Just quit without returning an error.
Also, note that the client must not assume that the NFSv4 lease has been
renewed when it sees an error on DELEGRETURN.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4a82fd7c4e upstream.
When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session
draining is off, but DELEGRETURN can still get a session error.
The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and
the DELEGRETURN done then restarts the RPC task in the prepare state.
With the state manager still processing the NFS4CLNT_DELEGRETURN flag with
session draining off, these DELEGRETURNs will cycle with errors filling up the
session slots.
This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the
NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the
state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task
as seen in this kernel thread dump:
kernel: 4.12.32.53-ma D 0000000000000000 0 3393 2 0x00000000
kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140
kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001
kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058
kernel: Call Trace:
kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc]
kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc]
kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90
kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50
kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs]
kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs]
kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs]
kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs]
kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs]
kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs]
kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30
kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs]
kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs]
kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs]
kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0
kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20
kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
The state manager can not therefore process the DELEGRETURN session errors.
Change the async handler to wait for recovery on session errors.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2:
- Adjust context
- There's no restart_call label]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d617b338bb upstream.
This patch fixes following error (for big kernels):
---8<---
arch/avr32/boot/u-boot/head.o: In function `no_tag_table':
(.init.text+0x44): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o
arch/avr32/kernel/built-in.o: In function `bad_return':
(.ex.text+0x236): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o
--->8---
It comes up when the kernel increases and 'panic()' is too far away to fit in
the +/- 2MiB range. Which in turn issues from the 21-bit displacement in
'br{cond4}' mnemonic which is one of the two ways to do jumps (rjmp has just
10-bit displacement and therefore a way smaller range). This fact was stated
before in 8d29b7b9f8.
One solution to solve this is to add a local storage for the symbol address
and just load the $pc with that value.
Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7a2a74f4b8 upstream.
Before the CRT was (fully) set up in kernel_entry (bss cleared before in
_start, but also not before jump to panic() in no_tag_table case).
This patch fixes this up to have a fully working CRT when branching to panic()
in no_tag_table.
Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 365da4adeb upstream.
This fixes a regression from 247500820e
"nfsd4: fix decoding of compounds across page boundaries". The previous
code was correct: argp->pagelist is initialized in
nfs4svc_deocde_compoundargs to rqstp->rq_arg.pages, and is therefore a
pointer to the page *after* the page we are currently decoding.
The reason that patch nevertheless fixed a problem with decoding
compounds containing write was a bug in the write decoding introduced by
5a80a54d21 "nfsd4: reorganize write
decoding", after which write decoding no longer adhered to the rule that
argp->pagelist point to the next page.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: adjust context; there is only one instance to fix]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 987da47910 upstream.
Use a straight goto error label style in nfsd_setattr to make sure
we always do the put_write_access call after we got it earlier.
Note that the we have been failing to do that in the case
nfsd_break_lease() returns an error, a bug introduced into 2.6.38 with
6a76bebefe "nfsd4: break lease on nfsd
setattr".
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: notify_change() takes only 2 arguments]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 818e5a22e9 upstream.
Split out two helpers to make the code more readable and easier to verify
for correctness.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
[bwh: Backported to 3.2: s/umode_t/int/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 718822c1c1 upstream.
The dm-delay target uses a shared workqueue for multiple instances. This
can cause deadlock if two or more dm-delay targets are stacked on the top
of each other.
This patch changes dm-delay to use a per-instance workqueue.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b1d9335642 upstream.
setfacl over cifs mounts can remove the default ACL when setting the
(non-default part of) the ACL and vice versa (we were leaving at 0
rather than setting to -1 the count field for the unaffected
half of the ACL. For example notice the setfacl removed
the default ACL in this sequence:
steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir ; setfacl
-m default:user:test:rwx,user:test:rwx /mnt/test-dir
getfacl: Removing leading '/' from absolute path names
user::rwx
group::r-x
other::r-x
default:user::rwx
default:user:test:rwx
default:group::r-x
default😷:rwx
default:other::r-x
steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir
getfacl: Removing leading '/' from absolute path names
user::rwx
user:test:rwx
group::r-x
mask::rwx
other::r-x
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Jeremy Allison <jra@samba.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 97b6ff6be9 upstream.
GPU with low amount of ram can fails at pinning new framebuffer before
unpinning old one. On such failure, retry with unpinning old one before
pinning new one allowing to work around the issue. This is somewhat
ugly but only affect those old GPU we care about.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit eafbdde9c5 upstream.
This driver uses a number of macros to get and set various fields in the
RX and TX descriptors. To work correctly, a u8 pointer to the descriptor
must be used; however, in some cases a descriptor structure pointer is used
instead. In addition, a duplicated statement is removed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b2ea8ef559 upstream.
Apparently they need the same treatment as primary planes. This fixes
modesetting failures because of stuck cursors (!) on Thomas' i830M
machine.
I've figured while at it I'll also roll it out for the ivb 3 pipe
version of this function. I didn't do this for i845/i865 since Bspec
says the update mechanism works differently, and there's some
additional rules about what can be updated in which order.
Tested-by: Thomas Richter <thor@math.tu-berlin.de>
Cc: Thomas Richter <thor@math.tu-berlin.de>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e41fae2b1e upstream.
Bit 2 of status register 2 on MAX6696 (external diode 2 open)
sets ALERT; the bit thus has to be listed in alert_alarms.
Also display a message in the alert handler if the condition
is encountered.
Even though not all overtemperature conditions cause ALERT
to be set, we should not ignore them in the alert handler.
Display messages for all out-of-range conditions.
Reported-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 30aeadd44d upstream.
This turns on the internal integrator LCD display(s). It seems that the code
to do this got lost in refactoring of the CLCD driver.
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 6408eac266 upstream.
The current code may access to the already freed object. The input
device must be accessed and unregistered before freeing the top level
sound object.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4e9b45a192 upstream.
On 64 bit systems the test for negative message sizes is bogus as the
size, which may be positive when evaluated as a long, will get truncated
to an int when passed to load_msg(). So a long might very well contain a
positive value but when truncated to an int it would become negative.
That in combination with a small negative value of msg_ctlmax (which will
be promoted to an unsigned type for the comparison against msgsz, making
it a big positive value and therefore make it pass the check) will lead to
two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
small buffer as the addition of alen is effectively a subtraction. 2/ The
copy_from_user() call in load_msg() will first overflow the buffer with
userland data and then, when the userland access generates an access
violation, the fixup handler copy_user_handle_tail() will try to fill the
remainder with zeros -- roughly 4GB. That almost instantly results in a
system crash or reset.
,-[ Reproducer (needs to be run as root) ]--
| #include <sys/stat.h>
| #include <sys/msg.h>
| #include <unistd.h>
| #include <fcntl.h>
|
| int main(void) {
| long msg = 1;
| int fd;
|
| fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
| write(fd, "-1", 2);
| close(fd);
|
| msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
|
| return 0;
| }
'---
Fix the issue by preventing msgsz from getting truncated by consistently
using size_t for the message length. This way the size checks in
do_msgsnd() could still be passed with a negative value for msg_ctlmax but
we would fail on the buffer allocation in that case and error out.
Also change the type of m_ts from int to size_t to avoid similar nastiness
in other code paths -- it is used in similar constructs, i.e. signed vs.
unsigned checks. It should never become negative under normal
circumstances, though.
Setting msg_ctlmax to a negative value is an odd configuration and should
be prevented. As that might break existing userland, it will be handled
in a separate commit so it could easily be reverted and reworked without
reintroducing the above described bug.
Hardening mechanisms for user copy operations would have catched that bug
early -- e.g. checking slab object sizes on user copy operations as the
usercopy feature of the PaX patch does. Or, for that matter, detect the
long vs. int sign change due to truncation, as the size overflow plugin
of the very same patch does.
[akpm@linux-foundation.org: fix i386 min() warnings]
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Drop changes to alloc_msg() and copy_msg(), which don't exist]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 66da0e1f90 upstream.
When devpts is unmounted, there may be a no-longer-used IDR tree hanging
off the superblock we are about to kill. This needs to be cleaned up
before destroying the SB.
The leak is usually not a big deal because unmounting devpts is typically
done when shutting down the whole machine. However, shutting down an LXC
container instead of a physical machine exposes the problem (the garbage
is detectable with kmemleak).
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d049f74f2d upstream.
The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.
Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }
Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }
Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)
The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
CVE-2013-2929
Reported-by: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 312b4e2269 upstream.
Some setuid binaries will allow reading of files which have read
permission by the real user id. This is problematic with files which
use %pK because the file access permission is checked at open() time,
but the kptr_restrict setting is checked at read() time. If a setuid
binary opens a %pK file as an unprivileged user, and then elevates
permissions before reading the file, then kernel pointer values may be
leaked.
This happens for example with the setuid pppd application on Ubuntu 12.04:
$ head -1 /proc/kallsyms
00000000 T startup_32
$ pppd file /proc/kallsyms
pppd: In file /proc/kallsyms: unrecognized option 'c1000000'
This will only leak the pointer value from the first line, but other
setuid binaries may leak more information.
Fix this by adding a check that in addition to the current process having
CAP_SYSLOG, that effective user and group ids are equal to the real ids.
If a setuid binary reads the contents of a file which uses %pK then the
pointer values will be printed as NULL if the real user is unprivileged.
Update the sysctl documentation to reflect the changes, and also correct
the documentation to state the kptr_restrict=0 is the default.
This is a only temporary solution to the issue. The correct solution is
to do the permission check at open() time on files, and to replace %pK
with a function which checks the open() time permission. %pK uses in
printk should be removed since no sane permission check can be done, and
instead protected by using dmesg_restrict.
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joe Perches <joe@perches.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2:
- Adjust context
- Compare ids directly instead of using {uid,gid}_eq()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 72a0c55713 upstream.
On cris arch, the functions below aren't defined:
drivers/media/platform/sh_veu.c: In function 'sh_veu_reg_read':
drivers/media/platform/sh_veu.c:228:2: error: implicit declaration of function 'ioread32' [-Werror=implicit-function-declaration]
drivers/media/platform/sh_veu.c: In function 'sh_veu_reg_write':
drivers/media/platform/sh_veu.c:234:2: error: implicit declaration of function 'iowrite32' [-Werror=implicit-function-declaration]
drivers/media/platform/vsp1/vsp1.h: In function 'vsp1_read':
drivers/media/platform/vsp1/vsp1.h:66:2: error: implicit declaration of function 'ioread32' [-Werror=implicit-function-declaration]
drivers/media/platform/vsp1/vsp1.h: In function 'vsp1_write':
drivers/media/platform/vsp1/vsp1.h:71:2: error: implicit declaration of function 'iowrite32' [-Werror=implicit-function-declaration]
drivers/media/platform/vsp1/vsp1.h: In function 'vsp1_read':
drivers/media/platform/vsp1/vsp1.h:66:2: error: implicit declaration of function 'ioread32' [-Werror=implicit-function-declaration]
drivers/media/platform/vsp1/vsp1.h: In function 'vsp1_write':
drivers/media/platform/vsp1/vsp1.h:71:2: error: implicit declaration of function 'iowrite32' [-Werror=implicit-function-declaration]
drivers/media/platform/soc_camera/rcar_vin.c: In function 'rcar_vin_setup':
drivers/media/platform/soc_camera/rcar_vin.c:284:3: error: implicit declaration of function 'iowrite32' [-Werror=implicit-function-declaration]
drivers/media/platform/soc_camera/rcar_vin.c: In function 'rcar_vin_request_capture_stop':
drivers/media/platform/soc_camera/rcar_vin.c:353:2: error: implicit declaration of function 'ioread32' [-Werror=implicit-function-declaration]
Yet, they're available, as CONFIG_GENERIC_IOMAP is defined. What happens
is that asm/io.h was not including asm-generic/iomap.h.
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 11f918d3e2 upstream.
Do it the same way as done in microcode_intel.c: use pr_debug()
for missing firmware files.
There seem to be CPUs out there for which no microcode update
has been submitted to kernel-firmware repo yet resulting in
scary sounding error messages in dmesg:
microcode: failed to load file amd-ucode/microcode_amd_fam16h.bin
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1384274383-43510-1-git-send-email-trenn@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 092f9cd16a upstream.
msnd_pinnacle.c is used for both snd-msnd-pinnacle and
snd-msnd-classic drivers, and both should have different driver
names. Using the same driver name results in the sysfs warning for
duplicated entries like
kobject: 'msnd-pinnacle.7' (cec33408): kobject_release, parent (null) (delayed)
kobject: 'msnd-pinnacle' (cecd4980): kobject_release, parent cf3ad9b0 (delayed)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:486 sysfs_warn_dup+0x7d/0xa0()
sysfs: cannot create duplicate filename '/bus/isa/drivers/msnd-pinnacle'
......
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8e3ffa4710 upstream.
Userspace uses the netdev devtype for stuff like device naming and type
detection. Be nice and set it. Remove the pointless #if/#endif around
SET_NETDEV_DEV too.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7b3d2fb920 upstream.
[1] The gpmi uses the nand_command_lp to issue the commands to NAND chips.
The gpmi issues a DMA operation with gpmi_cmd_ctrl when it handles
a NAND_CMD_NONE control command. So when we read a page(NAND_CMD_READ0)
from the NAND, we may send two DMA operations back-to-back.
If we do not serialize the two DMA operations, we will meet a bug when
1.1) we enable CONFIG_DMA_API_DEBUG, CONFIG_DMADEVICES_DEBUG,
and CONFIG_DEBUG_SG.
1.2) Use the following commands in an UART console and a SSH console:
cmd 1: while true;do dd if=/dev/mtd0 of=/dev/null;done
cmd 1: while true;do dd if=/dev/mmcblk0 of=/dev/null;done
The kernel log shows below:
-----------------------------------------------------------------
kernel BUG at lib/scatterlist.c:28!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
.........................
[<80044a0c>] (__bug+0x18/0x24) from [<80249b74>] (sg_next+0x48/0x4c)
[<80249b74>] (sg_next+0x48/0x4c) from [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4)
[<80255398>] (debug_dma_unmap_sg+0x170/0x1a4) from [<8004af58>] (dma_unmap_sg+0x14/0x6c)
[<8004af58>] (dma_unmap_sg+0x14/0x6c) from [<8027e594>] (mxs_dma_tasklet+0x18/0x1c)
[<8027e594>] (mxs_dma_tasklet+0x18/0x1c) from [<8007d444>] (tasklet_action+0x114/0x164)
-----------------------------------------------------------------
1.3) Assume the two DMA operations is X (first) and Y (second).
The root cause of the bug:
Assume process P issues DMA X, and sleep on the completion
@this->dma_done. X's tasklet callback is dma_irq_callback. It firstly
wake up the process sleeping on the completion @this->dma_done,
and then trid to unmap the scatterlist S. The waked process P will
issue Y in another ARM core. Y initializes S->sg_magic to zero
with sg_init_one(), while dma_irq_callback is unmapping S at the same
time.
See the diagram:
ARM core 0 | ARM core 1
-------------------------------------------------------------
(P issues DMA X, then sleep) --> |
|
(X's tasklet wakes P) --> |
|
| <-- (P begin to issue DMA Y)
|
(X's tasklet unmap the |
scatterlist S with dma_unmap_sg) --> | <-- (Y calls sg_init_one() to init
| scatterlist S)
|
[2] This patch serialize both the X and Y in the following way:
Unmap the DMA scatterlist S firstly, and wake up the process at the end
of the DMA callback, in such a way, Y will be executed after X.
After this patch:
ARM core 0 | ARM core 1
-------------------------------------------------------------
(P issues DMA X, then sleep) --> |
|
(X's tasklet unmap the |
scatterlist S with dma_unmap_sg) --> |
|
(X's tasklet wakes P) --> |
|
| <-- (P begin to issue DMA Y)
|
| <-- (Y calls sg_init_one() to init
| scatterlist S)
|
Signed-off-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit d03b4aa77e upstream.
While receiving a packet on SDIO interface, we allocate skb with
size multiple of SDIO block size. We need to resize this skb
after RX using packet length from RX header.
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3545f3d5f4 upstream.
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 78dbfecb95 upstream.
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b4ade79766 upstream.
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=63881.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Matthieu Baerts <matttbe@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0c5d63f0ab upstream.
All of the rtlwifi drivers have an error in the routine that tests if
the data is "special". If it is, the subsequant transmission will be
at the lowest rate to enhance reliability. The 16-bit quantity is
big-endian, but was being extracted in native CPU mode. One of the
effects of this bug is to inhibit association under some conditions
as the TX rate is too high.
Based on suggestions by Joe Perches, the entire routine is rewritten.
One of the local headers contained duplicates of some of the ETH_P_XXX
definitions. These are deleted.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.2: adjust context; use rtl_lps_leave()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3aef7dde8d upstream.
There is a typo in the struct member name on assignment when checking
rtlphy->current_chan_bw == HT_CHANNEL_WIDTH_20_40, the check uses pwrgroup_ht40
for bound limit and uses pwrgroup_ht20 when assigning instead.
Signed-off-by: Felipe Pena <felipensp@gmail.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 603e772992 upstream.
qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all. Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.
So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.
This deadlock has actually been observed in the wild when the node
is under memory pressure.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
[bwh: Backported to 3.2:
- Adjust context
- Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4adcf7fb67 upstream.
ipath_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
ipath_user_sdma_pin_pages() we don't seem to need mmap_sem at all.
Even more interestingly the function ipath_user_sdma_queue_pkts() (and
also ipath_user_sdma_coalesce() called somewhat later) call
copy_from_user() which can hit a page fault and we deadlock on trying
to get mmap_sem when handling that fault. So just make
ipath_user_sdma_pin_pages() use get_user_pages_fast() and leave
mmap_sem locking for mm.
This deadlock has actually been observed in the wild when the node
is under memory pressure.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
[ Merged in fix for call to get_user_pages_fast from Tetsuo Handa
<penguin-kernel@I-love.SAKURA.ne.jp>. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6b31d18b0 upstream.
The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.
1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
and so the RPC client times out.
3) The client issues a second sendpage of the page data as part of
an RPC call retransmission.
4) The reply to the first transmission arrives from the server
_before_ the client hardware has emptied the TCP socket send
buffer.
5) After processing the reply, the RPC state machine rules that
the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
pages to store something else (e.g. a new write).
7) The client NIC drains the TCP socket send buffer. Since the
page data has now changed, it reads a corrupted version of the
initial RPC call, and puts it on the wire.
This patch fixes the problem in the following manner:
The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a time out occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy), We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4912aa6c11 upstream.
crocode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp ioatdma dca be2net sg ses enclosure ext4 mbcache jbd2 sd_mod crc_t10dif ahci megaraid_sas(U) dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 491, comm: scsi_eh_0 Tainted: G W ---------------- 2.6.32-220.13.1.el6.x86_64 #1 IBM -[8722PAX]-/00D1461
RIP: 0010:[<ffffffff8124e424>] [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
RSP: 0018:ffff881057eefd60 EFLAGS: 00010012
RAX: ffff881d99e3e8a8 RBX: ffff881d99e3e780 RCX: ffff881d99e3e8a8
RDX: ffff881d99e3e8a8 RSI: ffff881d99e3e780 RDI: ffff881d99e3e780
RBP: ffff881057eefd80 R08: ffff881057eefe90 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff881057f92338
R13: 0000000000000000 R14: ffff881057f92338 R15: ffff883058188000
FS: 0000000000000000(0000) GS:ffff880040200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d3ec0 CR3: 000000302cd7d000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_0 (pid: 491, threadinfo ffff881057eee000, task ffff881057e29540)
Stack:
0000000000001057 0000000000000286 ffff8810275efdc0 ffff881057f16000
<0> ffff881057eefdd0 ffffffff81362323 ffff881057eefe20 ffffffff8135f393
<0> ffff881057e29af8 ffff8810275efdc0 ffff881057eefe78 ffff881057eefe90
Call Trace:
[<ffffffff81362323>] __scsi_queue_insert+0xa3/0x150
[<ffffffff8135f393>] ? scsi_eh_ready_devs+0x5e3/0x850
[<ffffffff81362a23>] scsi_queue_insert+0x13/0x20
[<ffffffff8135e4d4>] scsi_eh_flush_done_q+0x104/0x160
[<ffffffff8135fb6b>] scsi_error_handler+0x35b/0x660
[<ffffffff8135f810>] ? scsi_error_handler+0x0/0x660
[<ffffffff810908c6>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff81090830>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 00 00 eb d1 4c 8b 2d 3c 8f 97 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
RIP [<ffffffff8124e424>] blk_requeue_request+0x94/0xa0
RSP <ffff881057eefd60>
The RIP is this line:
BUG_ON(blk_queued_rq(rq));
After digging through the code, I think there may be a race between the
request completion and the timer handler running.
A timer is started for each request put on the device's queue (see
blk_start_request->blk_add_timer). If the request does not complete
before the timer expires, the timer handler (blk_rq_timed_out_timer)
will mark the request complete atomically:
static inline int blk_mark_rq_complete(struct request *rq)
{
return test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
}
and then call blk_rq_timed_out. The latter function will call
scsi_times_out, which will return one of BLK_EH_HANDLED,
BLK_EH_RESET_TIMER or BLK_EH_NOT_HANDLED. If BLK_EH_RESET_TIMER is
returned, blk_clear_rq_complete is called, and blk_add_timer is again
called to simply wait longer for the request to complete.
Now, if the request happens to complete while this is going on, what
happens? Given that we know the completion handler will bail if it
finds the REQ_ATOM_COMPLETE bit set, we need to focus on the completion
handler running after that bit is cleared. So, from the above
paragraph, after the call to blk_clear_rq_complete. If the completion
sets REQ_ATOM_COMPLETE before the BUG_ON in blk_add_timer, we go boom
there (I haven't seen this in the cores). Next, if we get the
completion before the call to list_add_tail, then the timer will
eventually fire for an old req, which may either be freed or reallocated
(there is evidence that this might be the case). Finally, if the
completion comes in *after* the addition to the timeout list, I think
it's harmless. The request will be removed from the timeout list,
req_atom_complete will be set, and all will be well.
This will only actually explain the coredumps *IF* the request
structure was freed, reallocated *and* queued before the error handler
thread had a chance to process it. That is possible, but it may make
sense to keep digging for another race. I think that if this is what
was happening, we would see other instances of this problem showing up
as null pointer or garbage pointer dereferences, for example when the
request structure was not re-used. It looks like we actually do run
into that situation in other reports.
This patch moves the BUG_ON(test_bit(REQ_ATOM_COMPLETE,
&req->atomic_flags)); from blk_add_timer to the only caller that could
trip over it (blk_start_request). It then inverts the calls to
blk_clear_rq_complete and blk_add_timer in blk_rq_timed_out to address
the race. I've boot tested this patch, but nothing more.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 522e664644 upstream.
In reboot and crash path, when we shut down the local APIC, the I/O APIC is
still active. This may cause issues because external interrupts
can still come in and disturb the local APIC during shutdown process.
To quiet external interrupts, disable I/O APIC before shutdown local APIC.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 778d226a14 upstream.
This patch fixes two memory errors:
1. During a probe failure (in mtd_device_parse_register?) the command
buffer would not be freed.
2. The command buffer's size is determined based on the 'fast_read'
boolean, but the assignment of fast_read is made after this
allocation. Thus, the buffer may be allocated "too small".
To fix the first, just switch to the devres version of kzalloc.
To fix the second, increase MAX_CMD_SIZE unconditionally. It's not worth
saving a byte to fiddle around with the conditions here.
This problem was reported by Yuhang Wang a while back.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reported-by: Yuhang Wang <wangyuhang2014@gmail.com>
Reviewed-by: Sourav Poddar <sourav.poddar@ti.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a4d62babf9 upstream.
Hardware:
CPU: XLP832,the 64-bit OS
NOR Flash:S29GL128S 128M
Software:
Kernel:2.6.32.41
Filesystem:JFFS2
When writing files, errors appear:
Write len 182 but return retlen 180
Write of 182 bytes at 0x072c815c failed. returned -5, retlen 180
Write len 186 but return retlen 184
Write of 186 bytes at 0x072caff4 failed. returned -5, retlen 184
These errors exist only in 64-bit systems,not in 32-bit systems. After analysis, we
found that the left shift operation is wrong in map_word_load_partial. For instance:
unsigned char buf[3] ={0x9e,0x3a,0xea};
map_bankwidth(map) is 4;
for (i=0; i < 3; i++) {
int bitpos;
bitpos = (map_bankwidth(map)-1-i)*8;
orig.x[0] &= ~(0xff << bitpos);
orig.x[0] |= buf[i] << bitpos;
}
The value of orig.x[0] is expected to be 0x9e3aeaff, but in this situation(64-bit
System) we'll get the wrong value of 0xffffffff9e3aeaff due to the 64-bit sign
extension:
buf[i] is defined as "unsigned char" and the left-shift operation will convert it
to the type of "signed int", so when left-shift buf[i] by 24 bits, the final result
will get the wrong value: 0xffffffff9e3aeaff.
If the left-shift bits are less than 24, then sign extension will not occur. Whereas
the bankwidth of the nor flash we used is 4, therefore this BUG emerges.
Signed-off-by: Pang Xunlei <pang.xunlei@zte.com.cn>
Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Signed-off-by: Lu Zhongjun <lu.zhongjun@zte.com.cn>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4355b70cf4 upstream.
Some bright specification writers decided to write this in the ONFI spec
(from ONFI 3.0, Section 3.1):
"The number of blocks and number of pages per block is not required to
be a power of two. In the case where one of these values is not a
power of two, the corresponding address shall be rounded to an
integral number of bits such that it addresses a range up to the
subsequent power of two value. The host shall not access upper
addresses in a range that is shown as not supported."
This breaks every assumption MTD makes about NAND block/chip-size
dimensions -- they *must* be a power of two!
And of course, an enterprising manufacturer has made use of this lovely
freedom. Exhibit A: Micron MT29F32G08CBADAWP
"- Plane size: 2 planes x 1064 blocks per plane
- Device size: 32Gb: 2128 blockss [sic]"
This quickly hits a BUG() in nand_base.c, since the extra dimensions
overflow so we think it's a second chip (on my single-chip setup):
ONFI param page 0 valid
ONFI flash detected
NAND device: Manufacturer ID: 0x2c, Chip ID: 0x44 (Micron MT29F32G08CBADAWP), 4256MiB, page size: 8192, OOB size: 744
------------[ cut here ]------------
kernel BUG at drivers/mtd/nand/nand_base.c:203!
Internal error: Oops - BUG: 0 [#1] SMP ARM
[... trim ...]
[<c02cf3e4>] (nand_select_chip+0x18/0x2c) from [<c02d25c0>] (nand_do_read_ops+0x90/0x424)
[<c02d25c0>] (nand_do_read_ops+0x90/0x424) from [<c02d2dd8>] (nand_read+0x54/0x78)
[<c02d2dd8>] (nand_read+0x54/0x78) from [<c02ad2c8>] (mtd_read+0x84/0xbc)
[<c02ad2c8>] (mtd_read+0x84/0xbc) from [<c02d4b28>] (scan_read.clone.4+0x4c/0x64)
[<c02d4b28>] (scan_read.clone.4+0x4c/0x64) from [<c02d4c88>] (search_bbt+0x148/0x290)
[<c02d4c88>] (search_bbt+0x148/0x290) from [<c02d4ea4>] (nand_scan_bbt+0xd4/0x5c0)
[... trim ...]
---[ end trace 0c9363860d865ff2 ]---
So to fix this, just truncate these dimensions down to the greatest
power-of-2 dimension that is less than or equal to the specified
dimension.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[bwh: Backported to 3.2:
- Adjust context
- p->lun_count is not used]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 8f42d76987 upstream.
It's a superset of the existing CX2075x codecs, so we can reuse the
existing parser code.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fd432b9f8c upstream.
When system has a lot of highmem (e.g. 16GiB using a 32 bits kernel),
the code to calculate how much memory we need to preallocate in
normal zone may cause overflow. As Leon has analysed:
It looks that during computing 'alloc' variable there is overflow:
alloc = (3943404 - 1970542) - 1978280 = -5418 (signed)
And this function goes to err_out.
Fix this by avoiding that overflow.
References: https://bugzilla.kernel.org/show_bug.cgi?id=60817
Reported-and-tested-by: Leon Drugi <eyak@wp.pl>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 59c8e66378 upstream.
Also check the busy placements before deciding to move a buffer object.
Failing to do this may result in a completely unneccessary move within a
single memory type.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 4d8fe7376a upstream.
Using the nlmsg_len member of the netlink header to test if the message
is valid is wrong as it includes the size of the netlink header itself.
Thereby allowing to send short netlink messages that pass those checks.
Use nlmsg_len() instead to test for the right message length. The result
of nlmsg_len() is guaranteed to be non-negative as the netlink message
already passed the checks of nlmsg_ok().
Also switch to min_t() to please checkpatch.pl.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
[bwh: Backported to 3.2: there aren't any optional fields for AUDIT_TTY_SET
so adjust the size test similarly as for AUDIT_SET]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 0868a5e150 upstream.
When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.
AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").
When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.
It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f262f0f5ca upstream.
The cbc-aes-s390 algorithm incorrectly places the IV in the tfm
data structure. As the tfm is shared between multiple threads,
this introduces a possibility of data corruption.
This patch fixes this by moving the parameter block containing
the IV and key onto the stack (the block is 48 bytes long).
The same bug exists elsewhere in the s390 crypto system and they
will be fixed in subsequent patches.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 27ef63c7e9 upstream.
When determining the page size we could use to map with the IOMMU, the
page size should also be aligned with the hva, not just the gfn. The
gfn may not reflect the real alignment within the hugetlbfs file.
Most of the time, this works fine. However, if the hugetlbfs file is
backed by non-contiguous huge pages, a multi-huge page memslot starts at
an unaligned offset within the hugetlbfs file, and the gfn is aligned
with respect to the huge page size, kvm_host_page_size() will return the
huge page size and we will use that to map with the IOMMU.
When we later unpin that same memslot, the IOMMU returns the unmap size
as the huge page size, and we happily unpin that many pfns in
monotonically increasing order, not realizing we are spanning
non-contiguous huge pages and partially unpin the wrong huge page.
Ensure the IOMMU mapping page size is aligned with the hva corresponding
to the gfn, which does reflect the alignment within the hugetlbfs file.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
[bwh: Backported to 3.2: s/__gfn_to_hva_memslot/gfn_to_hva_memslot/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7b5bfb8288 upstream.
If you record the sound during playback,
the playback sound becomes silent.
Modify so that the codec driver does not clear
SG_SL1::DACL bit which is controlled under widget
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 954a73d5d3 upstream.
Whenever multipath_dtr() is happening we must prevent queueing any
further path activation work. Implement this by adding a new
'pg_init_disabled' flag to the multipath structure that denotes future
path activation work should be skipped if it is set. By disabling
pg_init and then re-enabling in flush_multipath_work() we also avoid the
potential for pg_init to be initiated while suspending an mpath device.
Without this patch a race condition exists that may result in a kernel
panic:
1) If after pg_init_done() decrements pg_init_in_progress to 0, a call
to wait_for_pg_init_completion() assumes there are no more pending path
management commands.
2) If pg_init_required is set by pg_init_done(), due to retryable
mode_select errors, then process_queued_ios() will again queue the
path activation work.
3) If free_multipath() completes before activate_path() work is called a
NULL pointer dereference like the following can be seen when
accessing members of the recently destructed multipath:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
[<ffffffff81090ac0>] worker_thread+0x170/0x2a0
[<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
[switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer]
Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com>
Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
Tested-by: Speagle Andy <Andy.Speagle@netapp.com>
Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
[bwh: Backported to 3.2:
- Adjust context
- Bump version to 1.3.2 not 1.6.0]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 5d0f801a2c upstream.
If we handle end of block messages with higher priority than a lost message,
we can run into an endless interrupt loop.
This is reproducable with a am335x processor and "cansequence -r" at 1Mbit.
As soon as we loose a packet we can't escape from an interrupt loop.
This patch fixes the problem by handling lost packets before EOB packets.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f36afb3957 upstream.
dm-mpath and dm-thin must process messages even if some device is
suspended, so we allocate argv buffer with GFP_NOIO. These messages have
a small fixed number of arguments.
On the other hand, dm-switch needs to process bulk data using messages
so excessive use of GFP_NOIO could cause trouble.
The patch also lowers the default number of arguments from 64 to 8, so
that there is smaller load on GFP_NOIO allocations.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e82b89a6f1 upstream.
modalias_show() should return an empty string on error, not -ENODEV.
This causes the following false and annoying error:
> find /sys/devices -name modalias -print0 | xargs -0 cat >/dev/null
cat: /sys/devices/vio/4000/modalias: No such device
cat: /sys/devices/vio/4001/modalias: No such device
cat: /sys/devices/vio/4002/modalias: No such device
cat: /sys/devices/vio/4004/modalias: No such device
cat: /sys/devices/vio/modalias: No such device
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 7b6bc07ab5 upstream.
For isochronous endpoints, set the RPIPE wMaxPacketSize value using
wOverTheAirPacketSize from the endpoint companion descriptor instead of
wMaxPacketSize from the normal endpoint descriptor.
Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit f3964fe1c9 upstream.
The CS2 region contains the Assabet board configuration and status
registers, which are 32-bit. Unfortunately, some boot loaders do not
configure this region correctly, leaving it setup as a 16-bit region.
Fix this.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 9b389a8a02 upstream.
The probe code of snd-usb-6fire driver overrides the devices[] pointer
wrongly without checking whether it's already occupied or not. This
would screw up the device disconnection later.
Spotted by coverity CID 141423.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 89dafa20f3 upstream.
Tested with Marvell 88se9125, attached with one port mulitplier(5 ports)
and one disk, we will get following boot log messages if using current
code:
ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
ata8.15: Port Multiplier 1.2, 0x1b4b:0x9715 r160, 5 ports, feat 0x1/0x1f
ahci 0000:03:00.0: FBS is enabled
ata8.00: hard resetting link
ata8.00: SATA link down (SStatus 0 SControl 330)
ata8.01: hard resetting link
ata8.01: SATA link down (SStatus 0 SControl 330)
ata8.02: hard resetting link
ata8.02: SATA link down (SStatus 0 SControl 330)
ata8.03: hard resetting link
ata8.03: SATA link up 6.0 Gbps (SStatus 133 SControl 133)
ata8.04: hard resetting link
ata8.04: failed to resume link (SControl 133)
ata8.04: failed to read SCR 0 (Emask=0x40)
ata8.04: failed to read SCR 0 (Emask=0x40)
ata8.04: failed to read SCR 1 (Emask=0x40)
ata8.04: failed to read SCR 0 (Emask=0x40)
ata8.03: native sectors (2) is smaller than sectors (976773168)
ata8.03: ATA-8: ST3500413AS, JC4B, max UDMA/133
ata8.03: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata8.03: configured for UDMA/133
ata8.04: failed to IDENTIFY (I/O error, err_mask=0x100)
ata8.15: hard resetting link
ata8.15: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
ata8.15: Port Multiplier vendor mismatch '0x1b4b' != '0x133'
ata8.15: PMP revalidation failed (errno=-19)
ata8.15: hard resetting link
ata8.15: SATA link up 6.0 Gbps (SStatus 133 SControl 330)
ata8.15: Port Multiplier vendor mismatch '0x1b4b' != '0x133'
ata8.15: PMP revalidation failed (errno=-19)
ata8.15: limiting SATA link speed to 3.0 Gbps
ata8.15: hard resetting link
ata8.15: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
ata8.15: Port Multiplier vendor mismatch '0x1b4b' != '0x133'
ata8.15: PMP revalidation failed (errno=-19)
ata8.15: failed to recover PMP after 5 tries, giving up
ata8.15: Port Multiplier detaching
ata8.03: disabled
ata8.00: disabled
ata8: EH complete
The reason is that current detection code doesn't follow AHCI spec:
First,the port multiplier detection process look like this:
ahci_hardreset(link, class, deadline)
if (class == ATA_DEV_PMP) {
sata_pmp_attach(dev) /* will enable FBS */
sata_pmp_init_links(ap, nr_ports);
ata_for_each_link(link, ap, EDGE) {
sata_std_hardreset(link, class, deadline);
if (link_is_online) /* do soft reset */
ahci_softreset(link, class, deadline);
}
}
But, according to chapter 9.3.9 in AHCI spec: Prior to issuing software
reset, software shall clear PxCMD.ST to '0' and then clear PxFBS.EN to
'0'.
The patch test ok with kernel 3.11.1.
tj: Patch white space contaminated, applied manually with trivial
updates.
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 3e85c3ecbc upstream.
6.0 Gbps link speed was not decoded properly:
speed was reported at 3.0 Gbps only.
Tested: On a machine where libata reports 6.0 Gbps in
/var/log/messages:
ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Before:
cat /sys/class/ata_link/link1/sata_spd
3.0 Gbps
After:
cat /sys/class/ata_link/link1/sata_spd
6.0 Gbps
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 98d6f4dd84 upstream.
Fedora Ruby maintainer reported latest Ruby doesn't work on Fedora Rawhide
on ARM. (http://bugs.ruby-lang.org/issues/9008)
Because of, commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no
RTC device is present) intruduced to return ENOTSUPP when
clock_get{time,res} can't find a RTC device. However this is incorrect.
First, ENOTSUPP isn't exported to userland (ENOTSUP or EOPNOTSUP are the
closest userland equivlents).
Second, Posix and Linux man pages agree that clock_gettime and
clock_getres should return EINVAL if clk_id argument is invalid.
While the arugment that the clockid is valid, but just not supported
on this hardware could be made, this is just a technicality that
doesn't help userspace applicaitons, and only complicates error
handling.
Thus, this patch changes the code to use EINVAL.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: Vit Ondruch <v.ondruch@tiscali.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[jstultz: Tweaks to commit message to include full rational]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit e92aee3308 upstream.
This patch adds the Port Reset Change flag to the set of bits that are
preemptively cleared on init/resume of a hub. In theory this bit should
never be set unexpectedly... in practice it can still happen if BIOS,
SMM or ACPI code plays around with USB devices without cleaning up
correctly. This is especially dangerous for XHCI root hubs, which don't
generate any more Port Status Change Events until all change bits are
cleared, so this is a good precaution to have (similar to how it's
already done for the Warm Port Reset Change flag).
Signed-off-by: Julius Werner <jwerner@chromium.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2:
- Adjust context
- s/usb_clear_port_feature/clear_port_feature/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit dcc01c0864 upstream.
Before the USB core resets a device, we need to disable the L1 timeout
for the roothub, if USB 2.0 Link PM is enabled. Otherwise the port may
transition into L1 in between descriptor fetches, before we know if the
USB device descriptors changed. LPM will be re-enabled after the
full device descriptors are fetched, and we can confirm the device still
supports USB 2.0 LPM after the reset.
We don't need to wait for the USB device to exit L1 before resetting the
device, since the xHCI roothub port diagrams show a transition to the
Reset state from any of the Ux states (see Figure 34 in the 2012-08-14
xHCI specification update).
This patch should be backported to kernels as old as 3.2, that contain
the commit 65580b4321 "xHCI: set USB2
hardware LPM". That was the first commit to enable USB 2.0
hardware-driven Link Power Management.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit a6f951ddbd upstream.
In nfs4_proc_getlk(), when some error causes a retry of the call to
_nfs4_proc_getlk(), we can end up with Oopses of the form
BUG: unable to handle kernel NULL pointer dereference at 0000000000000134
IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30
<snip>
Call Trace:
[<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70
[<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4]
[<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
[<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4]
[<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4]
The problem is that we don't clear the request->fl_ops after the first
try and so when we retry, nfs4_set_lock_state() exits early without
setting the lock stateid.
Regression introduced by commit 70cc6487a4
(locks: make ->lock release private data before returning in GETLK case)
Reported-by: Weston Andros Adamson <dros@netapp.com>
Reported-by: Jorge Mora <mora@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.