Patch for -stable. Function find_early_table_space removed upstream.
Fixes panic in alloc_low_page due to pgt_buf overflow during
init_memory_mapping.
find_early_table_space sizes pgt_buf based upon the size of the
memory being mapped, but it does not take into account the alignment
of the memory. When the region being mapped spans a 512GB (PGDIR_SIZE)
alignment, a panic from alloc_low_pages occurs.
kernel_physical_mapping_init takes into account PGDIR_SIZE alignment.
This causes an extra call to alloc_low_page to be made. This extra call
isn't accounted for by find_early_table_space and causes a kernel panic.
Change is to take into account PGDIR_SIZE alignment in find_early_table_space.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 12b2f117f3 upstream.
audit_trim_trees() calls get_tree(). If a failure occurs we must call
put_tree().
[akpm@linux-foundation.org: run put_tree() before mutex_lock() for small scalability improvement]
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fe70b579c upstream.
ftrace_dump() had a lot of issues. What ftrace_dump() does, is when
ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it
will dump out the ftrace buffers to the console when either a oops,
panic, or a sysrq-z occurs.
This was written a long time ago when ftrace was fragile to recursion.
But it wasn't written well even for that.
There's a possible deadlock that can occur if a ftrace_dump() is happening
and an NMI triggers another dump. This is because it grabs a lock
before checking if the dump ran.
It also totally disables ftrace, and tracing for no good reasons.
As the ring_buffer now checks if it is read via a oops or NMI, where
there's a chance that the buffer gets corrupted, it will disable
itself. No need to have ftrace_dump() do the same.
ftrace_dump() is now cleaned up where it uses an atomic counter to
make sure only one dump happens at a time. A simple atomic_inc_return()
is enough that is needed for both other CPUs and NMIs. No need for
a spinlock, as if one CPU is running the dump, no other CPU needs
to do it too.
The tracing_on variable is turned off and not turned on. The original
code did this, but it wasn't pretty. By just disabling this variable
we get the result of not seeing traces that happen between crashes.
For sysrq-z, it doesn't get turned on, but the user can always write
a '1' to the tracing_on file. If they are using sysrq-z, then they should
know about tracing_on.
The new code is much easier to read and less error prone. No more
deadlock possibility when an NMI triggers here.
Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e9dd0e889 upstream.
The "Mobile Sandy Bridge CPUs" in the Fujitsu Esprimo Q900
mini desktop PCs are probably misleading the LVDS detection
code in intel_lvds_supported. Nothing is connected to the
LVDS ports in these systems.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce8a5dbdf9 upstream.
When checking if an autofs mount point is busy it isn't sufficient to
only check if it's a mount point.
For example, if the mount of an offset mountpoint in a tree is denied
for this host by its export and the dentry becomes a process working
directory the check incorrectly returns the mount as not in use at
expire.
This can happen since the default when mounting within a tree is
nostrict, which means ingnore mount fails on mounts within the tree and
continue. The nostrict option is meant to allow mounting in this case.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7122beeee7 upstream.
The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.
commit 41eab6f88f
powerpc/numa: Use form 1 affinity to setup node distance
Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.
All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes. This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same. (value of 'level' in sched_init_numa() will remain 0)
Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f2e29031e upstream.
Commit b4cbb197c7 ("vm: add vm_iomap_memory() helper function") added
a helper function wrapper around io_remap_pfn_range(), and every other
architecture defined it in <asm/pgtable.h>.
The s390 choice of <asm/io.h> may make sense, but is not very convenient
for this case, and gratuitous differences like that cause unexpected errors like this:
mm/memory.c: In function 'vm_iomap_memory':
mm/memory.c:2439:2: error: implicit declaration of function 'io_remap_pfn_range' [-Werror=implicit-function-declaration]
Glory be the kbuild test robot who noticed this, bisected it, and
reported it to the guilty parties (ie me).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6cc25fda5 upstream.
The adp5520 unfortunately also clears the BL_EN bit when the nSTNDBY bit is
cleared. So we need to make sure to restore it during resume if it was set
before suspend.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 836dc2fe89 upstream.
PARTITION_SUPPORT needs to be set before doing the compare on version
number so the bit width test does not get invalid data. Before this
patch, a Sandisk iNAND eMMC card would detect 1-bit width although
the hardware supports 4-bit.
Only affects old emmc devices - pre 4.4 devices.
Reported-by: Elad Yi <elad.yi@gmail.com>
Signed-off-by: Philip Rakity <prakity@yahoo.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f3e3c7cfc upstream.
Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
change made by commit a0b30c1229: ext4: use module parameters instead
of debugfs for mballoc_debug
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d69f3bad46 upstream.
Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB. By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.
In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
ipc/shm.c:
458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
459 {
...
465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
474 if (ns->shm_tot + numpages > ns->shm_ctlall)
475 return -ENOSPC;
[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
Signed-off-by: Robin Holt <holt@sgi.com>
Reported-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 990de49f74 upstream.
When a full scan 2.4 and 5 GHz scan is scheduled, but then the 2.4 GHz
part of the scan disables a 5.2 GHz channel due to, e.g. receiving
country or frequency information, that 5.2 GHz channel might already
be in the list of channels to scan next. Then, when the driver checks
if it should do a passive scan, that will return false and attempt an
active scan. This is not only wrong but can also lead to the iwlwifi
device firmware crashing since it checks regulatory as well.
Fix this by not setting the channel flags to just disabled but rather
OR'ing in the disabled flag. That way, even if the race happens, the
channel will be scanned passively which is still (mostly) correct.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf8d909705 upstream.
The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c7c3e67ab upstream.
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8b6cc4d6f8 upstream.
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the open in this
instance
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1dfd89af86 upstream.
After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.
Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.
Reported-by: Marc Eshel <eshel@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f7a05d701 upstream.
Vitaliy reported that a per cpu HPET timer interrupt crashes the
system during hibernation. What happens is that the per cpu HPET timer
gets shut down when the nonboot cpus are stopped. When the nonboot
cpus are onlined again the HPET code sets up the MSI interrupt which
fires before the clock event device is registered. The event handler
is still set to hrtimer_interrupt, which then crashes the machine due
to highres mode not being active.
See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333
There is no real good way to avoid that in the HPET code. The HPET
code alrady has a mechanism to detect spurious interrupts when event
handler == NULL for a similar reason.
We can handle that in the clockevent/tick layer and replace the
previous functional handler with a dummy handler like we do in
tick_setup_new_device().
The original clockevents code did this in clockevents_exchange_device(),
but that got removed by commit 7c1e76897 (clockevents: prevent
clockevent event_handler ending up handler_noop) which forgot to fix
it up in tick_shutdown(). Same issue with the broadcast device.
Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: 700333@bugs.debian.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ac1707a13 upstream.
The 3rd parameter of flex_array_prealloc() is the number of elements,
not the index of the last element.
The effect of the bug is, when opening cgroup.procs, a flex array will
be allocated and all elements of the array is allocated with
GFP_KERNEL flag, but the last one is GFP_ATOMIC, and if we fail to
allocate memory for it, it'll trigger a BUG_ON().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e005715efa upstream.
There's a bug where rtc alarms are ignored after the rtc cmos suspends
but before the system finishes suspend. Since hpet emulation is
disabled and it still handles the interrupts, a wake event is never
registered which is done from the rtc layer.
This patch reverts commit d1b2efa83f ("rtc: disable hpet emulation on
suspend") which disabled hpet emulation. To fix the problem mentioned
in that commit, hpet_rtc_timer_init() is called directly on resume.
Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f294b5a13 upstream.
The settimeofday01 test in the LTP testsuite effectively does
gettimeofday(current time);
settimeofday(Jan 1, 1970 + 100 seconds);
settimeofday(current time);
This test causes a stack trace to be displayed on the console during the
setting of timeofday to Jan 1, 1970 + 100 seconds:
[ 131.066751] ------------[ cut here ]------------
[ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140()
[ 131.104935] Hardware name: Dinar
[ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod
[ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6
[ 131.182248] Call Trace:
[ 131.184684] <IRQ> [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0
[ 131.191312] [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20
[ 131.197131] [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140
[ 131.203721] [<ffffffff810bb584>] tick_program_event+0x24/0x30
[ 131.209534] [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230
[ 131.215437] [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130
[ 131.221509] [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99
[ 131.227839] [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80
[ 131.233816] <EOI> [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120
[ 131.240267] [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0
[ 131.246252] [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0
[ 131.252238] [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20
[ 131.257877] [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260
[ 131.263692] [<ffffffff8101c42f>] cpu_idle+0xaf/0x120
[ 131.268727] [<ffffffff815f8971>] start_secondary+0x255/0x257
[ 131.274449] ---[ end trace 1151a50552231615 ]---
When we change the system time to a low value like this, the value of
timekeeper->offs_real will be a negative value.
It seems that the WARN occurs because an hrtimer has been started in the time
between the releasing of the timekeeper lock and the IPI call (via a call to
on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result
is that a REALTIME_CLOCK timer has been added with softexpires = expires =
KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at
kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the
clock base's offset (which was set to timekeeper->offs_real in
do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which
was KTIME_MAX):
KTIME_MAX - (a negative value) = overflow
A simple check for an overflow can resolve this problem. Using KTIME_MAX
instead of the overflow value will result in the hrtimer function being run,
and the reprogramming of the timer after that.
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[jstultz: Tweaked commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51fd36f3fa upstream.
One can trigger an overflow when using ktime_add_ns() on a 32bit
architecture not supporting CONFIG_KTIME_SCALAR.
When passing a very high value for u64 nsec, e.g. 7881299347898368000
the do_div() function converts this value to seconds (7881299347) which
is still to high to pass to the ktime_set() function as long. The result
in is a negative value.
The problem on my system occurs in the tick-sched.c,
tick_nohz_stop_sched_tick() when time_delta is set to
timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is
valid, thus ktime_add_ns() is called with a too large value resulting in
a negative expire value. This leads to an endless loop in the ticker code:
time_delta: 7881299347898368000
expires = ktime_add_ns(last_update, time_delta)
expires: negative value
This fix caps the value to KTIME_MAX.
This error doesn't occurs on 64bit or architectures supporting
CONFIG_KTIME_SCALAR (e.g. ARM, x86-32).
Signed-off-by: David Engraf <david.engraf@sysgo.com>
[jstultz: Minor tweaks to commit message & header]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60af3d037e upstream.
We've got strange errors in get_ctl_value() in mixer.c during
probing, e.g. on Hercules RMX2 DJ Controller:
ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x201, wIndex = 0xa00, type = 4
ALSA mixer.c:352 cannot get ctl value: req = 0x83, wValue = 0x200, wIndex = 0xa00, type = 4
....
It turned out that the culprit is autopm: snd_usb_autoresume() returns
-ENODEV when called during card->probing = 1.
Since the call itself during card->probing = 1 is valid, let's fix the
return value of snd_usb_autoresume() as success.
Reported-and-tested-by: Daniel Schürmann <daschuer@mixxx.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbc200bca4 upstream.
Commit 88a8516a21 (ALSA: usbaudio: implement USB autosuspend)
introduced autopm for all USB audio/MIDI devices. However, many MIDI
devices, such as synthesizers, do not merely transmit MIDI messages but
use their MIDI inputs to control other functions. With autopm, these
devices would get powered down as soon as the last MIDI port device is
closed on the host.
Even some plain MIDI interfaces could get broken: they automatically
send Active Sensing messages while powered up, but as soon as these
messages cease, the receiving device would interpret this as an
accidental disconnection.
Commit f5f165418c (ALSA: usb-audio: Fix missing autopm for MIDI input)
introduced another regression: some devices (e.g. the Roland GAIA SH-01)
are self-powered but do a reset whenever the USB interface's power state
changes.
To work around all this, just disable autopm for all USB MIDI devices.
Reported-by: Laurens Holst
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de53e9caa4 upstream.
The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/kvm/vtlb.c.
I observed this on Kernel 3.2.35 but it is also true on the most
recent Kernel 3.9-rc1.
File arch/ia64/kvm/vtlb.c:
u64 guest_vhpt_lookup(u64 iha, u64 *pte)
{
u64 ret;
struct thash_data *data;
data = __vtr_lookup(current_vcpu, iha, D_TLB);
if (data != NULL)
thash_vhpt_insert(current_vcpu, data->page_flags,
data->itir, iha, D_TLB);
asm volatile (
"rsm psr.ic|psr.i;;"
"srlz.d;;"
"ld8.s r9=[%1];;"
"tnat.nz p6,p7=r9;;"
"(p6) mov %0=1;"
"(p6) mov r9=r0;"
"(p7) extr.u r9=r9,0,53;;"
"(p7) mov %0=r0;"
"(p7) st8 [%2]=r9;;"
"ssm psr.ic;;"
"srlz.d;;"
"ssm psr.i;;"
"srlz.d;;"
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
return ret;
}
The list of output registers is
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The attached patch fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.
This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545d6e189a upstream.
Found problem on system that firmware that could handle pci aer.
Firmware get error reporting after pci injecting error, before os boots.
But after os boots, firmware can not get report anymore, even pci=noaer
is passed.
Root cause: BIOS _OSC has problem with query bit checking.
It turns out that BIOS vendor is copying example code from ACPI Spec.
In ACPI Spec 5.0, page 290:
If (Not(And(CDW1,1))) // Query flag clear?
{ // Disable GPEs for features granted native control.
If (And(CTRL,0x01)) // Hot plug control granted?
{
Store(0,HPCE) // clear the hot plug SCI enable bit
Store(1,HPCS) // clear the hot plug SCI status bit
}
...
}
When Query flag is set, And(CDW1,1) will be 1, Not(1) will return 0xfffffffe.
So it will get into code path that should be for control set only.
BIOS acpi code should be changed to "If (LEqual(And(CDW1,1), 0)))"
Current kernel code is using _OSC query to notify firmware about support
from OS and then use _OSC to set control bits.
During query support, current code is using all possible controls.
So will execute code that should be only for control set stage.
That will have problem when pci=noaer or aer firmware_first is used.
As firmware have that control set for os aer already in query support stage,
but later will not os aer handling.
We should avoid passing all possible controls, just use osc_control_set
instead.
That should workaround BIOS bugs with affected systems on the field
as more bios vendors are copying sample code from ACPI spec.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d303e9e98f upstream.
Back 2010 during a revamp of the irq code some initializations
were moved from ia64_mca_init() to ia64_mca_late_init() in
commit c75f2aa13f
Cannot use register_percpu_irq() from ia64_mca_init()
But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.
Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com>
Tested-by: Fred Hartnett <fred.hartnett@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c39e8e4354 upstream.
The TX_FIFO register is 10 bits wide. The lower 8 bits are the data to be
written, while the upper two bits are flags to indicate stop/start.
The driver apparently attempted to optimize write access, by only writing a
byte in those cases where the stop/start bits are zero. However, we have
seen cases where the lower byte is duplicated onto the upper byte by the
hardware, which causes inadvertent stop/starts.
This patch changes the write access to the transmit FIFO to always be 16 bits
wide.
Signed off by: Steven A. Falco <sfalco@harris.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4df297129f upstream.
Currently, the depth reported in the stack tracer stack_trace file
does not match the stack_max_size file. This is because the stack_max_size
includes the overhead of stack tracer itself while the depth does not.
The first time a max is triggered, a calculation is not performed that
figures out the overhead of the stack tracer and subtracts it from
the stack_max_size variable. The overhead is stored and is subtracted
from the reported stack size for comparing for a new max.
Now the stack_max_size corresponds to the reported depth:
# cat stack_max_size
4640
# cat stack_trace
Depth Size Location (48 entries)
----- ---- --------
0) 4640 32 _raw_spin_lock+0x18/0x24
1) 4608 112 ____cache_alloc+0xb7/0x22d
2) 4496 80 kmem_cache_alloc+0x63/0x12f
3) 4416 16 mempool_alloc_slab+0x15/0x17
[...]
While testing against and older gcc on x86 that uses mcount instead
of fentry, I found that pasing in ip + MCOUNT_INSN_SIZE let the
stack trace show one more function deep which was missing before.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4ecbfc49b upstream.
When gcc 4.6 on x86 is used, the function tracer will use the new
option -mfentry which does a call to "fentry" at every function
instead of "mcount". The significance of this is that fentry is
called as the first operation of the function instead of the mcount
usage of being called after the stack.
This causes the stack tracer to show some bogus results for the size
of the last function traced, as well as showing "ftrace_call" instead
of the function. This is due to the stack frame not being set up
by the function that is about to be traced.
# cat stack_trace
Depth Size Location (48 entries)
----- ---- --------
0) 4824 216 ftrace_call+0x5/0x2f
1) 4608 112 ____cache_alloc+0xb7/0x22d
2) 4496 80 kmem_cache_alloc+0x63/0x12f
The 216 size for ftrace_call includes both the ftrace_call stack
(which includes the saving of registers it does), as well as the
stack size of the parent.
To fix this, if CC_USING_FENTRY is defined, then the stack_tracer
will reserve the first item in stack_dump_trace[] array when
calling save_stack_trace(), and it will fill it in with the parent ip.
Then the code will look for the parent pointer on the stack and
give the real size of the parent's stack pointer:
# cat stack_trace
Depth Size Location (14 entries)
----- ---- --------
0) 2640 48 update_group_power+0x26/0x187
1) 2592 224 update_sd_lb_stats+0x2a5/0x4ac
2) 2368 160 find_busiest_group+0x31/0x1f1
3) 2208 256 load_balance+0xd9/0x662
I'm Cc'ing stable, although it's not urgent, as it only shows bogus
size for item #0, the rest of the trace is legit. It should still be
corrected in previous stable releases.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 87889501d0 upstream.
Use the stack of stack_trace_call() instead of check_stack() as
the test pointer for max stack size. It makes it a bit cleaner
and a little more accurate.
Adding stable, as a later fix depends on this patch.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0b885657b upstream.
We first tried to avoid updating atime/mtime entirely (commit
b0de59b573: "TTY: do not update atime/mtime on read/write"), and then
limited it to only update it occasionally (commit 37b7f3c765: "TTY:
fix atime/mtime regression"), but it turns out that this was both
insufficient and overkill.
It was insufficient because we let people attach to the shared ptmx node
to see activity without even reading atime/mtime, and it was overkill
because the "only once a minute" means that you can't really tell an
idle person from an active one with 'w'.
So this tries to fix the problem properly. It marks the shared ptmx
node as un-notifiable, and it lowers the "only once a minute" to a few
seconds instead - still long enough that you can't time individual
keystrokes, but short enough that you can tell whether somebody is
active or not.
Reported-by: Simon Kirby <sim@hostway.ca>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a65dcc04c upstream.
The serial core uses device_find_child() but does not drop the reference to
the retrieved child after using it. This patch add the missing put_device().
What I have done to test this issue.
I used a machine with an AMBA PL011 serial driver. I tested the patch on
next-20120408 because the last branch [next-20120415] does not boot on this
board.
For test purpose, I added some pr_info() messages to print the refcount
after device_find_child() (lines: 1937,2009), and after put_device()
(lines: 1947, 2021).
Boot the machine *without* put_device(). Then:
echo reboot > /sys/power/disk
echo disk > /sys/power/state
[ 87.058575] uart_suspend_port:1937 refcount 4
[ 87.058582] uart_suspend_port:1947 refcount 4
[ 87.098083] uart_resume_port:2009refcount 5
[ 87.098088] uart_resume_port:2021 refcount 5
echo disk > /sys/power/state
[ 103.055574] uart_suspend_port:1937 refcount 6
[ 103.055580] uart_suspend_port:1947 refcount 6
[ 103.095322] uart_resume_port:2009 refcount 7
[ 103.095327] uart_resume_port:2021 refcount 7
echo disk > /sys/power/state
[ 252.459580] uart_suspend_port:1937 refcount 8
[ 252.459586] uart_suspend_port:1947 refcount 8
[ 252.499611] uart_resume_port:2009 refcount 9
[ 252.499616] uart_resume_port:2021 refcount 9
The refcount continuously increased.
Boot the machine *with* this patch. Then:
echo reboot > /sys/power/disk
echo disk > /sys/power/state
[ 159.333559] uart_suspend_port:1937 refcount 4
[ 159.333566] uart_suspend_port:1947 refcount 3
[ 159.372751] uart_resume_port:2009 refcount 4
[ 159.372755] uart_resume_port:2021 refcount 3
echo disk > /sys/power/state
[ 185.713614] uart_suspend_port:1937 refcount 4
[ 185.713621] uart_suspend_port:1947 refcount 3
[ 185.752935] uart_resume_port:2009 refcount 4
[ 185.752940] uart_resume_port:2021 refcount 3
echo disk > /sys/power/state
[ 207.458584] uart_suspend_port:1937 refcount 4
[ 207.458591] uart_suspend_port:1947 refcount 3
[ 207.498598] uart_resume_port:2009 refcount 4
[ 207.498605] uart_resume_port:2021 refcount 3
The refcount correctly handled.
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7918c92ae9 upstream.
When we online the CPU, we get this splat:
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution to that is use kasprintf in the CPU hotplug path
that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
and remove the call to in xen_hvm_setup_cpu_clockevents.
Unfortunatly the later is not a good idea as the bootup path
does not use xen_hvm_cpu_notify so we would end up never allocating
timer%d interrupt lines when booting. As such add the check for
atomic() to continue.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94c163663f upstream.
In case a machine supports memory hotplug all active memory increments
present at IPL time have been initialized with a "usecount" of 1.
This is wrong if the memory increment size is larger than the memory
section size of the memory hotplug code. If that is the case the
usecount must be initialized with the number of memory sections that
fit into one memory increment.
Otherwise it is possible to put a memory increment into standby state
even if there are still active sections.
Afterwards addressing exceptions might happen which cause the kernel
to panic.
However even worse, if a memory increment was put into standby state
and afterwards into active state again, it's contents would have been
zeroed, leading to memory corruption.
This was only an issue for machines that support standby memory and
have at least 256GB memory.
This is broken since commit fdb1bb15 "[S390] sclp/memory hotplug: fix
initial usecount of increments".
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 671b4b2ba9 upstream.
Many cards based on CY7C68300A/B/C use the USB ID 04b4:6830 but only the
B and C variants (EZ-USB AT2LP) support the ATA Command Block
functionality, according to the data sheets. The A variant (EZ-USB AT2)
locks up if ATACB is attempted, until a typical 30 seconds timeout runs
out and a USB reset is performed.
https://bugs.launchpad.net/bugs/428469
It seems that one way to spot a CY7C68300A (at least where the card
manufacturer left Cypress' EEPROM default vaules, against Cypress'
recommendations) is to look at the USB string descriptor indices.
A http://media.digikey.com/pdf/Data%20Sheets/Cypress%20PDFs/CY7C68300A.pdf
B http://www.farnell.com/datasheets/43456.pdf
C http://www.cypress.com/?rID=14189
Note that a CY7C68300B/C chip appears as CY7C68300A if it is running
in Backward Compatibility Mode, and if ATACB would be supported in this
case there is anyway no way to tell which chip it really is.
For 5 years my external USB drive has been locking up for half a minute
when plugged in and ata_id is run by udev, or anytime hdparm or similar
is run on it.
Finally looking at the /correct/ datasheet I think I found the reason. I
am aware the quirk in this patch is a bit hacky, but the hardware
manufacturers haven't made it easy for us.
Signed-off-by: Tormod Volden <debian.tormod@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1361bf4b9f upstream.
When usbfs receives a ctrl-request from userspace it calls check_ctrlrecip,
which for a request with USB_RECIP_ENDPOINT tries to map this to an interface
to see if this interface is claimed, except for ctrl-requests with a type of
USB_TYPE_VENDOR.
When trying to use this device: http://www.akaipro.com/eiepro
redirected to a Windows vm running on qemu on top of Linux.
The windows driver makes a ctrl-req with USB_TYPE_CLASS and
USB_RECIP_ENDPOINT with index 0, and the mapping of the endpoint (0) to
the interface fails since ep 0 is the ctrl endpoint and thus never is
part of an interface.
This patch fixes this ctrl-req failing by skipping the checkintf call for
USB_RECIP_ENDPOINT ctrl-reqs on the ctrl endpoint.
Reported-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl>
Tested-by: Dave Stikkolorum <d.r.stikkolorum@hhs.nl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f06d15f8d upstream.
The current ST Micro Connect Lite uses the FT4232H hi-speed quad USB
UART FTDI chip. It is also possible to drive STM reference targets
populated with an on-board JTAG debugger based on the FT2232H chip with
the same STMicroelectronics tools.
For this reason, the ST Micro Connect Lite PIDs should be
ST_STMCLT_2232_PID: 0x3746
ST_STMCLT_4232_PID: 0x3747
Signed-off-by: Adrian Thomasset <adrian.thomasset@st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58f8b6c4fa upstream.
This patch add a missing usb device id for the GDMBoost V1.x device
The patch is against 3.9-rc5
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6747e83235 upstream.
In commit 85fe402 (fs: do not assign default i_ino in new_inode), the
initialisation of i_ino was removed from new_inode() and pushed down
into the callers. However spufs_new_inode() was not updated.
This exhibits as no files appearing in /spu, because all our dirents
have a zero inode, which readdir() seems to dislike.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29ce3c5073 upstream.
In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush. After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.
Unfortunately there's no isync between the copy of this code and the
jump to it. Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.
We've seen this on real machines and it's results in no console output
after:
calling quiesce...
returning from prom_init
The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aea1181b0b upstream, Needed to
compile ext4 for sparc32 since commit
503f4bdcc0
There is no-one that really require atomic64_t support on sparc32.
But several drivers fails to build without proper atomic64 support.
And for an allyesconfig build for sparc32 this is annoying.
Include the generic atomic64_t support for sparc32.
This has a text footprint cost:
$size vmlinux (before atomic64_t support)
text data bss dec hex filename
3578860 134260 108781 3821901 3a514d vmlinux
$size vmlinux (after atomic64_t support)
text data bss dec hex filename
3579892 130684 108781 3819357 3a475d vmlinux
text increase (3579892 - 3578860) = 1032 bytes
data decreases - but I fail to explain why!
I have rebuild twice to check my numbers.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 97599dc792 ]
Commit 4a94445c9a (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.
Commit 64f3b9e203 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.
Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.
Before queueing a packet into a frag list, we must drop its dst,
as this dst has limited lifetime (RCU protected)
When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.
Use same logic in IPv6, as there is no need to hold dst references.
Reported-by: Tom Parkin <tparkin@katalix.com>
Tested-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c802d75962 ]
sizeof() when applied to a pointer typed expression gives the size of the
pointer, not that of the pointed data.
Introduced by commit 3ce5ef(netrom: fix info leak via msg_name in nr_recvmsg)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 60085c3d00 ]
The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.
Additionally to that both recv_msg() and recv_stream() fail to update
the msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.
Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a184233f2 ]
The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.
Fix the issue by initializing the memory used for sockaddr info with
memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits 3ce5efad47 and
c802d75962 ]
In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix both issues by initializing the memory with memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c77a4b9cff ]
For stream sockets the code misses to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.
Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a5598bd9c0 ]
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ae94c0d2f ]
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d6fbfe733 ]
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e11e0455c0 ]
If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.
Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4683f42fde ]
In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.
Fix this by moving the msg_namelen assignment in front of the shutdown
test.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ef3313e84a ]
When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.
Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.
Fix both issues by initializing the memory with memset(0).
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9b3e617f3d ]
The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.
Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 83f1b4ba91 ]
Commit 257b5358b3 ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.
Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.
This just undoes that (presumably unintentional) part of the commit.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 12fb3dd9dc ]
commit bd090dfc63 (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.
1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
(ecr 200 should be ecr 300 in packets 3 & 4)
Problem is tcp_ack() can trigger send of new packets (retransmits),
reflecting the prior TSval, instead of the TSval contained in the
currently processed incoming packet.
Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 586c31f3bf ]
For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d66954a066 ]
There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
RT_SCOPE_UNIVERSE, IPPROTO_TCP,
inet_sk_flowi_flags(sk),
(opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
ireq->loc_addr, th->source, th->dest);
Here we do not respect sk->sk_bound_dev_if, therefore wrong dst_entry may be
taken. This dst_entry is used by new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.
Signed-off-by: Dmitry Popov <dp@highloadlab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 124dff01af ]
Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.
nf_reset() is used in the following cases:
- when passing packets up the the socket layer, at which point we want to
release all netfilter references that might keep modules pinned while
the packet is queued. nf_trace doesn't matter anymore at this point.
- when encapsulating or decapsulating IPsec packets. We want to continue
tracing these packets after IPsec processing.
- when passing packets through virtual network devices. Only devices on
that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
used anymore. Its not entirely clear whether those packets should
be traced after that, however we've always done that.
- when passing packets through virtual network devices that make the
packet cross network namespace boundaries. This is the only cases
where we clearly want to reset nf_trace and is also what the
original patch intended to fix.
Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0e82e7f6df ]
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.
The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.
The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.
Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.
I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.
Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6a5a7b9a5 ]
While enslaving a new device and after IFF_BONDING flag is set, in case
of failure it is not stripped from the device's priv_flags while
cleaning up, which could lead to other problems.
Cleaning at err_close because the flag is set after dev_open().
v2: no change
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4543fbefe6 ]
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.
Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 25fb6ca4ed ]
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.
IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.
This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.
==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f0f6ee1f70 ]
currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link
In shaper | Actual Result
-----------+---------------
100M | 108 Mbps
200M | 244 Mbps
300M | 412 Mbps
500M | 893 Mbps
This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.
To fix this problem we prevent change of q->now until its synchronization
with real time.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Commits f36391d279 and
f0af97070a upstream. ]
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.
So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.
Fix this by using generic infrastructure to synchonize on the
completion of the cross call.
This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a
region, we flush TLBs synchronously.
1) Get rid of xcall_flush_tlb_pending and per-cpu type
implementations.
2) Do TLB batch cross calls instead via:
smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()
3) Batch only in lazy mmu sequences:
a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
flush if it's clear.
4) Add infrastructure for synchronous TLB page flushes.
a) Implement __flush_tlb_page and per-cpu variants, patch
as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
upon CONFIG_SMP
5) It turns out that singleton batches are very common, 2 out of every
3 batch flushes have only a single entry in them.
The batch flush waiting is very expensive, both because of the poll
on sibling cpu completeion, as well as because passing the tlb batch
pointer to the sibling cpus invokes a shared memory dereference.
Therefore, in flush_tlb_pending(), if there is only one entry in
the batch perform a completely asynchronous global_flush_tlb_page()
instead.
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37b7f3c765 upstream.
In commit b0de59b573 ("TTY: do not update atime/mtime on read/write")
we removed timestamps from tty inodes to fix a security issue and waited
if something breaks. Well, 'w', the utility to find out logged users
and their inactivity time broke. It shows that users are inactive since
the time they logged in.
To revert to the old behaviour while still preventing attackers to
guess the password length, we update the timestamps in one-minute
intervals by this patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0de59b573 upstream.
On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
out length of a password using timestamps of /dev/ptmx. It is
documented in "Timing Analysis of Keystrokes and Timing Attacks on
SSH". To avoid that problem, do not update time when reading
from/writing to a TTY.
I am afraid of regressions as this is a behavior we have since 0.97
and apps may expect the time to be current, e.g. for monitoring
whether there was a change on the TTY. Now, there is no change. So
this would better have a lot of testing before it goes upstream.
References: CVE-2013-0160
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bc4bee459 upstream.
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8558e4a26b upstream.
This is my example conversion of a few existing mmap users. The mtdchar
case is actually disabled right now (and stays disabled), but I did it
because it showed up on my "git grep", and I was familiar with the code
due to fixing an overflow problem in the code in commit 9c603e53d3
("mtdchar: fix offset overflow detection").
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2323036dfe upstream.
This is my example conversion of a few existing mmap users. The HPET
case is simple, widely available, and easy to test (Clemens Ladisch sent
a trivial test-program for it).
Test-program-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc9bbca8f6 upstream.
This is my example conversion of a few existing mmap users. The
fb_mmap() case is a good example because it is a bit more complicated
than some: fb_mmap() mmaps one of two different memory areas depending
on the page offset of the mmap (but happily there is never any mixing of
the two, so the helper function still works).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4cbb197c7 upstream.
Various drivers end up replicating the code to mmap() their memory
buffers into user space, and our core memory remapping function may be
very flexible but it is unnecessarily complicated for the common cases
to use.
Our internal VM uses pfn's ("page frame numbers") which simplifies
things for the VM, and allows us to pass physical addresses around in a
denser and more efficient format than passing a "phys_addr_t" around,
and having to shift it up and down by the page size. But it just means
that drivers end up doing that shifting instead at the interface level.
It also means that drivers end up mucking around with internal VM things
like the vma details (vm_pgoff, vm_start/end) way more than they really
need to.
So this just exports a function to map a certain physical memory range
into user space (using a phys_addr_t based interface that is much more
natural for a driver) and hides all the complexity from the driver.
Some drivers will still end up tweaking the vm_page_prot details for
things like prefetching or cacheability etc, but that's actually
relevant to the driver, rather than caring about what the page offset of
the mapping is into the particular IO memory region.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 054430e773 upstream.
Okay so Alan's patch handled the case where there was no registered fbcon,
however the other path entered in set_con2fb_map pit.
In there we called fbcon_takeover, but we also took the console lock in a couple
of places. So push the console lock out to the callers of set_con2fb_map,
this means fbmem and switcheroo needed to take the lock around the fb notifier
entry points that lead to this.
This should fix the efifb regression seen by Maarten.
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 991f76f837 in Linus'
tree which is f366c8f271 in
linux-stable.git
It depends on ef3d0fd27e ("vfs: do (nearly) lockless generic_file_llseek")
which is available only in 3.2+.
When applied on 3.0 codebase, it causes A-A deadlock, whenever anyone does
seek() on sysfs, as both generic_file_llseek() and sysfs_dir_llseek() obtain
i_mutex.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72a763d805 upstream.
The current code does not set the msg_namelen member to 0 and therefore
makes net/socket.c leak the local sockaddr_storage variable to userland
-- 128 bytes of kernel stack memory. Fix that.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 383efcd000 upstream.
try_to_wake_up_local() should only be invoked to wake up another
task in the same runqueue and BUG_ON()s are used to enforce the
rule. Missing try_to_wake_up_local() can stall workqueue
execution but such stalls are likely to be finite either by
another work item being queued or the one blocked getting
unblocked. There's no reason to trigger BUG while holding rq
lock crashing the whole system.
Convert BUG_ON()s in try_to_wake_up_local() to WARN_ON_ONCE()s.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130318192234.GD3042@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 319e7bd96a upstream.
Since the firmware has been open sourced, the minor version has been
bumped to 1.4 and the API/ABI will stay compatible across further 1.x
releases.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd272d1ea7 upstream.
On Feroceon the L2 cache becomes non-coherent with the CPU
when the L1 caches are disabled. Thus the L2 needs to be invalidated
after both L1 caches are disabled.
On kexec before the starting the code for relocation the kernel,
the L1 caches are disabled in cpu_froc_fin (cpu_v7_proc_fin for Feroceon),
but after L2 cache is never invalidated, because inv_all is not set
in cache-feroceon-l2.c.
So kernel relocation and decompression may has (and usually has) errors.
Setting the function enables L2 invalidation and fixes the issue.
Signed-off-by: Illia Ragozin <illia.ragozin@grapecom.com>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f964525a1 upstream.
This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page. If the range falls within
the same memslot, then this will be a fast operation. If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.
Tested: Test against kvm_clock unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2c118bfab upstream.
If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows
that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate
that request. ioapic_read_indirect contains an
ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in
non-debug builds. In recent kernels this allows a guest to cause a kernel
oops by reading invalid memory. In older kernels (pre-3.3) this allows a
guest to read from large ranges of host memory.
Tested: tested against apic unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b79459b48 upstream.
There is a potential use after free issue with the handling of
MSR_KVM_SYSTEM_TIME. If the guest specifies a GPA in a movable or removable
memory such as frame buffers then KVM might continue to write to that
address even after it's removed via KVM_SET_USER_MEMORY_REGION. KVM pins
the page in memory so it's unlikely to cause an issue, but if the user
space component re-purposes the memory previously used for the guest, then
the guest will be able to corrupt that memory.
Tested: Tested against kvmclock unit test
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c300aa64dd upstream.
If the guest sets the GPA of the time_page so that the request to update the
time straddles a page then KVM will write onto an incorrect page. The
write is done byusing kmap atomic to get a pointer to the page for the time
structure and then performing a memcpy to that page starting at an offset
that the guest controls. Well behaved guests always provide a 32-byte aligned
address, however a malicious guest could use this to corrupt host kernel
memory.
Tested: Tested against kvmclock unit test.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9cc3a5bd40 upstream.
With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access the
error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0) in
get_page().
The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit
layout between hugepage hwpoison entry and pmd entry, so
follow_hugetlb_page() which is called in get_dump_page() returns a wrong
page from a given address.
The expected behavior is like this:
absent is_swap_pte FOLL_DUMP Expected behavior
-------------------------------------------------------------------
true false false hugetlb_fault
false true false hugetlb_fault
false false false return page
true false true skip page (to avoid allocation)
false true true hugetlb_fault
false false true return page
With this patch, we can call hugetlb_fault() and take proper actions (we
wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries,) and as the result we can dump all hugepages except
for hwpoisoned ones.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0443de5fbf upstream.
To get correct endianes on little endian cpus (like arm) while reading device
tree properties, this patch replaces of_get_property() with
of_property_read_u32(). While there use of_property_read_bool() for the
handling of the boolean "nxp,no-comparator-bypass" property.
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84cc8fd2fe upstream.
The current code makes the assumption that a cpu_base lock won't be
held if the CPU corresponding to that cpu_base is offline, which isn't
always true.
If a hrtimer is not queued, then it will not be migrated by
migrate_hrtimers() when a CPU is offlined. Therefore, the hrtimer's
cpu_base may still point to a CPU which has subsequently gone offline
if the timer wasn't enqueued at the time the CPU went down.
Normally this wouldn't be a problem, but a cpu_base's lock is blindly
reinitialized each time a CPU is brought up. If a CPU is brought
online during the period that another thread is performing a hrtimer
operation on a stale hrtimer, then the lock will be reinitialized
under its feet, and a SPIN_BUG() like the following will be observed:
<0>[ 28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
<0>[ 28.087078] lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
<4>[ 42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc)
<4>[ 42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30)
<4>[ 42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8)
<4>[ 42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28)
<4>[ 42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320)
<4>[ 42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8)
<4>[ 42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0)
<4>[ 42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0)
<4>[ 42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434)
As an example, this particular crash occurred when hrtimer_start() was
executed on CPU #0. The code locked the hrtimer's current cpu_base
corresponding to CPU #1. CPU #0 then tried to switch the hrtimer's
cpu_base to an optimal CPU which was online. In this case, it selected
the cpu_base corresponding to CPU #3.
Before it could proceed, CPU #1 came online and reinitialized the
spinlock corresponding to its cpu_base. Thus now CPU #0 held a lock
which was reinitialized. When CPU #0 finally ended up unlocking the
old cpu_base corresponding to CPU #1 so that it could switch to CPU
#3, we hit this SPIN_BUG() above while in switch_hrtimer_base().
CPU #0 CPU #1
---- ----
... <offline>
hrtimer_start()
lock_hrtimer_base(base #1)
... init_hrtimers_cpu()
switch_hrtimer_base() ...
... raw_spin_lock_init(&cpu_base->lock)
raw_spin_unlock(&cpu_base->lock) ...
<spin_bug>
Solve this by statically initializing the lock.
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5cf8f0742 upstream.
This code was broken because it assumed that all MTD devices were map-based.
Disable it for now, until it can be fixed properly for the next merge window.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2409d8343 upstream.
It would cause no link after suspending or shutdowning when the
nic changes the speed to 10M and connects to a link partner which
forces the speed to 100M.
Check the link partner ability to determine which speed to set.
The link speed down code path is not factored in this kernel version.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c603e53d3 upstream.
Sasha Levin has been running trinity in a KVM tools guest, and was able
to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of
the memory type). The call trace showed that it was mtdchar_mmap() that
created an invalid remap_pfn_range().
The problem is that mtdchar_mmap() does various really odd and subtle
things with the vma page offset etc, and uses the wrong types (and the
wrong overflow) detection for it.
For example, the page offset may well be 32-bit on a 32-bit
architecture, but after shifting it up by PAGE_SHIFT, we need to use a
potentially 64-bit resource_size_t to correctly hold the full value.
Also, we need to check that the vma length plus offset doesn't overflow
before we check that it is smaller than the length of the mtdmap region.
This fixes things up and tries to make the code a bit easier to read.
Reported-and-tested-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Artem Bityutskiy <dedekind1@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 511ba86e1d upstream.
Invoking arch_flush_lazy_mmu_mode() results in calls to
preempt_enable()/disable() which may have performance impact.
Since lazy MMU is not used on bare metal we can patch away
arch_flush_lazy_mmu_mode() so that it is never called in such
environment.
[ hpa: the previous patch "Fix vmalloc_fault oops during lazy MMU
updates" may cause a minor performance regression on
bare metal. This patch resolves that performance regression. It is
somewhat unclear to me if this is a good -stable candidate. ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1364045796-10720-2-git-send-email-konrad.wilk@oracle.com
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1160c2779b upstream.
In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
when lazy MMU updates are enabled, because set_pgd effects are being
deferred.
One instance of this problem is during process mm cleanup with memory
cgroups enabled. The chain of events is as follows:
- zap_pte_range enables lazy MMU updates
- zap_pte_range eventually calls mem_cgroup_charge_statistics,
which accesses the vmalloc'd mem_cgroup per-cpu stat area
- vmalloc_fault is triggered which tries to sync the corresponding
PGD entry with set_pgd, but the update is deferred
- vmalloc_fault oopses due to a mismatch in the PUD entries
The OOPs usually looks as so:
------------[ cut here ]------------
kernel BUG at arch/x86/mm/fault.c:396!
invalid opcode: 0000 [#1] SMP
.. snip ..
CPU 1
Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
RIP: e030:[<ffffffff816271bf>] [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
.. snip ..
Call Trace:
[<ffffffff81627759>] do_page_fault+0x399/0x4b0
[<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
[<ffffffff81624065>] page_fault+0x25/0x30
[<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
[<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
[<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
[<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
[<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
[<ffffffff81153e61>] unmap_single_vma+0x531/0x870
[<ffffffff81154962>] unmap_vmas+0x52/0xa0
[<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
[<ffffffff8115c8f8>] exit_mmap+0x98/0x170
[<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[<ffffffff81059ce3>] mmput+0x83/0xf0
[<ffffffff810624c4>] exit_mm+0x104/0x130
[<ffffffff8106264a>] do_exit+0x15a/0x8c0
[<ffffffff810630ff>] do_group_exit+0x3f/0xa0
[<ffffffff81063177>] sys_exit_group+0x17/0x20
[<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b
Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
changes visible to the consistency checks.
RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
Tested-by: Josh Boyer <jwboyer@redhat.com>
Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1cbcaa9ea upstream.
The sched_clock_remote() implementation has the following inatomicity
problem on 32bit systems when accessing the remote scd->clock, which
is a 64bit value.
CPU0 CPU1
sched_clock_local() sched_clock_remote(CPU0)
...
remote_clock = scd[CPU0]->clock
read_low32bit(scd[CPU0]->clock)
cmpxchg64(scd->clock,...)
read_high32bit(scd[CPU0]->clock)
While the update of scd->clock is using an atomic64 mechanism, the
readout on the remote cpu is not, which can cause completely bogus
readouts.
It is a quite rare problem, because it requires the update to hit the
narrow race window between the low/high readout and the update must go
across the 32bit boundary.
The resulting misbehaviour is, that CPU1 will see the sched_clock on
CPU1 ~4 seconds ahead of it's own and update CPU1s sched_clock value
to this bogus timestamp. This stays that way due to the clamping
implementation for about 4 seconds until the synchronization with
CLOCK_MONOTONIC undoes the problem.
The issue is hard to observe, because it might only result in a less
accurate SCHED_OTHER timeslicing behaviour. To create observable
damage on realtime scheduling classes, it is necessary that the bogus
update of CPU1 sched_clock happens in the context of an realtime
thread, which then gets charged 4 seconds of RT runtime, which results
in the RT throttler mechanism to trigger and prevent scheduling of RT
tasks for a little less than 4 seconds. So this is quite unlikely as
well.
The issue was quite hard to decode as the reproduction time is between
2 days and 3 weeks and intrusive tracing makes it less likely, but the
following trace recorded with trace_clock=global, which uses
sched_clock_local(), gave the final hint:
<idle>-0 0d..30 400269.477150: hrtimer_cancel: hrtimer=0xf7061e80
<idle>-0 0d..30 400269.477151: hrtimer_start: hrtimer=0xf7061e80 ...
irq/20-S-587 1d..32 400273.772118: sched_wakeup: comm= ... target_cpu=0
<idle>-0 0dN.30 400273.772118: hrtimer_cancel: hrtimer=0xf7061e80
What happens is that CPU0 goes idle and invokes
sched_clock_idle_sleep_event() which invokes sched_clock_local() and
CPU1 runs a remote wakeup for CPU0 at the same time, which invokes
sched_remote_clock(). The time jump gets propagated to CPU0 via
sched_remote_clock() and stays stale on both cores for ~4 seconds.
There are only two other possibilities, which could cause a stale
sched clock:
1) ktime_get() which reads out CLOCK_MONOTONIC returns a sporadic
wrong value.
2) sched_clock() which reads the TSC returns a sporadic wrong value.
#1 can be excluded because sched_clock would continue to increase for
one jiffy and then go stale.
#2 can be excluded because it would not make the clock jump
forward. It would just result in a stale sched_clock for one jiffy.
After quite some brain twisting and finding the same pattern on other
traces, sched_clock_remote() remained the only place which could cause
such a problem and as explained above it's indeed racy on 32bit
systems.
So while on 64bit systems the readout is atomic, we need to verify the
remote readout on 32bit machines. We need to protect the local->clock
readout in sched_clock_remote() on 32bit as well because an NMI could
hit between the low and the high readout, call sched_clock_local() and
modify local->clock.
Thanks to Siegfried Wulsch for bearing with my debug requests and
going through the tedious tasks of running a bunch of reproducer
systems to generate the debug information which let me decode the
issue.
Reported-by: Siegfried Wulsch <Siegfried.Wulsch@rovema.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304051544160.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30f359a6f9 upstream.
This patch fixes a bug where a handful of informational / control CDBs
that should be allowed during ALUA access state Standby/Offline/Transition
where incorrectly returning CHECK_CONDITION + ASCQ_04H_ALUA_TG_PT_*.
This includes INQUIRY + REPORT_LUNS, which would end up preventing LUN
registration when LUN scanning occured during these ALUA access states.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f389a8f1d upstream.
As commit 40dc166c (PM / Core: Introduce struct syscore_ops for core
subsystems PM) say, syscore_ops operations should be carried with one
CPU on-line and interrupts disabled. However, after commit f96972f2d
(kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()),
syscore_shutdown() is called before disable_nonboot_cpus(), so break
the rules. We have a MIPS machine with a 8259A PIC, and there is an
external timer (HPET) linked at 8259A. Since 8259A has been shutdown
too early (by syscore_shutdown()), disable_nonboot_cpus() runs without
timer interrupt, so it hangs and reboot fails. This patch call
syscore_shutdown() a little later (after disable_nonboot_cpus()) to
avoid reboot failure, this is the same way as poweroff does.
For consistency, add disable_nonboot_cpus() to kernel_halt().
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1ca493b0b upstream.
The Charge Pump needs the DSP clock to work properly, without it the
bypass to HP/LINEOUT is not working properly. This requirement is not
mentioned in the datasheet but has been confirmed by Mark Brown from
Wolfson.
Signed-off-by: Alban Bedel <alban.bedel@avionic-design.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f03574f2d5 upstream.
[was already included in 3.0, but I missed the patch hunk for
arch/x86/mm/numa_32.c - gregkh]
This code was an optimization for 32-bit NUMA systems.
It has probably been the cause of a number of subtle bugs over
the years, although the conditions to excite them would have
been hard to trigger. Essentially, we remap part of the kernel
linear mapping area, and then sometimes part of that area gets
freed back in to the bootmem allocator. If those pages get
used by kernel data structures (say mem_map[] or a dentry),
there's no big deal. But, if anyone ever tried to use the
linear mapping for these pages _and_ cared about their physical
address, bad things happen.
For instance, say you passed __GFP_ZERO to the page allocator
and then happened to get handed one of these pages, it zero the
remapped page, but it would make a pte to the _old_ page.
There are probably a hundred other ways that it could screw
with things.
We don't need to hang on to performance optimizations for
these old boxes any more. All my 32-bit NUMA systems are long
dead and buried, and I probably had access to more than most
people.
This code is causing real things to break today:
https://lkml.org/lkml/2013/1/9/376
I looked in to actually fixing this, but it requires surgery
to way too much brittle code, as well as stuff like
per_cpu_ptr_to_phys().
[ hpa: Cc: this for -stable, since it is a memory corruption issue.
However, an alternative is to simply mark NUMA as depends BROKEN
rather than EXPERIMENTAL in the X86_32 subclause... ]
Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 889d66848b upstream.
The usb_control_msg() function expects __u16 types and performs
the endianness conversions by itself.
However, in three places, a conversion is performed before it is
handed over to usb_control_msg(), which leads to a double conversion
(= no conversion):
* snd_usb_nativeinstruments_boot_quirk()
* snd_nativeinstruments_control_get()
* snd_nativeinstruments_control_put()
Caught by sparse:
sound/usb/mixer_quirks.c:512:38: warning: incorrect type in argument 6 (different base types)
sound/usb/mixer_quirks.c:512:38: expected unsigned short [unsigned] [usertype] index
sound/usb/mixer_quirks.c:512:38: got restricted __le16 [usertype] <noident>
sound/usb/mixer_quirks.c:543:35: warning: incorrect type in argument 5 (different base types)
sound/usb/mixer_quirks.c:543:35: expected unsigned short [unsigned] [usertype] value
sound/usb/mixer_quirks.c:543:35: got restricted __le16 [usertype] <noident>
sound/usb/mixer_quirks.c:543:56: warning: incorrect type in argument 6 (different base types)
sound/usb/mixer_quirks.c:543:56: expected unsigned short [unsigned] [usertype] index
sound/usb/mixer_quirks.c:543:56: got restricted __le16 [usertype] <noident>
sound/usb/quirks.c:502:35: warning: incorrect type in argument 5 (different base types)
sound/usb/quirks.c:502:35: expected unsigned short [unsigned] [usertype] value
sound/usb/quirks.c:502:35: got restricted __le16 [usertype] <noident>
Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f03574f2d5 upstream.
This code was an optimization for 32-bit NUMA systems.
It has probably been the cause of a number of subtle bugs over
the years, although the conditions to excite them would have
been hard to trigger. Essentially, we remap part of the kernel
linear mapping area, and then sometimes part of that area gets
freed back in to the bootmem allocator. If those pages get
used by kernel data structures (say mem_map[] or a dentry),
there's no big deal. But, if anyone ever tried to use the
linear mapping for these pages _and_ cared about their physical
address, bad things happen.
For instance, say you passed __GFP_ZERO to the page allocator
and then happened to get handed one of these pages, it zero the
remapped page, but it would make a pte to the _old_ page.
There are probably a hundred other ways that it could screw
with things.
We don't need to hang on to performance optimizations for
these old boxes any more. All my 32-bit NUMA systems are long
dead and buried, and I probably had access to more than most
people.
This code is causing real things to break today:
https://lkml.org/lkml/2013/1/9/376
I looked in to actually fixing this, but it requires surgery
to way too much brittle code, as well as stuff like
per_cpu_ptr_to_phys().
[ hpa: Cc: this for -stable, since it is a memory corruption issue.
However, an alternative is to simply mark NUMA as depends BROKEN
rather than EXPERIMENTAL in the X86_32 subclause... ]
Link: http://lkml.kernel.org/r/20130131005616.1C79F411@kernel.stglabs.ibm.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6a9b7f6b1 upstream.
find_vma() can be called by multiple threads with read lock
held on mm->mmap_sem and any of them can update mm->mmap_cache.
Prevent compiler from re-fetching mm->mmap_cache, because other
readers could update it in the meantime:
thread 1 thread 2
|
find_vma() | find_vma()
struct vm_area_struct *vma = NULL; |
vma = mm->mmap_cache; |
if (!(vma && vma->vm_end > addr |
&& vma->vm_start <= addr)) { |
| mm->mmap_cache = vma;
return vma; |
^^ compiler may optimize this |
local variable out and re-read |
mm->mmap_cache |
This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:
kernel BUG at mm/rmap.c:1088!
Call Trace:
([<000003d100c57000>] 0x3d100c57000)
[<000000000023a1c0>] do_wp_page+0x2fc/0xa88
[<000000000023baae>] handle_pte_fault+0x41a/0xac8
[<000000000023d832>] handle_mm_fault+0x17a/0x268
[<000000000060507a>] do_protection_exception+0x1e2/0x394
[<0000000000603a04>] pgm_check_handler+0x138/0x13c
[<000003fffcf1f07a>] 0x3fffcf1f07a
Last Breaking-Event-Address:
[<000000000024755e>] page_add_new_anon_rmap+0xc2/0x168
Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit da28d966f6 upstream.
The return code from the registration of the thermal class is used to
unallocate resources, but this failure isn't passed back to the caller of
thermal_init. Return this failure back to the caller.
This bug was introduced in changeset 4cb18728 which overwrote the return code
when the variable was re-used to catch the return code of the registration of
the genetlink thermal socket family.
Signed-off-by: Richard Guy Briggs <rbriggs@redhat.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c678ef5286 upstream.
As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
that use a value generated by queue_var_store independent of whether
that value was set or not.
block/blk-sysfs.c: In function 'queue_store_nonrot':
block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
Unlike most other such warnings, this one is not a false positive,
writing any non-number string into the sysfs files indeed has
an undefined result, rather than returning an error.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3dde52209 upstream.
rfc4543(gcm(*)) code for GMAC assumes that assoc scatterlist always contains
only one segment and only makes use of this first segment. However ipsec passes
assoc with three segments when using 'extended sequence number' thus in this
case rfc4543(gcm(*)) fails to function correctly. Patch fixes this issue.
Reported-by: Chaoxing Lin <Chaoxing.Lin@ultra-3eti.com>
Tested-by: Chaoxing Lin <Chaoxing.Lin@ultra-3eti.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 386afc9114 upstream.
In UP and non-preempt respectively, the spinlocks and preemption
disable/enable points are stubbed out entirely, because there is no
regular code that can ever hit the kind of concurrency they are meant to
protect against.
However, while there is no regular code that can cause scheduling, we
_do_ end up having some exceptional (literally!) code that can do so,
and that we need to make sure does not ever get moved into the critical
region by the compiler.
In particular, get_user() and put_user() is generally implemented as
inline asm statements (even if the inline asm may then make a call
instruction to call out-of-line), and can obviously cause a page fault
and IO as a result. If that inline asm has been scheduled into the
middle of a preemption-safe (or spinlock-protected) code region, we
obviously lose.
Now, admittedly this is *very* unlikely to actually ever happen, and
we've not seen examples of actual bugs related to this. But partly
exactly because it's so hard to trigger and the resulting bug is so
subtle, we should be extra careful to get this right.
So make sure that even when preemption is disabled, and we don't have to
generate any actual *code* to explicitly tell the system that we are in
a preemption-disabled region, we need to at least tell the compiler not
to move things around the critical region.
This patch grew out of the same discussion that caused commits
79e5f05edc ("ARC: Add implicit compiler barrier to raw_local_irq*
functions") and 3e2e0d2c22 ("tile: comment assumption about
__insn_mtspr for <asm/irqflags.h>") to come about.
Note for stable: use discretion when/if applying this. As mentioned,
this bug may never have actually bitten anybody, and gcc may never have
done the required code motion for it to possibly ever trigger in
practice.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fb2640159 upstream.
Some versions of pHyp will perform the adjunct partition test before the
ANDCOND test. The result of this is that H_RESOURCE can be returned and
cause the BUG_ON condition to occur. The HPTE is not removed. So add a
check for H_RESOURCE, it is ok if this HPTE is not removed as
pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a
specific HPTE to remove. So it is ok to just move on to the next slot
and try again.
Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8668fcb0b upstream.
The function returns type of ATAPI drives so it should return integer value.
The commit 4dce8ba94c (libata: Use 'bool' return value for ata_id_XXX) since
v2.6.39 changed the type of return value from int to bool, the change would
cause all of the ATAPI class drives to be treated as TYPE_TAPE and the
max_sectors of the drives to be set to 65535 because of the commit
f8d8e5799b7(libata: increase 128 KB / cmd limit for ATAPI tape drives), for the
function would return true for all ATAPI class drives and the TYPE_TAPE is
defined as 0x01.
Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aeb3a97222 upstream.
Rename "Digitial In" to "Digital In". This function is only used for
proc output, so should not cause any problems to change.
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef5692efa upstream.
In function snd_hdmi_get_eld(), the variable 'ret' should be initialized to 0.
Otherwise it will be returned uninitialized as non-zero after ELD info is got
successfully. Thus hdmi_present_sense() will always assume ELD info is invalid
by mistake, and /proc file system cannot show the proper ELD info.
Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Acked-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35e5cbc0af upstream.
After commit 21d8a15a (lookup_one_len: don't accept . and ..) reiserfs
started failing to delete xattrs from inode. This was due to a buggy
test for '.' and '..' in fill_with_dentries() which resulted in passing
'.' and '..' entries to lookup_one_len() in some cases. That returned
error and so we failed to iterate over all xattrs of and inode.
Fix the test in fill_with_dentries() along the lines of the one in
lookup_one_len().
Reported-by: Pawel Zawora <pzawora@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67e753ca41 upstream.
The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.
The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
(see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
flag and re-writes the entire media atomically, which may take really a lot
of time.
3. UBIFS clears the fixup flag in the superblock.
This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reported-by: Mark Jackson <mpfj-list@mimc.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 417a1178f1 upstream.
The dma-sh7760 currently fails with the following compile error:
sound/soc/sh/dma-sh7760.c:346:2: error: unknown field 'pcm_ops' specified in initializer
sound/soc/sh/dma-sh7760.c:346:2: warning: initialization from incompatible pointer type
sound/soc/sh/dma-sh7760.c:347:2: error: unknown field 'pcm_new' specified in initializer
sound/soc/sh/dma-sh7760.c:347:2: warning: initialization makes integer from pointer without a cast
sound/soc/sh/dma-sh7760.c:348:2: error: unknown field 'pcm_free' specified in initializer
sound/soc/sh/dma-sh7760.c:348:2: warning: initialization from incompatible pointer type
sound/soc/sh/dma-sh7760.c: In function 'sh7760_soc_platform_probe':
sound/soc/sh/dma-sh7760.c:353:2: warning: passing argument 2 of 'snd_soc_register_platform' from incompatible pointer type
include/sound/soc.h:368:5: note: expected 'struct snd_soc_platform_driver *' but argument is of type 'struct snd_soc_platform *'
This is due the misnaming of the snd_soc_platform_driver type name and 'ops'
field. The issue was introduced in commit f0fba2a("ASoC: multi-component - ASoC
Multi-Component Support").
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fcd99434fb ]
Now that netdev_rx_handler_unregister contains synchronize_net(), we need
to call it outside of bond->lock, cause it might sleep. Also, remove the
already unneded synchronize_net().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2a2876e86 upstream.
There is a bug introduced with commit 27c2127 that causes
devices which are hot unplugged and then hot-replugged to
not have per-device dma_ops set. This causes these devices
to not function correctly. Fixed with this patch.
Reported-by: Andreas Degert <andreas.degert@googlemail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c51e53689 ]
This patch enables RX of jumbo frames for LAN7500.
Previously the driver would transmit jumbo frames succesfully but
would drop received jumbo frames (incrementing the interface errors
count).
With this patch applied the device can succesfully receive jumbo
frames up to MTU 9000 (9014 bytes on the wire including ethernet
header).
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 76a0e68129 ]
skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that
checksums were correct and CHECKSUM_NONE in any other case. They're
currently placed vice versa, which breaks the forwarding scenario. Fix it
by placing them as described above.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 00cfec3748 ]
commit 35d48903e9 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :
<quoting Steven>
I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.
In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.
But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:
slave = bond_slave_get_rcu(skb->dev);
bond = slave->bond; <--- BUG
the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.
Looking at the code that unregisters the handler:
void netdev_rx_handler_unregister(struct net_device *dev)
{
ASSERT_RTNL();
RCU_INIT_POINTER(dev->rx_handler, NULL);
RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}
Which is basically:
dev->rx_handler = NULL;
dev->rx_handler_data = NULL;
And looking at __netif_receive_skb() we have:
rx_handler = rcu_dereference(skb->dev->rx_handler);
if (rx_handler) {
if (pt_prev) {
ret = deliver_skb(skb, pt_prev, orig_dev);
pt_prev = NULL;
}
switch (rx_handler(&skb)) {
My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading? What happens if the interrupt
triggers and we have this:
CPU0 CPU1
---- ----
rx_handler = skb->dev->rx_handler
netdev_rx_handler_unregister() {
dev->rx_handler = NULL;
dev->rx_handler_data = NULL;
rx_handler()
bond_handle_frame() {
slave = skb->dev->rx_handler;
bond = slave->bond; <-- NULL pointer dereference!!!
What protection am I missing in the bond release handler that would
prevent the above from happening?
</quoting Steven>
We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.
A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.
The second way is better as it avoids an extra test in fast path.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14bc435ea5 ]
According to the Datasheet (page 52):
15-12 Reserved
11-0 RXBC Receive Byte Count
This field indicates the present received frame byte size.
The code has a bug:
rxh = ks8851_rdreg32(ks, KS_RXFHSR);
rxstat = rxh & 0xffff;
rxlen = rxh >> 16; // BUG!!! 0xFFF mask should be applied
Signed-off-by: Max Nekludov <Max.Nekludov@us.elster.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To restart tx queue use netif_wake_queue() intead of netif_start_queue()
so that net schedule will restart transmission immediately which will
increase network performance while doing huge data transfers.
Reported-by: Dan Franke <dan.franke@schneider-electric.com>
Suggested-by: Sriramakrishnan A G <srk@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 91c5746425 ]
Some network drivers use a non default hard_header_len
Transmitted skb should take into account dev->hard_header_len, or risk
crashes or expensive reallocations.
In the case of aoe, lets reserve MAX_HEADER bytes.
David reported a crash in defxx driver, solved by this patch.
Reported-by: David Oostdyk <daveo@ll.mit.edu>
Tested-by: David Oostdyk <daveo@ll.mit.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ded34e0fe8 ]
As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.
Dave, I think this one should go on the -stable pile.
Special thanks to Jan for coming up with a reproducer for this
problem.
Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits 73214f5d9f
and f1e79e2080, the latter
adds an assertion to genetlink to prevent this from happening
again in the future. ]
The original name is too long.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a7df340ed ]
vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.
This patch moves vlan_vid_del() as behind as possible.
Signed-off-by: Cong Wang <amwang@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7ebe183c6d ]
On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f4541d60a4 ]
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.
Current code leads to following bad bursty behavior :
20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
While the expected behavior is more like :
20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 74f9f42c1c ]
The sky2 driver sets the Rx Upper Threshold for Pause Packet generation to a
wrong value which leads to only 2kB of RAM remaining space. This can lead to
Rx overflow errors even with activated flow-control.
Fix: We should increase the value to 8192/8
Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9cfe8b156c ]
The sky2 driver doesn't count the Receive Overflows because the MAC
interrupt for this event is not set in the MAC's interrupt mask.
The MAC's interrupt mask is set only for Transmit FIFO Underruns.
Fix: The correct setting should be (GM_IS_TX_FF_UR | GM_IS_RX_FF_OR)
Otherwise the Receive Overflow event will not generate any interrupt.
The Receive Overflow interrupt is handled correctly
Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 613f04a0f5 upstream.
The latency tracers require the buffers to be in overwrite mode,
otherwise they get screwed up. Force the buffers to stay in overwrite
mode when latency tracers are enabled.
Added a flag_changed() method to the tracer structure to allow
the tracers to see what flags are being changed, and also be able
to prevent the change from happing.
[Backported for 3.4-stable. Re-added current_trace NULL checks; removed
allocated_snapshot field; adapted to tracing_trace_options_write without
trace_set_options.]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69d34da298 upstream.
Seems that the tracer flags have never been protected from
synchronous writes. Luckily, admins don't usually modify the
tracing flags via two different tasks. But if scripts were to
be used to modify them, then they could get corrupted.
Move the trace_types_lock that protects against tracers changing
to also protect the flags being set.
[Backported for 3.4, 3.0-stable. Moved return to after unlock.]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90ba983f68 upstream.
A user who was using a 8TB+ file system and with a very large flexbg
size (> 65536) could cause the atomic_t used in the struct flex_groups
to overflow. This was detected by PaX security patchset:
http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551
This bug was introduced in commit 9f24e4208f, so it's been around
since 2.6.30. :-(
Fix this by using an atomic64_t for struct orlav_stats's
free_clusters.
[Backported for 3.0-stable. Renamed free_clusters back to free_blocks;
fixed a few more atomic_read's of free_blocks left in 3.0.]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e971318bbe upstream.
Some firmware exhibits a bug where the same VariableName and
VendorGuid values are returned on multiple invocations of
GetNextVariableName(). See,
https://bugzilla.kernel.org/show_bug.cgi?id=47631
As a consequence of such a bug, Andre reports hitting the following
WARN_ON() in the sysfs code after updating the BIOS on his, "Gigabyte
Technology Co., Ltd. To be filled by O.E.M./Z77X-UD3H, BIOS F19e
11/21/2012)" machine,
[ 0.581554] EFI Variables Facility v0.08 2004-May-17
[ 0.584914] ------------[ cut here ]------------
[ 0.585639] WARNING: at /home/andre/linux/fs/sysfs/dir.c:536 sysfs_add_one+0xd4/0x100()
[ 0.586381] Hardware name: To be filled by O.E.M.
[ 0.587123] sysfs: cannot create duplicate filename '/firmware/efi/vars/SbAslBufferPtrVar-01f33c25-764d-43ea-aeea-6b5a41f3f3e8'
[ 0.588694] Modules linked in:
[ 0.589484] Pid: 1, comm: swapper/0 Not tainted 3.8.0+ #7
[ 0.590280] Call Trace:
[ 0.591066] [<ffffffff81208954>] ? sysfs_add_one+0xd4/0x100
[ 0.591861] [<ffffffff810587bf>] warn_slowpath_common+0x7f/0xc0
[ 0.592650] [<ffffffff810588bc>] warn_slowpath_fmt+0x4c/0x50
[ 0.593429] [<ffffffff8134dd85>] ? strlcat+0x65/0x80
[ 0.594203] [<ffffffff81208954>] sysfs_add_one+0xd4/0x100
[ 0.594979] [<ffffffff81208b78>] create_dir+0x78/0xd0
[ 0.595753] [<ffffffff81208ec6>] sysfs_create_dir+0x86/0xe0
[ 0.596532] [<ffffffff81347e4c>] kobject_add_internal+0x9c/0x220
[ 0.597310] [<ffffffff81348307>] kobject_init_and_add+0x67/0x90
[ 0.598083] [<ffffffff81584a71>] ? efivar_create_sysfs_entry+0x61/0x1c0
[ 0.598859] [<ffffffff81584b2b>] efivar_create_sysfs_entry+0x11b/0x1c0
[ 0.599631] [<ffffffff8158517e>] register_efivars+0xde/0x420
[ 0.600395] [<ffffffff81d430a7>] ? edd_init+0x2f5/0x2f5
[ 0.601150] [<ffffffff81d4315f>] efivars_init+0xb8/0x104
[ 0.601903] [<ffffffff8100215a>] do_one_initcall+0x12a/0x180
[ 0.602659] [<ffffffff81d05d80>] kernel_init_freeable+0x13e/0x1c6
[ 0.603418] [<ffffffff81d05586>] ? loglevel+0x31/0x31
[ 0.604183] [<ffffffff816a6530>] ? rest_init+0x80/0x80
[ 0.604936] [<ffffffff816a653e>] kernel_init+0xe/0xf0
[ 0.605681] [<ffffffff816ce7ec>] ret_from_fork+0x7c/0xb0
[ 0.606414] [<ffffffff816a6530>] ? rest_init+0x80/0x80
[ 0.607143] ---[ end trace 1609741ab737eb29 ]---
There's not much we can do to work around and keep traversing the
variable list once we hit this firmware bug. Our only solution is to
terminate the loop because, as Lingzhu reports, some machines get
stuck when they encounter duplicate names,
> I had an IBM System x3100 M4 and x3850 X5 on which kernel would
> get stuck in infinite loop creating duplicate sysfs files because,
> for some reason, there are several duplicate boot entries in nvram
> getting GetNextVariableName into a circle of iteration (with
> period > 2).
Also disable the workqueue, as efivar_update_sysfs_entries() uses
GetNextVariableName() to figure out which variables have been created
since the last iteration. That algorithm isn't going to work if
GetNextVariableName() returns duplicates. Note that we don't disable
EFI variable creation completely on the affected machines, it's just
that any pstore dump-* files won't appear in sysfs until the next
boot.
[Backported for 3.0-stable. Removed code related to pstore
workqueue but pulled in helper function variable_is_present
from a93bc0c; Moved the definition of __efivars to the top
for being referenced in variable_is_present.]
Reported-by: Andre Heider <a.heider@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Tested-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec50bd32f1 upstream.
It's not wise to assume VariableNameSize represents the length of
VariableName, as not all firmware updates VariableNameSize in the same
way (some don't update it at all if EFI_SUCCESS is returned). There
are even implementations out there that update VariableNameSize with
values that are both larger than the string returned in VariableName
and smaller than the buffer passed to GetNextVariableName(), which
resulted in the following bug report from Michael Schroeder,
> On HP z220 system (firmware version 1.54), some EFI variables are
> incorrectly named :
>
> ls -d /sys/firmware/efi/vars/*8be4d* | grep -v -- -8be returns
> /sys/firmware/efi/vars/dbxDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
> /sys/firmware/efi/vars/KEKDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
> /sys/firmware/efi/vars/SecureBoot-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
> /sys/firmware/efi/vars/SetupMode-Information8be4df61-93ca-11d2-aa0d-00e098032b8c
The issue here is that because we blindly use VariableNameSize without
verifying its value, we can potentially read garbage values from the
buffer containing VariableName if VariableNameSize is larger than the
length of VariableName.
Since VariableName is a string, we can calculate its size by searching
for the terminating NULL character.
[Backported for 3.8-stable. Removed workqueue code added in
a93bc0c 3.9-rc1.]
Reported-by: Frederic Crozat <fcrozat@suse.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Michael Schroeder <mls@suse.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Lingzhu Xiang <lxiang@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a35f83b2b upstream.
Restore crtc->fb to the old framebuffer if queue_flip fails.
While at it, kill the pointless intel_fb temp variable.
v2: Update crtc->fb before queue_flip and restore it back
after a failure.
[Backported for 3.0-stable. Adjusted context. Please
cherry-pick commit 7317c75e66
upstream before this patch as it provides necessary context
and fixes a panic.]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64a817cfbd upstream.
Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.
The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1681bf8a7 upstream.
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d1068b3a9 upstream.
On hosts without the XSAVE support unprivileged local user can trigger
oops similar to the one below by setting X86_CR4_OSXSAVE bit in guest
cr4 register using KVM_SET_SREGS ioctl and later issuing KVM_RUN
ioctl.
invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
[<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
[<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70
QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.
Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 08dff7b7d6 upstream.
When online_pages() is called to add new memory to an empty zone, it
rebuilds all zone lists by calling build_all_zonelists(). But there's a
bug which prevents the new zone to be added to other nodes' zone lists.
online_pages() {
build_all_zonelists()
.....
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
}
Here the node of the zone is put into N_HIGH_MEMORY state after calling
build_all_zonelists(), but build_all_zonelists() only adds zones from
nodes in N_HIGH_MEMORY state to the fallback zone lists.
build_all_zonelists()
->__build_all_zonelists()
->build_zonelists()
->find_next_best_node()
->for_each_node_state(n, N_HIGH_MEMORY)
So memory in the new zone will never be used by other nodes, and it may
cause strange behavor when system is under memory pressure. So put node
into N_HIGH_MEMORY state before calling build_all_zonelists().
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2ebd422f7 upstream.
kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or on MSI. However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.
Fix by ensuring that an MSI entry is added to an empty list.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b92946e291 upstream.
There're several reasons that the vectors need to be validated:
- Return error when caller provides vectors whose num is greater than UIO_MAXIOV.
- Linearize part of skb when userspace provides vectors grater than MAX_SKB_FRAGS.
- Return error when userspace provides vectors whose total length may exceed
- MAX_SKB_FRAGS * PAGE_SIZE.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.de> [patch reduced to
the 3rd reason only for 3.0]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e515705a1 upstream.
If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.
Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP
This is somewhat long winded because vcpu->arch.apic is created without
kvm->lock held.
Based on earlier patch by Michael Ellerman.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 56d08fef23 upstream.
Squelch compiler warnings:
fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
unsigned integer expressions [-Wsign-compare]
Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
acl data", Dec 7, 2011.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 331818f1c4 upstream.
Commit bf118a342f (NFSv4: include bitmap
in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
where we may need to decode multi-page data. However it fails to take
into account the fact that the variable may be NULL (for the case where
we're not doing multi-page decode), and it also attaches it to the
encoding xdr_stream rather than the decoding one.
The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
call to page_address() with a NULL page pointer.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf118a342f upstream.
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0924ab2cfa upstream.
User space may create the PIT and forgets about setting up the irqchips.
In that case, firing PIT IRQs will crash the host:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
...
Call Trace:
[<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
[<ffffffff81071431>] process_one_work+0x111/0x4d0
[<ffffffff81071bb2>] worker_thread+0x152/0x340
[<ffffffff81075c8e>] kthread+0x7e/0x90
[<ffffffff815a4474>] kernel_thread_helper+0x4/0x10
Prevent this by checking the irqchip mode before starting a timer. We
can't deny creating the PIT if the irqchips aren't set up yet as
current user land expects this order to work.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5a1eeef04 upstream.
Don't write more than the requested number of bytes of an batman-adv icmp
packet to the userspace buffer. Otherwise unrelated userspace memory might get
overridden by the kernel.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c00b6856fc upstream.
Writing a icmp_packet_rr and then reading icmp_packet can lead to kernel
memory corruption, if __user *buf is just below TASK_SIZE.
Signed-off-by: Paul Kot <pawlkt@gmail.com>
[sven@narfation.org: made it checkpatch clean]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb101ed2c3 upstream.
There are multiple locations in the X.25 packet layer where a skb is
assumed to be of at least a certain size and that all its data is
currently available at skb->data. These assumptions are not checked,
hence buffer overreads may occur. Use pskb_may_pull to check these
minimal size assumptions and ensure that data is available at skb->data
when necessary, as well as use skb_copy_bits where needed.
Signed-off-by: Matthew Daley <mattjd@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d780592b99 upstream.
So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
it fails. Move this confusing resonsibility back into the hands of
kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected,
all other archs cannot fail.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdf30d1c1b upstream.
A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space. This is because the
global block reserve was taking up the entire free metadata space. This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done. This allowed the user to
complete his rsync without issues. Thanks
Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c11a172cb upstream.
Use proper macro while extracting TRB transfer length from
Transfer event TRBs. Adding a macro EVENT_TRB_LEN (bits 0:23)
for the same, and use it instead of TRB_LEN (bits 0:16) in
case of event TRBs.
This patch should be backported to kernels as old as 2.6.31, that
contain the commit b10de14211 "USB: xhci:
Bulk transfer support". This patch will have issues applying to older
kernels.
Signed-off-by: Vivek gautam <gautam.vivek@samsung.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 084c7189ac upstream.
curr_cmd points to the command that is in processing or waiting
for its command response from firmware. If the function shutdown
happens to occur at this time we should cancel the cmd timer and
put the command back to free queue.
Tested-by: Marco Cesarano <marco@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e5e098ac2 upstream.
Commit 7708992 ("xen/blkback: Seperate the bio allocation and the bio
submission") consolidated the pendcnt updates to just a single write,
neglecting the fact that the error path relied on it getting set to 1
up front (such that the decrement in __end_block_io_op() would actually
drop the count to zero, triggering the necessary cleanup actions).
Also remove a misleading and a stale (after said commit) comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b251412db9 upstream.
Intermittently, b43 will report "Out of order TX status report on DMA ring".
When this happens, the driver must be reset before communication can resume.
The cause of the problem is believed to be an error in the closed-source
firmware; however, all versions of the firmware are affected.
This change uses the observation that the expected status is always 2 less
than the observed value, and supplies a fake status report to skip one
header/data pair.
Not all devices suffer from this problem, but it can occur several times
per second under heavy load. As each occurence kills the unmodified driver,
this patch makes if possible for the affected devices to function. The patch
logs only the first instance of the reset operation to prevent spamming
the logs.
Tested-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e5110f411d upstream.
In case of 'if (filp->f_pos == 0 or 1)' of sysfs_readdir(),
the failure from filldir() isn't handled, and the reference counter
of the sysfs_dirent object pointed by filp->private_data will be
released without clearing filp->private_data, so use after free
bug will be triggered later.
This patch returns immeadiately under the situation for fixing the bug,
and it is reasonable to return from readdir() when filldir() fails.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 991f76f837 upstream.
While readdir() is running, lseek() may set filp->f_pos as zero,
then may leave filp->private_data pointing to one sysfs_dirent
object without holding its reference counter, so the sysfs_dirent
object may be used after free in next readdir().
This patch holds inode->i_mutex to avoid the problem since
the lock is always held in readdir path.
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4317ce877 upstream.
For the s626 driver, there is a bug in the handling of asynchronous
commands on the AI subdevice when the stop source is `TRIG_NONE`. The
command should run continuously until cancelled, but the interrupt
handler stops the command running after the first scan.
The command set-up function `s626_ai_cmd()` contains this code:
switch (cmd->stop_src) {
case TRIG_COUNT:
/* data arrives as one packet */
devpriv->ai_sample_count = cmd->stop_arg;
devpriv->ai_continous = 0;
break;
case TRIG_NONE:
/* continous acquisition */
devpriv->ai_continous = 1;
devpriv->ai_sample_count = 0;
break;
}
The interrupt handler `s626_irq_handler()` contains this code:
if (!(devpriv->ai_continous))
devpriv->ai_sample_count--;
if (devpriv->ai_sample_count <= 0) {
devpriv->ai_cmd_running = 0;
/* ... */
}
So `devpriv->ai_sample_count` is only decremented for the `TRIG_COUNT`
case, but `devpriv->ai_cmd_running` is set to 0 (and the command
stopped) regardless.
Fix this in `s626_ai_cmd()` by setting `devpriv->ai_sample_count = 1`
for the `TRIG_NONE` case. The interrupt handler will not decrement it
so it will remain greater than 0 and the check for stopping the
acquisition will fail.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1166fde6a9 upstream.
We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vaguely based on upstream commit 574c4866e3 'consolidate kernel-side
struct sigaction declarations'.
flush_signal_handlers() needs to know whether sigaction::sa_restorer
is defined, not whether SA_RESTORER is defined. Define the
__ARCH_HAS_SA_RESTORER macro to indicate this.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb7da02245 upstream.
Since commit 8871e99f89 ('asus-laptop: HRWS/HWRS typo'), module
initialisation is very slow on the Asus UL30A. The HWRS method takes
about 12 seconds to run, and subsequent initialisation also seems to
be delayed. Since we don't really need the result, don't bother
calling it on init. Those who are curious can still get the result
through the 'infos' device attribute.
Update the comment about HWRS in show_infos().
Reported-by: ryan <draziw+deb@gmail.com>
References: http://bugs.debian.org/692436
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Corentin Chary <corentin.chary@gmail.com>
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ef9e2f6d1 upstream.
If CONFIG_MAC80211_MESH is not set, cfg80211 will now allow advertising
interface combinations with NL80211_IFTYPE_MESH_POINT present.
Add appropriate ifdefs to avoid running into errors.
[Backported for 3.8-stable. Removed code of simultaneous AP and mesh
mode added in 4a5fc6d 3.9-rc1.]
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Lingzhu Xiang <lxiang@redhat.com>
Reviewed-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d740269867 upstream.
To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.
This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:
if (cmd != path_bshell && errno == ENOEXEC) {
*argv-- = cmd;
*argv = cmd = path_bshell;
goto repeat;
}
The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.
Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d627b62ff8 upstream.
This is rather a hack to fix brightness hotkeys on a Clevo laptop. CADL is not
used anywhere in the driver code at the moment, but it could be used in BIOS as
is the case with the Clevo laptop.
The Clevo B7130 requires the CADL field to contain at least the ID of
the LCD device. If this field is empty, the ACPI methods that are called
on pressing brightness / display switching hotkeys will not trigger a
notification. As a result, it appears as no hotkey has been pressed.
Reference: https://bugs.freedesktop.org/show_bug.cgi?id=45452
Tested-by: Peter Wu <lekensteyn@gmail.com>
Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8aec0f5d41 upstream.
Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().
This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two others sites that use this function in the Linux kernel,
and they both seem to get it wrong:
Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().
I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.
While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.
And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5492bf3d56 upstream.
Add missing get_icount field to two-port driver.
The two-port driver was not updated when switching to the new icount
interface in commit 0bca1b913a ("tty: Convert the USB drivers to the
new icount interface").
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 618aa1068d upstream.
Remove bogus disconnect test introduced by 95bef012e ("USB: more serial
drivers writing after disconnect") which prevented queued data from
being freed on disconnect.
The possible IO it was supposed to prevent is long gone.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89b1f39eb4 upstream.
For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16 so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors despite the filesystem
has plenty of free space.
Fix the problem by changing s_nr_groups' type to 'int'. That is enough
even for filesystems 2^32 blocks (UDF maximum) and 512-byte blocksize.
Reported-and-tested-by: v10lator@myway.de
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Jim Trigg <jtrigg@spamcop.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7971051e4 upstream.
Make sure the interface is not released before our serial device.
Note that drivers are still not allowed to access the interface in
any way that may interfere with another driver that may have gotten
bound to the same interface after disconnect returns.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8264340e6 upstream.
According to XHCI specification (5.5.2.1) the IP is bit 0 and IE is bit 1
of IMAN register. Previously their definitions were reversed.
Even though there are no ill effects being observed from the swapped
definitions (because IMAN_IP is RW1C and in legacy PCI case we come in
with it already set to 1 so it was clearing itself even though we were
setting IMAN_IE instead of IMAN_IP), we should still correct the values.
This patch should be backported to kernels as old as 2.6.36, that
contain the commit 4e833c0b87 "xhci: don't
re-enable IE constantly".
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7dc19b865 upstream.
Currently tick_check_broadcast_device doesn't reject clock_event_devices
with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real
hardware if they have a higher rating value. In this situation, the
dummy timer is responsible for broadcasting to itself, and the core
clockevents code may attempt to call non-existent callbacks for
programming the dummy, eventually leading to a panic.
This patch makes tick_check_broadcast_device always reject dummy timers,
preventing this problem.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ee9e2aa7b upstream.
Commit f0dc117abd ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.
The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.
This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq(). If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.
This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a2256702e upstream.
This commit fixes a wrong return value of the number of the allocated
blocks in ext4_split_extent. When the length of blocks we want to
allocate is greater than the length of the current extent, we return a
wrong number. Let's see what happens in the following case when we
call ext4_split_extent().
map: [48, 72]
ex: [32, 64, u]
'ex' will be split into two parts:
ex1: [32, 47, u]
ex2: [48, 64, w]
'map->m_len' is returned from this function, and the value is 24. But
the real length is 16. So it should be fixed.
Meanwhile in this commit we use right length of the allocated blocks
when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents
is called.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f853c61688 upstream.
We've had several reports of people attempting to mount Windows 8 shares
and getting failures with a return code of -EINVAL. The default sec=
mode changed recently to sec=ntlmssp. With that, we expect and parse a
SPNEGO blob from the server in the NEGOTIATE reply.
The current decode_negTokenInit function first parses all of the
mechTypes and then tries to parse the rest of the negTokenInit reply.
The parser however currently expects a mechListMIC or nothing to follow the
mechTypes, but Windows 8 puts a mechToken field there instead to carry
some info for the new NegoEx stuff.
In practice, we don't do anything with the fields after the mechTypes
anyway so I don't see any real benefit in continuing to parse them.
This patch just has the kernel ignore the fields after the mechTypes.
We'll probably need to reinstate some of this if we ever want to support
NegoEx.
Reported-by: Jason Burgess <jason@jacknife2.dns2go.com>
Reported-by: Yan Li <elliot.li.tech@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d00285884c upstream.
hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).
If the system is configured for more than one hugepage size, which is
possible since commit a137e1cc6d ("hugetlbfs: per mount huge page
sizes") then the overcommit estimation done by __vm_enough_memory()
(resp. shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory. This can lead to an
unexpected ENOMEM/EFAULT resp. SIGSEGV when memory is accounted.
Testcase:
boot: hugepagesz=1G hugepages=1
the default overcommit ratio is 50
before patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 55434168 kB
after patch:
egrep 'CommitLimit' /proc/meminfo
CommitLimit: 54909880 kB
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 16dad1d743 upstream.
EDID spreads some values across multiple bytes; bit-fiddling is needed
to retrieve these. The current code to parse "detailed timings" has a
cut&paste error that results in a vsync offset of at most 15 lines
instead of 63.
See
http://en.wikipedia.org/wiki/EDID
and in the "EDID Detailed Timing Descriptor" see bytes 10+11 show why
that needs to be a left shift.
Signed-off-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3118a4f652 upstream.
It is possible to wrap the counter used to allocate the buffer for
relocation copies. This could lead to heap writing overflows.
CVE-2013-0913
v3: collapse test, improve comment
v2: move check into validate_exec_list
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Pinkie Pie
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 740466bc89 upstream.
Because function tracing is very invasive, and can even trace
calls to rcu_read_lock(), RCU access in function tracing is done
with preempt_disable_notrace(). This requires a synchronize_sched()
for updates and not a synchronize_rcu().
Function probes (traceon, traceoff, etc) must be freed after
a synchronize_sched() after its entry has been removed from the
hash. But call_rcu() is used. Fix this by using call_rcu_sched().
Also fix the usage to use hlist_del_rcu() instead of hlist_del().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2721e72dd1 upstream.
Although the swap is wrapped with a spin_lock, the assignment
of the temp buffer used to swap is not within that lock.
It needs to be moved into that lock, otherwise two swaps
happening on two different CPUs, can end up using the wrong
temp buffer to assign in the swap.
Luckily, all current callers of the swap function appear to have
their own locks. But in case something is added that allows two
different callers to call the swap, then there's a chance that
this race can trigger and corrupt the buffers.
New code is coming soon that will allow for this race to trigger.
I've Cc'd stable, so this bug will not show up if someone backports
one of the changes that can trigger this bug.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2563a4524f upstream.
Masks kernel address info-leak in object dumps with the %pK suffix,
so they cannot be used to target kernel memory corruption attacks if
the kptr_restrict sysctl is set.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83ea5d18d7 upstream.
Creation of individual mixer controls may fail, but that shouldn't cause
the entire mixer creation to fail. Even worse, if the mixer creation
fails, that will error out the entire device probing.
All the functions called by parse_audio_unit() should return -EINVAL if
they find descriptors that are unsupported or believed to be malformed,
so we can safely handle this error code as a non-fatal condition in
snd_usb_mixer_controls().
That fixes a long standing bug which is commonly worked around by
adding quirks which make the driver ignore entire interfaces. Some of
them might now be unnecessary.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Rodolfo Thomazelli <pe.soberbo@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d7b86c98e upstream.
In check_input_term() and parse_audio_feature_unit(), propagate the
error value that has been returned by a failing function instead of
-EINVAL. That helps cleaning up the error pathes in the mixer.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a686fd141e upstream.
There is a typo in convert_to_spdif_status() about checking the
emphasis IEC958 status bit. It should check the given value instead
of the resultant value.
Reported-by: Martin Weishart <martin.weishart@telosalliance.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fae8563b25 ]
Using TX push when notifying the NIC of multiple new descriptors in
the ring will very occasionally cause the TX DMA engine to re-use an
old descriptor. This can result in a duplicated or partly duplicated
packet (new headers with old data), or an IOMMU page fault. This does
not happen when the pushed descriptor is the only one written.
TX push also provides little latency benefit when a packet requires
more than one descriptor.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35205b211c ]
efx_device_detach_sync() locks all TX queues before marking the device
detached and thus disabling further TX scheduling. But it can still
be interrupted by TX completions which then result in TX scheduling in
soft interrupt context. This will deadlock when it tries to acquire
a TX queue lock that efx_device_detach_sync() already acquired.
To avoid deadlock, we must use netif_tx_{,un}lock_bh().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 29c69a4882 ]
We must only ever stop TX queues when they are full or the net device
is not 'ready' so far as the net core, and specifically the watchdog,
is concerned. Otherwise, the watchdog may fire *immediately* if no
packets have been added to the queue in the last 5 seconds.
The device is ready if all the following are true:
(a) It has a qdisc
(b) It is marked present
(c) It is running
(d) The link is reported up
(a) and (c) are normally true, and must not be changed by a driver.
(d) is under our control, but fake link changes may disturb userland.
This leaves (b). We already mark the device absent during reset
and self-test, but we need to do the same during MTU changes and ring
reallocation. We don't need to do this when the device is brought
down because then (c) is already false.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits 06e63c57ac,
b590ace09d and
c73e787a8d ]
We assume that the mapping between DMA and virtual addresses is done
on whole pages, so we can find the page offset of an RX buffer using
the lower bits of the DMA address. However, swiotlb maps in units of
2K, breaking this assumption.
Add an explicit page_offset field to struct efx_rx_buffer.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3a68f19d7a ]
We may currently allocate two RX DMA buffers to a page, and only unmap
the page when the second is completed. We do not sync the first RX
buffer to be completed; this can result in packet loss or corruption
if the last RX buffer completed in a NAPI poll is the first in a page
and is not DMA-coherent. (In the middle of a NAPI poll, we will
handle the following RX completion and unmap the page *before* looking
at the content of the first buffer.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ebf98e797b ]
efx_mcdi_poll() uses get_seconds() to read the current time and to
implement a polling timeout. The use of this function was chosen
partly because it could easily be replaced in a co-sim environment
with a macro that read the simulated time.
Unfortunately the real get_seconds() returns the system time (real
time) which is subject to adjustment by e.g. ntpd. If the system time
is adjusted forward during a polled MCDI operation, the effective
timeout can be shorter than the intended 10 seconds, resulting in a
spurious failure. It is also possible for a backward adjustment to
delay detection of a areal failure.
Use jiffies instead, and change MCDI_RPC_TIMEOUT to be denominated in
jiffies. Also correct rounding of the timeout: check time > finish
(or rather time_after(time, finish)) and not time >= finish.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c2f3b8e3a4 ]
The assertion of netif_device_present() at the top of
efx_hard_start_xmit() may fail if we don't do this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits a606f4325d,
d5e8cc6c94,
525d9e8240 ]
The TX DMA engine issues upstream read requests when there is room in
the TX FIFO for the completion. However, the fetches for the rest of
the packet might be delayed by any back pressure. Since a flush must
wait for an EOP, the entire flush may be delayed by back pressure.
Mitigate this by disabling flow control before the flushes are
started. Since PF and VF flushes run in parallel introduce
fc_disable, a reference count of the number of flushes outstanding.
The same principle could be applied to Falcon, but that
would bring with it its own testing.
We sometimes hit a "failed to flush" timeout on some TX queues, but the
flushes have completed and the flush completion events seem to go missing.
In this case, we can check the TX_DESC_PTR_TBL register and drain the
queues if the flushes had finished.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0:
- Call efx_nic_type::finish_flush() on both success and failure paths
- Check the TX_DESC_PTR_TBL registers in the polling loop
- Declare efx_mcdi_set_mac() extern]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bfeed90294 ]
On big-endian systems the MTD partition names currently have mangled
subtype numbers and are not recognised by the firmware update tool
(sfupdate).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
[bwh: Backported to 3.0: use old macros for length of firmware subtype array]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3dca9d2dc2 ]
efx_nic_fatal_interrupt() disables DMA before scheduling a reset.
After this, we need not and *cannot* flush queues.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4017dbdc14 ]
efx_filter_remove_filter() fails to remove inserted filters in some cases.
For example:
1. Two filters A and B have specifications that result in an initial
hash collision.
2. A is inserted first, followed by B.
3. An attempt to remove B first succeeds, but if A is removed first
a subsequent attempt to remove B fails.
When searching for an existing filter (!for_insert),
efx_filter_search() must always continue to the maximum search depth
for the given type rather than stopping at the first unused entry.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a3da1fe95 ]
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.
If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.
I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a5b8db9144 ]
Range/validity checks on rta_type in rtnetlink_rcv_msg() do
not account for flags that may be set. This causes the function
to return -EINVAL when flags are set on the type (for example
NLA_F_NESTED).
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5b9e12dbf9 ]
a long time ago by the commit
commit 93456b6d77
Author: Denis V. Lunev <den@openvz.org>
Date: Thu Jan 10 03:23:38 2008 -0800
[IPV4]: Unify access to the routing tables.
the defenition of FIB_HASH_TABLE size has obtained wrong dependency:
it should depend upon CONFIG_IP_MULTIPLE_TABLES (as was in the original
code) but it was depended from CONFIG_IP_ROUTE_MULTIPATH
This patch returns the situation to the original state.
The problem was spotted by Tingwei Liu.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Tingwei Liu <tingw.liu@gmail.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2317f449af ]
sctp_assoc_lookup_tsn() function searchs which transport a certain TSN
was sent on, if not found in the active_path transport, then go search
all the other transports in the peer's transport_addr_list, however, we
should continue to the next entry rather than break the loop when meet
the active_path transport.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f315bef23 ]
__netpoll_cleanup() is called in netconsole_netdev_event() while holding a
spinlock. Release/acquire the spinlock before/after it and restart the
loop. Also, disable the netconsole completely, because we won't have chance
after the restart of the loop, and might end up in a situation where
nt->enabled == 1 and nt->np.dev == NULL.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4660c7f498 ]
This is needed in order to detect if the timestamp option appears
more than once in a packet, to remove the option if the packet is
fragmented, etc. My previous change neglected to store the option
location when the router addresses were prespecified and Pointer >
Length. But now the option location is also stored when Flag is an
unrecognized value, to ensure these option handling behaviors are
still performed.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cb29529ea0 ]
If a machine has X (X < 4) sunsu ports and cmdline
option "console=ttySY" is passed, where X < Y <= 4,
than the following panic happens:
Unable to handle kernel NULL pointer dereference
TPC: <sunsu_console_setup+0x78/0xe0>
RPC: <sunsu_console_setup+0x74/0xe0>
I7: <register_console+0x378/0x3e0>
Call Trace:
[0000000000453a38] register_console+0x378/0x3e0
[0000000000576fa0] uart_add_one_port+0x2e0/0x340
[000000000057af40] su_probe+0x160/0x2e0
[00000000005b8a4c] platform_drv_probe+0xc/0x20
[00000000005b6c2c] driver_probe_device+0x12c/0x220
[00000000005b6da8] __driver_attach+0x88/0xa0
[00000000005b4df4] bus_for_each_dev+0x54/0xa0
[00000000005b5a54] bus_add_driver+0x154/0x260
[00000000005b7190] driver_register+0x50/0x180
[00000000006d250c] sunsu_init+0x18c/0x1e0
[00000000006c2668] do_one_initcall+0xe8/0x160
[00000000006c282c] kernel_init_freeable+0x12c/0x1e0
[0000000000603764] kernel_init+0x4/0x100
[0000000000405f64] ret_from_syscall+0x1c/0x2c
[0000000000000000] (null)
1)Fix the panic;
2)Increment registered port number every successful
probe.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 29cd8ae0e1 ]
The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
struct,
Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 84d73cd3fb ]
Initialize the mac address buffer with 0 as the driver specific function
will probably not fill the whole buffer. In fact, all in-kernel drivers
fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible
bytes. Therefore we currently leak 26 bytes of stack memory to userland
via the netlink interface.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3bc1b1add7 ]
The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer
counted as dropped. They are counted as successfully received by
'netif_receive_skb'.
This allows network interface drivers to correctly update their RX-OK and
RX-DRP counters based on the result of 'netif_receive_skb'.
Signed-off-by: Cristian Bercaru <B43982@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits 0c1233aba1 and
a6a8fe950e ]
When we have a large number of static label mappings that spill across
the netlink message boundary we fail to properly save our state in the
netlink_callback struct which causes us to repeat the same listings.
This patch fixes this problem by saving the state correctly between
calls to the NetLabel static label netlink "dumpit" routines.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit aab2b4bf22 ]
We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both
before and after going to step5. That wastes CPU and double-counts the
receiver-side RTT sample.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e8b0ac3e4 ]
Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel
to accept RAs even when forwarding is enabled. However, enabling
forwarding purges all default routes on the system, breaking
connectivity until the next RA is received. Fix this by not
purging default routes on interfaces that have accept_ra=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8b82547e33 ]
The sendmsg() syscall handler for PPPoL2TP doesn't decrease the socket
reference counter after successful transmissions. Any successful
sendmsg() call from userspace will then increase the reference counter
forever, thus preventing the kernel's session and tunnel data from
being freed later on.
The problem only happens when writing directly on L2TP sockets.
PPP sockets attached to L2TP are unaffected as the PPP subsystem
uses pppol2tp_xmit() which symmetrically increase/decrease reference
counters.
This patch adds the missing call to sock_put() before returning from
pppol2tp_sendmsg().
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5370019dc2 upstream.
bd_mutex and lo_ctl_mutex can be held in different order.
Path #1:
blkdev_open
blkdev_get
__blkdev_get (hold bd_mutex)
lo_open (hold lo_ctl_mutex)
Path #2:
blkdev_ioctl
lo_ioctl (hold lo_ctl_mutex)
lo_set_capacity (hold bd_mutex)
Lockdep does not report it, because path #2 actually holds a subclass of
lo_ctl_mutex. This subclass seems creep into the code by mistake. The
patch author actually just mentioned it in the changelog, see commit
f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:
http://marc.info/?l=linux-kernel&m=123806169129727&w=2
Path #2 hold bd_mutex to call bd_set_size(), I've protected it
with i_mutex in a previous patch, so drop bd_mutex at this site.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3e78080f81 ('hwmon: (sht15) Check return value of
regulator_enable()') depends on the use of devm_kmalloc() for automatic
resource cleanup in the failure cases, which was introduced in 3.7. In
older stable branches, explicit cleanup is needed.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e79e0fe380 upstream.
Subsequent threads returning EBUSY from vm_insert_pfn() was not handled
correctly. As a result concurrent access from new threads to
mmapped data caused SIGBUS.
Note that this fixes i-g-t/tests/gem_threaded_tiled_access.
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc178622d4 upstream.
Doing this would reliably fail with -EBUSY for me:
# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy
because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.
Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:
btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);
so unmount might complete before __free_device fires & does its blkdev_put.
Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6a70a0707 upstream.
Our flush_tlb_kernel_range() implementation calls __tlb_flush_mm() with
&init_mm as argument. __tlb_flush_mm() however will only flush tlbs
for the passed in mm if its mm_cpumask is not empty.
For the init_mm however its mm_cpumask has never any bits set. Which in
turn means that our flush_tlb_kernel_range() implementation doesn't
work at all.
This can be easily verified with a vmalloc/vfree loop which allocates
a page, writes to it and then frees the page again. A crash will follow
almost instantly.
To fix this remove the cpumask_empty() check in __tlb_flush_mm() since
there shouldn't be too many mms with a zero mm_cpumask, besides the
init_mm of course.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4d3bc99b upstream.
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:
arch/x86/power/built-in.o: In function `restore_processor_state':
(.text+0x45c): undefined reference to `perf_restore_debug_store'
Fix it by defining the dummy function appropriately.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a6e06b2ae upstream.
Commit 1d9d8639c0 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") fixed a crash when doing PEBS performance profiling
after resuming, but in using init_debug_store_on_cpu() to restore the
DS_AREA mtrr it also resulted in a new WARN_ON() triggering.
init_debug_store_on_cpu() uses "wrmsr_on_cpu()", which in turn uses CPU
cross-calls to do the MSR update. Which is not really valid at the
early resume stage, and the warning is quite reasonable. Now, it all
happens to _work_, for the simple reason that smp_call_function_single()
ends up just doing the call directly on the CPU when the CPU number
matches, but we really should just do the wrmsr() directly instead.
This duplicates the wrmsr() logic, but hopefully we can just remove the
wrmsr_on_cpu() version eventually.
Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d9d8639c0 upstream.
This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.
The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4502403dcf upstream.
The call tree here is:
sk_clone_lock() <- takes bh_lock_sock(newsk);
xfrm_sk_clone_policy()
__xfrm_sk_clone_policy()
clone_policy() <- uses GFP_ATOMIC for allocations
security_xfrm_policy_clone()
security_ops->xfrm_policy_clone_security()
selinux_xfrm_policy_clone()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d63ac5f6cf upstream.
Commit 44ae3ab335 forgot to update
the entry for the 970MP rev 1.0 processor when moving some CPU
features bits to the MMU feature bit mask. This breaks booting
on some rare G5 models using that chip revision.
Reported-by: Phileas Fogg <phileas-fogg@mail.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d1817cab2 upstream.
On Sat, Mar 02, 2013 at 10:45:10AM +0100, Sven Geggus wrote:
> This is the bad commit I found doing git bisect:
> 04f482faf5 is the first bad commit
> commit 04f482faf5
> Author: Patrick McHardy <kaber@trash.net>
> Date: Mon Mar 28 08:39:36 2011 +0000
Good job. I was too lazy to bisect for bad commit;)
Reading the code I found problematic kthread_should_stop call from netlink
connector which causes the oops. After applying a patch, I've been testing
owfs+w1 setup for nearly two days and it seems to work very reliable (no
hangs, no memleaks etc).
More detailed description and possible fix is given below:
Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.
This patch adds a check if w1_search is serving netlink command, skipping
kthread_should_stop invocation if so.
Signed-off-by: Marcin Jurkowski <marcin1j@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Josh Boyer <jwboyer@gmail.com>
Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbfa57c0f2 upstream.
If an fsync occurs on a read-only array, we need to send a
completion for the IO and may not increment the active IO count.
Otherwise, we hit a bug trace and can't stop the MD array anymore.
By advice of Christoph Hellwig we return success upon a flush
request but we return -EROFS for other writes.
We detect flush requests by checking if the bio has zero sectors.
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: NeilBrown <neilb@suse.de>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Paul Menzel <paulepanter@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b81273a132 upstream.
Now that login from util-linux is forced to drop all references to a
TTY which it wants to hangup (to reach reference count 1) we are
seeing issues with telnet. When login closes its last reference to the
slave PTY, it also resets packet mode on the *master* side. And we
have a race here.
What telnet does is fork+exec of `login'. Then there are two
scenarios:
* `login' closes the slave TTY and resets thus master's packet mode,
but even now telnet properly sets the mode, or
* `telnetd' sets packet mode on the master, `login' closes the slave
TTY and resets master's packet mode.
The former case is OK. However the latter happens in much more cases,
by the order of magnitude to be precise. So when one tries to login to
such a messed telnet setup, they see the following:
inux login:
ogin incorrect
Note the missing first letters -- telnet thinks it is still in the
packet mode, so when it receives "linux login" from `login', it
considers "l" as the type of the packet and strips it.
SuS does not mention how the implementation should behave. Both BSDs I
checked (Free and Net) do not reset the flag upon the last close.
By this I am resurrecting an old bug, see References. We are hitting
it regularly now, i.e. with updated util-linux, ergo login.
Here, I am changing a behavior introduced back in 2.1 times. It would
better have a long time testing before goes upstream.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Bryan Mason <bmason@redhat.com>
References: https://lkml.org/lkml/2009/11/11/223
References: https://bugzilla.redhat.com/show_bug.cgi?id=504703
References: https://bugzilla.novell.com/show_bug.cgi?id=797042
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 827aa0d36d upstream.
This could have been either ARCH_S5P64X0 or CPU_S5P6450. Looking at
commit 2555e663b3 ("ARM: S5P64X0: Add UART
serial support for S5P6450") - which added this typo - makes clear this
should be CPU_S5P6450.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d0c2d10dd upstream.
ext3_msg() takes the printk prefix as the second parameter and the
format string as the third parameter. Two callers of ext3_msg omit the
prefix and pass the format string as the second parameter and the first
parameter to the format string as the third parameter. In both cases
this string comes from an arbitrary source. Which means the string may
contain format string characters, which will
lead to undefined and potentially harmful behavior.
The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages
in ext3") and is fixed by this patch.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ca39528c0 upstream.
When the new signal handlers are set up, the location of sa_restorer is
not cleared, leaking a parent process's address space location to
children. This allows for a potential bypass of the parent's ASLR by
examining the sa_restorer value returned when calling sigaction().
Based on what should be considered "secret" about addresses, it only
matters across the exec not the fork (since the VMAs haven't changed
until the exec). But since exec sets SIG_DFL and keeps sa_restorer,
this is where it should be fixed.
Given the few uses of sa_restorer, a "set" function was not written
since this would be the only use. Instead, we use
__ARCH_HAS_SA_RESTORER, as already done in other places.
Example of the leak before applying this patch:
$ cat /proc/$$/maps
...
7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
...
$ ./leak
...
7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so
...
1 0 (nil) 0x7fb9f30b94a0
2 4000000 (nil) 0x7f278bcaa4a0
3 4000000 (nil) 0x7f278bcaa4a0
4 0 (nil) 0x7fb9f30b94a0
...
[akpm@linux-foundation.org: use SA_RESTORER for backportability]
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Emese Revfy <re.emese@gmail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Julien Tinnes <jln@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6987a6dabf upstream.
Remove usb_put_dev from vt6656_suspend and usb_get_dev
from vt6566_resume.
These are not normally in suspend/resume functions.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit feca7746d5 upstream.
This patch (as1661) fixes a rather obscure bug in ehci-hcd. In a
couple of places, the driver compares the DMA address stored in a QH's
overlay region with the address of a particular qTD, in order to see
whether that qTD is the one currently being processed by the hardware.
(If it is then the status in the QH's overlay region is more
up-to-date than the status in the qTD, and if it isn't then the
overlay's value needs to be adjusted when the QH is added back to the
active schedule.)
However, DMA address in the overlay region isn't always valid. It
sometimes will contain a stale value, which may happen by coincidence
to be equal to a qTD's DMA address. Instead of checking the DMA
address, we should check whether the overlay region is active and
valid. The patch tests the ACTIVE bit in the overlay, and clears this
bit when the overlay becomes invalid (which happens when the
currently-executing URB is unlinked).
This is the second part of a fix for the regression reported at:
https://bugs.launchpad.net/bugs/1088733
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Reported-and-tested-by: Stephen Thirlwall <sdt@dr.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab4b71644a upstream.
This reverts commit 200e0d99 ("USB: storage: optimize to match the
Huawei USB storage devices and support new switch command" and the
followup bugfix commit cd060956 ("USB: storage: properly handle
the endian issues of idProduct").
The commit effectively added a large number of Huawei devices to
the deprecated usb-storage mode switching logic. Many of these
devices have been in use and supported by the userspace
usb_modeswitch utility for years. Forcing the switching inside
the kernel causes a number of regressions as a result of ignoring
existing onfigurations, and also completely takes away the ability
to configure mode switching per device/system/user.
Known regressions caused by this:
- Some of the devices support multiple modes, using different
switching commands. There are existing configurations taking
advantage of this.
- There is a real use case for disabling mode switching and
instead mounting the exposed storage device. This becomes
impossible with switching logic inside the usb-storage driver.
- At least on device fail as a result of the usb-storage switching
command, becoming completely unswitchable. This is possibly a
firmware bug, but still a regression because the device work as
expected using usb_modeswitch defaults.
In-kernel mode switching was deprecated years ago with the
development of the more user friendly userspace alternatives. The
existing list of devices in usb-storage was only kept to prevent
breaking already working systems. The long term plan is to remove
the list, not to add to it. Ref:
http://permalink.gmane.org/gmane.linux.usb.general/28543
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: <fangxiaozhi@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a57e82a187 upstream.
The Rigblaster Advantage is an amateur radio interface sold by West Mountain
Radio. It contains a cp210x serial interface but the device ID is not in
the driver.
Signed-off-by: Steve Conklin <sconklin@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0f5ecee4e upstream.
The buffer for responses must not overflow.
If this would happen, set a flag, drop the data and return
an error after user space has read all remaining data.
Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit daec90e738 upstream.
Another device using CDC ACM with vendor specific protocol to mark
serial functions.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e84e7a56a3 upstream.
The code currently only supports one virtio-rng device at a time.
Invoking guests with multiple devices causes the guest to blow up.
Check if we've already registered and initialised the driver. Also
cleanup in case of registration errors or hot-unplug so that a new
device can be used.
Reported-by: Peter Krempa <pkrempa@redhat.com>
Reported-by: <yunzheng@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d90e63603 upstream.
4 ports; AT/PPP is standard CDC-ACM. The other three (added by this
patch) are QCDM/DIAG, possibly GPS, and unknown.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a40e7cf8f0 upstream.
Commit 9f9c9cbb60 ("drivers/firmware/dmi_scan.c: fetch dmi version
from SMBIOS if it exists") hoisted the check for "_DMI_" into
dmi_scan_machine(), which means that we don't bother to check for
"_DMI_" at offset 16 in an SMBIOS entry. smbios_present() may also call
dmi_present() for an address where we found "_SM_", if it failed further
validation.
Check for "_DMI_" in smbios_present() before calling dmi_present().
[akpm@linux-foundation.org: fix build]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: Tim McGrath <tmhikaru@gmail.com>
Tested-by: Tim Mcgrath <tmhikaru@gmail.com>
Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When decnet is built as a module a simple:
echo 0.0 >/proc/sys/net/decnet/node_address
results in most of the sysctl entries under /proc/sys/net/decnet and
/proc/sys/net/decnet/conf disappearing.
For more details see http://www.spinics.net/lists/netdev/msg226123.html.
This change applies the same workaround used in
net/core/sysctl_net_core.c and net/ipv6/sysctl_net_ipv6.c of creating
a skeleton of decnet sysctl entries before doing anything else.
The problem first appeared in kernel 2.6.27. The later rewrite of
sysctl in kernel 3.4 restored the previous behavior and eliminated the
need for this workaround.
This patch was heavily inspired by a similar but more complex patch by
Larry Baker.
Reported-by: Larry Baker <baker@usgs.gov>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: David Miller <davem@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db05021d49 upstream.
The prompt to enable DYNAMIC_FTRACE (the ability to nop and
enable function tracing at run time) had a confusing statement:
"enable/disable ftrace tracepoints dynamically"
This was written before tracepoints were added to the kernel,
but now that tracepoints have been added, this is very confusing
and has confused people enough to give wrong information during
presentations.
Not only that, I looked at the help text, and it still references
that dreaded daemon that use to wake up once a second to update
the nop locations and brick NICs, that hasn't been around for over
five years.
Time to bring the text up to the current decade.
Reported-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a930d87905 upstream.
If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info"
to be potentially released early.
That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there. And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.
This is ancient going back to pre-2.4 days, and until trinity, nobody
naver noticed.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0da9dfdd2c upstream.
This fixes CVE-2013-1792.
There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created. It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.
Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.
THREAD A THREAD B
=============================== ===============================
==>call install_user_keyrings();
if (!cred->user->session_keyring)
==>call install_user_keyrings()
...
user->uid_keyring = uid_keyring;
if (user->uid_keyring)
return 0;
<==
key = cred->user->session_keyring [== NULL]
user->session_keyring = session_keyring;
atomic_inc(&key->usage); [oops]
At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.
The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before doing setting session_keyring.
This couldn't be reproduced on a stock kernel. However, after placing
systemtap probe on 'user->session_keyring = session_keyring;' that
introduced some delay, the kernel could be crashed reliably.
Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2069d483b3 upstream.
When a value of a vmaster slave control is changed, the ctl change
notification is sometimes ignored. This happens when the master
control overrides, e.g. when the corresponding master control is
muted. The reason is that slave_put() returns the value of the actual
slave put callback, and it doesn't reflect the virtual slave value
change.
This patch fixes the function just to return 1 whenever a slave value
is changed.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e78080f81 upstream.
Not having power is a pretty serious error so check that we are able to
enable the supply and error out if we can't.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
commit 58ebb34c49 upstream.
Create_stripe_zones returns an error slightly differently to
raid0_run and to raid0_takeover_*.
The error returned used by the second was wrong and an error would
result in mddev->private being set to NULL and sooner or later a
crash.
So never return NULL, return ERR_PTR(err), not NULL from
create_stripe_zones.
This bug has been present since 2.6.35 so the fix is suitable
for any kernel since then.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a3d63cadba upstream.
RSSI is being stored internally as s8 in several places. The indication
of an unset RSSI value, ATH_RSSI_DUMMY_MARKER, was supposed to have been
set to 127, but ended up being set to 0x127 because of a code cleanup
mistake. This could lead to invalid signal strength values in a few
places.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7f154f124 upstream.
virtio_rng feeds the randomness buffer handed by the core directly
into the scatterlist, since commit bb347d9807.
However, if CONFIG_HW_RANDOM=m, the static buffer isn't a linear address
(at least on most archs). We could fix this in virtio_rng, but it's actually
far easier to just do it in the core as virtio_rng would have to allocate
a buffer every time (it doesn't know how much the core will want to read).
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9a6b52ee1 upstream.
If the socket is full, we're better off just waiting until it empties,
or until the connection is broken. The reason why we generally don't
want to time out is that the call to xprt->ops->release_xprt() will
trigger a connection reset, which isn't helpful...
Let's make an exception for soft RPC calls, since they have to provide
timeout guarantees.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cba0cdf5e upstream.
__btrfs_close_devices() clones btrfs device structs with
memcpy(). Some of the fields in the clone are reinitialized, but it's
missing to init io_lock. In mainline this goes unnoticed, but on RT it
leaves the plist pointing to the original about to be freed lock
struct.
Initialize io_lock after cloning, so no references to the original
struct are left.
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 472b72f2db upstream.
The page++ is wrong. It makes bio_add_pc_page() pointing to a wrong page
address if the 'while (len > 0 && data_len > 0) { ... }' loop is
executed more than one once.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b255188f90 upstream.
Paolo Pisati reports that IPv6 triggers this warning:
BUG: scheduling while atomic: swapper/0/0/0x40000100
Modules linked in:
[<c001b1c4>] (unwind_backtrace+0x0/0xf0) from [<c0503c5c>] (__schedule_bug+0x48/0x5c)
[<c0503c5c>] (__schedule_bug+0x48/0x5c) from [<c0508608>] (__schedule+0x700/0x740)
[<c0508608>] (__schedule+0x700/0x740) from [<c007007c>] (__cond_resched+0x24/0x34)
[<c007007c>] (__cond_resched+0x24/0x34) from [<c05086dc>] (_cond_resched+0x3c/0x44)
[<c05086dc>] (_cond_resched+0x3c/0x44) from [<c0021f6c>] (do_alignment+0x178/0x78c)
[<c0021f6c>] (do_alignment+0x178/0x78c) from [<c00083e0>] (do_DataAbort+0x34/0x98)
[<c00083e0>] (do_DataAbort+0x34/0x98) from [<c0509a60>] (__dabt_svc+0x40/0x60)
Exception stack(0xc0763d70 to 0xc0763db8)
3d60: e97e805e e97e806e 2c000000 11000000
3d80: ea86bb00 0000002c 00000011 e97e807e c076d2a8 e97e805e e97e806e 0000002c
3da0: 3d000000 c0763dbc c04b98fc c02a8490 00000113 ffffffff
[<c0509a60>] (__dabt_svc+0x40/0x60) from [<c02a8490>] (__csum_ipv6_magic+0x8/0xc8)
Fix this by using probe_kernel_address() stead of __get_user().
Reported-by: Paolo Pisati <p.pisati@gmail.com>
Tested-by: Paolo Pisati <p.pisati@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e4ba617c1 upstream.
Martin Storsjö reports that the sequence:
ee312ac1 vsub.f32 s4, s3, s2
ee702ac0 vsub.f32 s5, s1, s0
e59f0028 ldr r0, [pc, #40]
ee111a90 vmov r1, s3
on Raspberry Pi (implementor 41 architecture 1 part 20 variant b rev 5)
where s3 is a denormal and s2 is zero results in incorrect behaviour -
the instruction "vsub.f32 s5, s1, s0" is not executed:
VFP: bounce: trigger ee111a90 fpexc d0000780
VFP: emulate: INST=0xee312ac1 SCR=0x00000000
...
As we can see, the instruction triggering the exception is the "vmov"
instruction, and we emulate the "vsub.f32 s4, s3, s2" but fail to
properly take account of the FPEXC_FP2V flag in FPEXC. This is because
the test for the second instruction register being valid is bogus, and
will always skip emulation of the second instruction.
Reported-by: Martin Storsjö <martin@martin.st>
Tested-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc400e185c upstream.
Some low-level comedi drivers (incorrectly) point `dev->read_subdev` or
`dev->write_subdev` to a subdevice that does not support asynchronous
commands. Comedi's poll(), read() and write() file operation handlers
assume these subdevices do support asynchronous commands. In
particular, they assume `s->async` is valid (where `s` points to the
read or write subdevice), which it won't be if it has been set
incorrectly. This can lead to a NULL pointer dereference.
Check `s->async` is non-NULL in `comedi_poll()`, `comedi_read()` and
`comedi_write()` to avoid the bug.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 22056e2b46 upstream.
Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings. He tracked it down to two
problems, one of which is addressed by this patch.
It seems that writing to the command3 register after writing to the
command4 register in `labpc_ai_cmd()` messes up the differential
reference bit setting in the command4 register. Set up the command4
register after the command3 register (as in `labpc_ai_rinsn()`) to avoid
the problem.
Thanks to Tuomas for suggesting the fix.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4c4bc25d0f upstream.
Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings. He tracked it down to two
problems, one of which is addressed by this patch.
It seems the setting of the channel bits for particular scanning modes
was incorrect for differential mode. (Only half the number of channels
are available in differential mode; comedi refers to them as channels 0,
1, 2 and 3, but the hardware documentation refers to them as channels 0,
2, 4 and 6.) In differential mode, the setting of the channel enable
bits in the command1 register should depend on whether the scan enable
bit is set. Effectively, we need to double the comedi channel number
when the scan enable bit is not set in differential mode. The scan
enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or
`MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is
`MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`. The existing test
for whether the comedi channel number needs to be doubled in
differential mode is incorrect in `labpc_ai_cmd()`. This patch corrects
the test.
Thanks to Tuomas for suggesting the fix.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 3.0.67, commit 58c9ce6fad (s390/kvm:
Fix store status for ACRS/FPRS), upstream commit
15bc8d8457, added a call to
save_access_regs to save ACRS. But we do not have ARCS in kvm_run in
3.0 yet, so this results in:
arch/s390/kvm/kvm-s390.c: In function 'kvm_s390_vcpu_store_status':
arch/s390/kvm/kvm-s390.c:593: error: 'struct kvm_run' has no member named 's'
Fix it by saving guest_acrs which is where ARCS are in 3.0.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 3.0.67, commit 7a9a20ea77 (dca: check
against empty dca_domains list before unregister provider), upstream
commit c419fcfd07, added a fail path to
unregister_dca_provider. It added there also a call to
raw_spin_unlock_irqrestore. But in 3.0, the lock is not raw, so this
results in:
drivers/dca/dca-core.c: In function 'unregister_dca_provider':
drivers/dca/dca-core.c:413: warning: passing argument 1 of '_raw_spin_unlock_irqrestore' from incompatible pointer type
Fix it by calling spin_unlock_irqrestore properly.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71b5707e11 upstream.
In cgroup_exit() put_css_set_taskexit() is called without any lock,
which might lead to accessing a freed cgroup:
thread1 thread2
---------------------------------------------
exit()
cgroup_exit()
put_css_set_taskexit()
atomic_dec(cgrp->count);
rmdir();
/* not safe !! */
check_for_release(cgrp);
rcu_read_lock() can be used to make sure the cgroup is alive.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63f43f55c9 upstream.
rename() will change dentry->d_name. The result of this race can
be worse than seeing partially rewritten name, but we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.
It's safe in the protection of dentry->d_lock.
v2: check NULL dentry before acquiring dentry lock.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb214ede76 upstream.
When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.
The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.
This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.
The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.
Current code handle x2apic when BIOS pass with xapic mode
enabled:
When user specifies x2apic_phys, or FADT indicates PHYSICAL:
1. During madt oem check, apic driver is set with xapic logical
or xapic phys driver at first.
2. enable_IR_x2apic() will enable x2apic_mode.
3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
will install the correct x2apic phys driver and use x2apic phys mode.
Otherwise it will skip the driver will let x2apic_cluster_probe to
take over to install x2apic cluster driver (wrong one) even though FADT
indicates PHYSICAL, because x2apic_phys_probe does not check
FADT PHYSICAL.
Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.
Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Zhang Lin-Bao <Linbao.zhang@hp.com>
[ make a patch specially for 3.0.66]
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f4ffc3a53 upstream.
automount-support is broken on the parisc architecture, because the existing
#if list does not include a check for defined(__hppa__). The HPPA (parisc)
architecture is similiar to other 64bit Linux targets where we have to define
autofs_wqt_t (which is passed back and forth to user space) as int type which
has a size of 32bit across 32 and 64bit kernels.
During the discussion on the mailing list, H. Peter Anvin suggested to invert
the #if list since only specific platforms (specifically those who do not have
a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned
long type.
This suggestion is probably the best way to go, since Arm64 (and maybe others?)
seems to have a non-working automounter. So in the long run even for other new
upcoming architectures this inverted check seem to be the best solution, since
it will not require them to change this #if again (unless they are 64bit only).
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 9067ac85d5.
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().
TASK_ALL has no other users, probably can be killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 9899d11f65.
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.
set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.
As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.
Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().
While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 910ffdb18a.
Cleanup and preparation for the next change.
signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.
Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.
This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd97120fc3 upstream.
If a single descriptor crosses a region, the
second chunk length should be decremented
by size translated so far, instead it includes
the full descriptor length.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e75bafbff2 upstream.
svc_age_temp_xprts expires xprts in a two-step process: first it takes
the sv_lock and moves the xprts to expire off their server-wide list
(sv_tempsocks or sv_permsocks) to a local list. Then it drops the
sv_lock and enqueues and puts each one.
I see no reason for this: svc_xprt_enqueue() will take sp_lock, but the
sv_lock and sp_lock are not otherwise nested anywhere (and documentation
at the top of this file claims it's correct to nest these with sp_lock
inside.)
Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
Tested-by: Paweł Sikora <pawel.sikora@agmk.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f116700971 upstream.
In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30ebc5e44d upstream.
We recently introduced a new return -ENODEV in this function but we need
to unlock before returning.
[mchehab@redhat.com: found two patches with the same fix. Merged SOB's/acks into one patch]
Acked-by: Herton R. Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cdae7416a upstream.
The iteration logic of idr_get_next() is borrowed mostly verbatim from
idr_for_each(). It walks down the tree looking for the slot matching
the current ID. If the matching slot is not found, the ID is
incremented by the distance of single slot at the given level and
repeats.
The implementation assumes that during the whole iteration id is aligned
to the layer boundaries of the level closest to the leaf, which is true
for all iterations starting from zero or an existing element and thus is
fine for idr_for_each().
However, idr_get_next() may be given any point and if the starting id
hits in the middle of a non-existent layer, increment to the next layer
will end up skipping the same offset into it. For example, an IDR with
IDs filled between [64, 127] would look like the following.
[ 0 64 ... ]
/----/ |
| |
NULL [ 64 ... 127 ]
If idr_get_next() is called with 63 as the starting point, it will try
to follow down the pointer from 0. As it is NULL, it will then try to
proceed to the next slot in the same level by adding the slot distance
at that level which is 64 - making the next try 127. It goes around the
loop and finds and returns 127 skipping [64, 126].
Note that this bug also triggers in idr_for_each_entry() loop which
deletes during iteration as deletions can make layers go away leaving
the iteration with unaligned ID into missing layers.
Fix it by ensuring proceeding to the next slot doesn't carry over the
unaligned offset - ie. use round_up(id + 1, slot_distance) instead of
id += slot_distance.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d092603cc upstream.
"be->mode" is obtained from xenbus_read(), which does a kmalloc() for
the message body. The short string is never released, so do it along
with freeing "be" itself, and make sure the string isn't kept when
backend_changed() doesn't complete successfully (which made it
desirable to slightly re-structure that function, so that the error
cleanup can be done in one place).
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 309a85b686 upstream.
ocfs2_block_group_alloc_discontig() disables chain relink by setting
ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple
cluster groups.
It doesn't keep the credits for all chain relink,but
ocfs2_claim_suballoc_bits overrides this in this call trace:
ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()->
__ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits()
ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call
ocfs2_search_chain() one time and disable it again, and then we run out
of credits.
Fix is to allow relink by default and disable it in
ocfs2_block_group_alloc_discontig.
Without this patch, End-users will run into a crash due to run out of
credits, backtrace like this:
RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>]
jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
Call Trace:
ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
ocfs2_relink_block_group+0x111/0x480 [ocfs2]
ocfs2_search_chain+0x455/0x9a0 [ocfs2]
...
Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbbf8555a9 upstream.
This patch adds missing bounds checking for the configfs provided
mapped_lun value during target_fabric_make_mappedlun() setup ahead
of se_lun_acl initialization.
This addresses a potential OOPs when using a mapped_lun value that
exceeds the hardcoded TRANSPORT_MAX_LUNS_PER_TPG-1 value within
se_node_acl->device_list[].
Reported-by: Jan Engelhardt <jengelh@inai.de>
Cc: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c10093692 upstream.
On non-BIOS platforms it is possible that the BIOS data area contains
garbage instead of being zeroed or something equivalent (firmware
people: we are talking of 1.5K here, so please do the sane thing.)
We need on the order of 20-30K of low memory in order to boot, which
may grow up to < 64K in the future. We probably want to avoid the
lowest of the low memory. At the same time, it seems extremely
unlikely that a legitimate EBDA would ever reach down to the 128K
(which would require it to be over half a megabyte in size.) Thus,
pick 128K as the cutoff for "this is insane, ignore." We may still
end up reserving a bunch of extra memory on the low megabyte, but that
is not really a major issue these days. In the worst case we lose
512K of RAM.
This code really should be merged with trim_bios_range() in
arch/x86/kernel/setup.c, but that is a bigger patch for a later merge
window.
Reported-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c189ea64e upstream.
Commit: c1bf08ac "ftrace: Be first to run code modification on modules"
changed ftrace module notifier's priority to INT_MAX in order to
process the ftrace nops before anything else could touch them
(namely kprobes). This was the correct thing to do.
Unfortunately, the ftrace module notifier also contains the ftrace
clean up code. As opposed to the set up code, this code should be
run *after* all the module notifiers have run in case a module is doing
correct clean-up and unregisters its ftrace hooks. Basically, ftrace
needs to do clean up on module removal, as it needs to know about code
being removed so that it doesn't try to modify that code. But after it
removes the module from its records, if a ftrace user tries to remove
a probe, that removal will fail due as the record of that code segment
no longer exists.
Nothing really bad happens if the probe removal is called after ftrace
did the clean up, but the ftrace removal function will return an error.
Correct code (such as kprobes) will produce a WARN_ON() if it fails
to remove the probe. As people get annoyed by frivolous warnings, it's
best to do the ftrace clean up after everything else.
By splitting the ftrace_module_notifier into two notifiers, one that
does the module load setup that is run at high priority, and the other
that is called for module clean up that is run at low priority, the
problem is solved.
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e182bb38d7 upstream.
When idr_find() was fed a negative ID, it used to look up the ID
ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and
move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers
a WARN_ON_ONCE().
__lock_timer() feeds timer_id from userland directly to idr_find()
without sanitizing it which can trigger the above malfunctions. Add a
range check on @timer_id before invoking idr_find() in __lock_timer().
While timer_t is defined as int by all archs at the moment, Andrew
worries that it may be defined as a larger type later on. Make the
test cover larger integers too so that it at least is guaranteed to
not return the wrong timer.
Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional
precaution while moving away from ignoring MSB. Once it's gone we can
remove the guard as long as timer_t isn't larger than int.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f528d980c1 upstream.
When dma_ops are initialized the unity mappings are
created. The init_device_table_dma() function makes sure DMA
from all devices is blocked by default. This opens a short
window in time where DMA to unity mapped regions is blocked
by the IOMMU. Make sure this does not happen by initializing
the device table after dma_ops.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3ad83d9ef upstream.
Otherwise, ext4 file systems with the quota featured enable will get a
very confusing "No such process" error message if the quota code is
built as a module and the quota_v2 module has not been loaded.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd060956c5 upstream.
1. The idProduct is little endian, so make sure its value to be
compatible with the current CPU. Make no break on big endian processors.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0475352326 upstream.
The module alias should be "ehci-omap" and not
"omap-ehci" to match the platform device name.
The omap-ehci module should now autoload correctly.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f3f687722 upstream.
The USB device descriptor of one identity presented by a few
Huawei morphing devices have serial functions with class codes
02/02/ff, indicating CDC ACM with a vendor specific protocol. This
combination is often used for MSFT RNDIS functions, and the CDC
ACM class driver will therefore ignore such functions.
The CDC ACM class driver cannot support functions with only 2
endpoints. The underlying serial functions of these modems are
also believed to be the same as for alternate device identities
already supported by the option driver. Letting the same driver
handle these functions independently of the current identity
ensures consistent handling and user experience.
There is no need to blacklist these devices in the rndis_host
driver. Huawei serial functions will either have only 2 endpoints
or a CDC ACM functional descriptor with bmCapabilities != 0, making
them correctly ignored as "non RNDIS" by that driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8f0302bbc upstream.
Adding three currently unsupported modems based on information
from .inf driver files:
Diag VID_1BBB&PID_0052&MI_00
AGPS VID_1BBB&PID_0052&MI_01
VOICE VID_1BBB&PID_0052&MI_02
AT VID_1BBB&PID_0052&MI_03
Modem VID_1BBB&PID_0052&MI_05
wwan VID_1BBB&PID_0052&MI_06
Diag VID_1BBB&PID_00B6&MI_00
AT VID_1BBB&PID_00B6&MI_01
Modem VID_1BBB&PID_00B6&MI_02
wwan VID_1BBB&PID_00B6&MI_03
Diag VID_1BBB&PID_00B7&MI_00
AGPS VID_1BBB&PID_00B7&MI_01
VOICE VID_1BBB&PID_00B7&MI_02
AT VID_1BBB&PID_00B7&MI_03
Modem VID_1BBB&PID_00B7&MI_04
wwan VID_1BBB&PID_00B7&MI_05
Updating the blacklist info for the X060S_X200 and X220_X500D,
reserving interfaces for a wwan driver, based on
wwan VID_1BBB&PID_0000&MI_04
wwan VID_1BBB&PID_0017&MI_06
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c419fcfd07 upstream.
When providers get blocked unregister_dca_providers() is called ending up
with dca_providers and dca_domain lists emptied. Dca should be prevented from
trying to unregister any provider if dca_domain list is found empty.
Reported-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Gaohuai Han <hangaohuai@huawei.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <djbw@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08dcdbf6a7 ]
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.
We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.
inet6_ehashfn() can also separately use the ports, instead
of xoring them.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e55f8b306 ]
If the credit timer is left armed after calling
xen_netbk_remove_xenvif(), then it may fire and attempt to schedule
the vif which will then oops as vif->netbk == NULL.
This may happen both in the fatal error path and during normal
disconnection from the front end.
The sequencing during shutdown is critical to ensure that: a)
vif->netbk doesn't become unexpectedly NULL; and b) the net device/vif
is not freed.
1. Mark as unschedulable (netif_carrier_off()).
2. Synchronously cancel the timer.
3. Remove the vif from the schedule list.
4. Remove it from it netback thread group.
5. Wait for vif->refcnt to become 0.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35876b5ffc ]
netbk_count_requests() could detect an error, call
netbk_fatal_tx_error() but return 0. The vif may then be used
afterwards (e.g., in a call to netbk_tx_error().
Since netbk_fatal_tx_error() could set vif->refcnt to 1, the vif may
be freed immediately after the call to netbk_fatal_tx_error() (e.g.,
if the vif is also removed).
Netback thread Xenwatch thread
-------------------------------------------
netbk_fatal_tx_err() netback_remove()
xenvif_disconnect()
...
free_netdev()
netbk_tx_err() Oops!
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 547b4e7181 ]
Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50e244cc79 upstream.
Adjust the console layer to allow a take over call where the caller
already holds the locks. Make the fb layer lock in order.
This is partly a band aid, the fb layer is terminally confused about the
locking rules it uses for its notifiers it seems.
[akpm@linux-foundation.org: remove stray non-ascii char, tidy comment]
[akpm@linux-foundation.org: export do_take_over_console()]
[airlied: cleanup another non-ascii char]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jiri Kosina <jkosina@suse.cz>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae1287865f upstream.
If grub2 loads efifb/vesafb, then when systemd starts it can set the console
font on that framebuffer device, however when we then load the native KMS
driver, the first thing it does is tear down the generic framebuffer driver.
The thing is the generic code is doing the right thing, it frees the font
because otherwise it would leak memory. However we can assume that if you
are removing the generic firmware driver (vesa/efi/offb), that a new driver
*should* be loading soon after, so we effectively leak the font.
However the old code left a dangling pointer in vc->vc_font.data and we
can now reuse that dangling pointer to load the font into the new
driver, now that we aren't freeing it.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=892340
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7139bc1579 upstream.
This patch goes a long way toward fixing the minifail bug, and
it significantly improves the stability of SMP machines such as
the rp3440. When write protecting a page for COW, we need to
purge the existing translation. Otherwise, the COW break
doesn't occur as expected because the TLB may still have a stale entry
which allows writes.
[jejb: fix up checkpatch errors]
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8520e443aa upstream.
Disable hard IRQ before kexec a new kernel image.
Not doing it can result in corrupted data in the memory segments
reserved for the new kernel.
Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d107a20415 upstream.
The Chip Select Configuration Register must be programmed to 0x2 in
order to achieve the correct behavior of the Static Memory Controller.
Without this patch devices wired to DFI and accessed through SMC cannot
be accessed after resume from S2.
Do not rely on the boot loader to program the CSMSADRCFG register by
programming it in the kernel smemc module.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae5943de8c upstream.
This error happens because PIPEnsControlOut and PIPEnsControlIn unlock the
spin lock for delay, letting in another thread.
The patch moves the current MP_SET_FLAG to before filling
of sUsbCtlRequest for pControlURB and clears it in event of failing.
Any thread calling either function while fMP_CONTROL_READS or fMP_CONTROL_WRITES
flags set will return STATUS_FAILURE.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 754ab5c0e5 upstream.
Comedi has two sorts of minor devices:
(a) normal board minor devices in the range 0 to
COMEDI_NUM_BOARD_MINORS-1 inclusive; and
(b) special subdevice minor devices in the range COMEDI_NUM_BOARD_MINORS
upwards that are used to open the same underlying comedi device as the
normal board minor devices, but with non-default read and write
subdevices for asynchronous commands.
The special subdevice minor devices get created when a board supporting
asynchronous commands is attached to a normal board minor device, and
destroyed when the board is detached from the normal board minor device.
One way to attach or detach a board is by using the COMEDI_DEVCONFIG
ioctl. This should only be used on normal board minors as the special
subdevice minors are too ephemeral. In particular, the change
introduced in commit 7d3135af39 ("staging:
comedi: prevent auto-unconfig of manually configured devices") breaks
horribly for special subdevice minor devices.
Since there's no legitimate use for the COMEDI_DEVCONFIG ioctl on a
special subdevice minor device node, disallow it and return -ENOTTY.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0720a06a75 upstream.
The utf8s_to_utf16s conversion routine needs to be improved. Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.
This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called. A
follow-on patch will add a third caller that does utilize the new
capabilities.
The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations. But that's a subject for a
different patch.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f23de52b6 upstream.
While looking at plymouth on udl I noticed that plymouth was trying
to use its fb plugin not its drm one, it was trying to drmOpen a driver called
usb not udl, noticed that we actually had out driver pointing at the wrong
device.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c49bafa384 upstream.
We added some more error handling in b40971426a "ext4: add error
checking to calls to ext4_handle_dirty_metadata()". But we need to
call kfree() as well to avoid a memory leak.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcf2d804ed upstream.
Some of the error path in ext4_fill_super don't release the
resouces properly. So this patch just try to release them
in the right way.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b531f81b0d upstream.
Commit 99fc86450c "ALSA: usb-mixer:
parse descriptors with structs" introduced a set of useful parsers
for descriptors. Unfortunately the parses for the Processing Unit
Descriptor came with a very subtle bug...
Functions uac_processing_unit_iProcessing() and
uac_processing_unit_specific() were indexing the baSourceID array
forgetting the fields before the iProcessing and process-specific
descriptors.
The problem was observed with Sound Blaster Extigy mixer,
where nNrModes in Up/Down-mix Processing Unit Descriptor
was accessed at offset 10 of the descriptor (value 0)
instead of offset 15 (value 7). In result the resulting
control had interesting limit values:
Simple mixer control 'Channel Routing Mode Select',0
Capabilities: volume volume-joined penum
Playback channels: Mono
Capture channels: Mono
Limits: 0 - -1
Mono: -1 [100%]
Fixed by starting from the bmControls, which was calculated
correctly, instead of baSourceID.
Now the mentioned control is fine:
Simple mixer control 'Channel Routing Mode Select',0
Capabilities: volume volume-joined penum
Playback channels: Mono
Capture channels: Mono
Limits: 0 - 6
Mono: 0 [0%]
Signed-off-by: Pawel Moll <mail@pawelmoll.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7da5804648 upstream.
The quirk for the Roland/Cakewalk A-PRO keyboards accidentally used the
wrong interface number, which prevented the driver from attaching to the
device.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 008e33f733 upstream.
Corrected USB ID for T-Com Sinus 154 data II. ISL3887-based. The
device was tested in managed mode with no security, WEP 128
bit and WPA-PSK (TKIP) with firmware 2.13.1.0.lm87.arm (md5sum:
7d676323ac60d6e1a3b6d61e8c528248). It works.
Signed-off-by: Tomasz Guszkowski <tsg@o2.pl>
Acked-By: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 666b3d803a upstream.
Currently, nlmclnt_lock will break out of the for(;;) loop when
the reclaimer wakes up the blocking lock thread by setting
nlm_lck_denied_grace_period. This causes the lock request to fail
with an ENOLCK error.
The intention was always to ensure that we resend the lock request
after the grace period has expired.
Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67d46b296a upstream.
Rob van der Heij reported the following (paraphrased) on private mail.
The scenario is that I want to avoid backups to fill up the page
cache and purge stuff that is more likely to be used again (this is
with s390x Linux on z/VM, so I don't give it as much memory that
we don't care anymore). So I have something with LD_PRELOAD that
intercepts the close() call (from tar, in this case) and issues
a posix_fadvise() just before closing the file.
This mostly works, except for small files (less than 14 pages)
that remains in page cache after the face.
Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.
The issue is the per-cpu pagevecs for LRU additions. If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release(). The user-visible effect is
that a program that uses fadvise() properly is not obeyed.
A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded. The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.
Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If a subset of pages are discarded it drains the LRU pagevecs and tries
again. If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care. With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.
If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.
The following test program demonstrates the problem. It should never
report that pages are still resident but will without this patch. It
assumes that CPU 0 and 1 exist.
int main() {
int fd;
int pagesize = getpagesize();
ssize_t written = 0, expected;
char *buf;
unsigned char *vec;
int resident, i;
cpu_set_t set;
/* Prepare a buffer for writing */
expected = FILESIZE_PAGES * pagesize;
buf = malloc(expected + 1);
if (buf == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
buf[expected] = 0;
memset(buf, 'a', expected);
/* Prepare the mincore vec */
vec = malloc(FILESIZE_PAGES);
if (vec == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
/* Bind ourselves to CPU 0 */
CPU_ZERO(&set);
CPU_SET(0, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* open file, unlink and write buffer */
fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
unlink("fadvise-test-file");
while (written < expected) {
ssize_t this_write;
this_write = write(fd, buf + written, expected - written);
if (this_write == -1) {
perror("write");
exit(EXIT_FAILURE);
}
written += this_write;
}
free(buf);
/*
* Force ourselves to another CPU. If fadvise only flushes the local
* CPUs pagevecs then the fadvise will fail to discard all file pages
*/
CPU_ZERO(&set);
CPU_SET(1, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* sync and fadvise to discard the page cache */
fsync(fd);
if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
perror("posix_fadvise");
exit(EXIT_FAILURE);
}
/* map the file and use mincore to see which parts of it are resident */
buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
if (buf == NULL) {
perror("mmap");
exit(EXIT_FAILURE);
}
if (mincore(buf, expected, vec) == -1) {
perror("mincore");
exit(EXIT_FAILURE);
}
/* Check residency */
for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
if (vec[i])
resident++;
}
if (resident != 0) {
printf("Nr unexpected pages resident: %d\n", resident);
exit(EXIT_FAILURE);
}
munmap(buf, expected);
close(fd);
free(vec);
exit(EXIT_SUCCESS);
}
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Rob van der Heij <rvdheij@gmail.com>
Tested-by: Rob van der Heij <rvdheij@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f00110f72 upstream.
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.
Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.
To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x
# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
# mount -o remount,size=200M nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above
# dd if=/dev/zero of=/tmp/x/f count=1
# panic here
Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f
Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.
The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:
config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */
This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676a0675cf upstream.
Running the command:
inotifywait -e unmount /mnt/disk
immediately aborts with a -EINVAL return code. This is however a valid
parameter. This abort occurs only if unmount is the sole event
parameter. If other event parameters are supplied, then the unmount
event wait will work.
The problem was introduced by commit 44b350fc23 ("inotify: Fix mask
checks"). In that commit, it states:
The mask checks in inotify_update_existing_watch() and
inotify_new_watch() are useless because inotify_arg_to_mask()
sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
But instead of removing the useless checks, it did this:
mask = inotify_arg_to_mask(arg);
- if (unlikely(!mask))
+ if (unlikely(!(mask & IN_ALL_EVENTS)))
return -EINVAL;
The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the
check should be:
if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))
But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
anyway, so the check is always going to pass and thus should simply be
removed. Also note that inotify_arg_to_mask completely controls what
mask bits get set from arg, there's no way for invalid bits to get
enabled there.
Lets fix it by simply removing the useless broken checks.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15bc8d8457 upstream.
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.
If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.
This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55c171a6d9 upstream.
Running under a kvm host does not necessarily imply the presence of
a page mapped above the main memory with the virtio information;
however, the code includes a hard coded access to that page.
Instead, check for the presence of the page and exit gracefully
before we hit an addressing exception if it does not exist.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 751efd8610 upstream.
There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().
Assume two tasks, one calling mmu_notifier_unregister() as a result of a
filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).
A B
t1 srcu_read_lock()
t2 if (!hlist_unhashed())
t3 srcu_read_unlock()
t4 srcu_read_lock()
t5 hlist_del_init_rcu()
t6 synchronize_srcu()
t7 srcu_read_unlock()
t8 hlist_del_rcu() <--- NULL pointer deref.
Additionally, the list traversal in __mmu_notifier_release() is not
protected by the by the mmu_notifier_mm->hlist_lock which can result in
callouts to the ->release() notifier from both mmu_notifier_unregister()
and __mmu_notifier_release().
-stable suggestions:
The stable trees prior to 3.7.y need commits 21a92735f6 and
70400303ce cherry-picked in that order prior to cherry-picking this
commit. The 3.7.y tree already has those two commits.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Sagi Grimberg <sagig@mellanox.co.il>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21a92735f6 upstream.
With an RCU based mmu_notifier implementation, any callout to
mmu_notifier_invalidate_range_{start,end}() or
mmu_notifier_invalidate_page() would not be allowed to call schedule()
as that could potentially allow a modification to the mmu_notifier
structure while it is currently being used.
Since srcu allocs 4 machine words per instance per cpu, we may end up
with memory exhaustion if we use srcu per mm. So all mms share a global
srcu. Note that during large mmu_notifier activity exit & unregister
paths might hang for longer periods, but it is tolerable for current
mmu_notifier clients.
Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fa3e78be7 upstream.
A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().
The panic/reboot path has fixups that look up devices in pci_bus_type. If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:
mach_reboot_fixups
pci_get_device
..
bus_find_device(&pci_bus_type, ...)
bus->p is NULL
Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76eaca031f upstream.
There is a loophole between Xen's current implementation of
pv-spinlocks and the scheduler. This was triggerable through
a testcase until v3.6 changed the TLB flushing code. The
problem potentially is still there just not observable in the
same way.
What could happen was (is):
1. CPU n tries to schedule task x away and goes into a slow
wait for the runq lock of CPU n-# (must be one with a lower
number).
2. CPU n-#, while processing softirqs, tries to balance domains
and goes into a slow wait for its own runq lock (for updating
some records). Since this is a spin_lock_irqsave in softirq
context, interrupts will be re-enabled for the duration of
the poll_irq hypercall used by Xen.
3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
an interrupt (e.g. endio) and when processing the interrupt,
tries to wake up task x. But that is in schedule and still
on_cpu, so try_to_wake_up goes into a tight loop.
4. The runq lock of CPU n-# gets unlocked, but the message only
gets sent to the first waiter, which is CPU n-# and that is
busily stuck.
5. CPU n-# never returns from the nested interruption to take and
release the lock because the scheduler uses a busy wait.
And CPU n never finishes the task migration because the unlock
notification only went to CPU n-#.
To avoid this and since the unlocking code has no real sense of
which waiter is best suited to grab the lock, just send the IPI
to all of them. This causes the waiters to return from the hyper-
call (those not interrupted at least) and do active spinlocking.
BugLink: http://bugs.launchpad.net/bugs/1011792
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f49a59c447 upstream.
According to the other code in this driver and similar
code in rme96 it seems, that spin_lock_irq in
snd_rme32_capture_close function should be paired
with spin_unlock_irq.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dacae5a19b upstream.
snd_ali_pointer function is called with local
interrupts disabled. However it seems very strange to
reenable them in such way.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22affe0ae upstream.
hrtimer_enqueue_reprogram contains a race which could result in
timer.base switch during unlock/lock sequence.
hrtimer_enqueue_reprogram is releasing the lock protecting the timer
base for calling raise_softirq_irqsoff() due to a lock ordering issue
versus rq->lock.
If during that time another CPU calls __hrtimer_start_range_ns() on
the same hrtimer, the timer base might switch, before the current CPU
can lock base->lock again and therefor the unlock_timer_base() call
will unlock the wrong lock.
[ tglx: Added comment and massaged changelog ]
Signed-off-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e716efde75 upstream.
commit 52553ddf(genirq: fix regression in irqfixup, irqpoll)
introduced a potential deadlock by calling the action handler with the
irq descriptor lock held.
Remove the call and let the handling code run even for an interrupt
where only a single action is registered. That matches the goal of
the above commit and avoids the deadlock.
Document the confusing action = desc->action reload in the handling
loop while at it.
Reported-and-tested-by: "Wang, Warner" <warner.wang@hp.com>
Tested-by: Edward Donovan <edward.donovan@numble.net>
Cc: "Wang, Song-Bo (Stoney)" <song-bo.wang@hp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63a3f60341 upstream.
defined(@array) is deprecated in Perl and gives off a warning.
Restructure the code to remove that warning.
[ hpa: it would be interesting to revert to the timeconst.bc script.
It appears that the failures reported by akpm during testing of
that script was due to a known broken version of make, not a problem
with bc. The Makefile rules could probably be restructured to avoid
the make bug, or it is probably old enough that it doesn't matter. ]
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c45512df9 upstream.
Commit c060f943d0 ("mm: use aligned zone start for pfn_to_bitidx
calculation") fixed out calculation of the index into the pageblock
bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.
However, the _allocation_ of that bitmap had never taken this alignment
requirement into accout, so depending on the exact size and alignment of
the zone, the use of that index could then access past the allocation,
resulting in some very subtle memory corruption.
This was reported (and bisected) by Ingo Molnar: one of his random
config builds would hang with certain very specific kernel command line
options.
In the meantime, commit c060f943d0 has been marked for stable, so this
fix needs to be back-ported to the stable kernels that backported the
commit to use the right alignment.
Bisected-and-tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch corrects a buffer overflow in kernels from 3.0 to 3.4 when calling
log_prefix() function from call_console_drivers().
This bug existed in previous releases but has been revealed with commit
162a7e7500 (2.6.39 => 3.0) that made changes
about how to allocate memory for early printk buffer (use of memblock_alloc).
It disappears with commit 7ff9554bb5 (3.4 => 3.5)
that does a refactoring of printk buffer management.
In log_prefix(), the access to "p[0]", "p[1]", "p[2]" or
"simple_strtoul(&p[1], &endp, 10)" may cause a buffer overflow as this
function is called from call_console_drivers by passing "&LOG_BUF(cur_index)"
where the index must be masked to do not exceed the buffer's boundary.
The trick is to prepare in call_console_drivers() a buffer with the necessary
data (PRI field of syslog message) to be safely evaluated in log_prefix().
This patch can be applied to stable kernel branches 3.0.y, 3.2.y and 3.4.y.
Without this patch, one can freeze a server running this loop from shell :
$ export DUMMY=`cat /dev/urandom | tr -dc '12345AZERTYUIOPQSDFGHJKLMWXCVBNazertyuiopqsdfghjklmwxcvbn' | head -c255`
$ while true do ; echo $DUMMY > /dev/kmsg ; done
The "server freeze" depends on where memblock_alloc does allocate printk buffer :
if the buffer overflow is inside another kernel allocation the problem may not
be revealed, else the server may hangs up.
Signed-off-by: Alexandre SIMON <Alexandre.Simon@univ-lorraine.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae1c07a6b7 upstream.
For some reason the reading of the RQDPC register was being artificially
limited to 4K. Instead of limiting the value we should read the value and
add the full amount. Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Vinson Lee <vlee@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 249bfb83cf upstream.
Devices are added to pci_pme_list when drivers use pci_enable_wake()
or pci_wake_from_d3(), but they aren't removed from the list unless
the driver explicitly disables wakeup. Many drivers never disable
wakeup, so their devices remain on the list even after they are
removed, e.g., via hotplug. A subsequent PME poll will oops when
it tries to touch the device.
This patch disables PME# on a device before removing it, which removes
the device from pci_pme_list. This is safe even if the device never
had PME# enabled.
This oops can be triggered by unplugging a Thunderbolt ethernet adapter
on a Macbook Pro, as reported by Daniel below.
[bhelgaas: changelog]
Reference: http://lkml.kernel.org/r/CAMVG2svG21yiM1wkH4_2pen2n+cr2-Zv7TbH3Gj+8MwevZjDbw@mail.gmail.com
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13d2b4d11d upstream.
This fixes CVE-2013-0228 / XSA-42
Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:
-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]
Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G D ---------------
2.6.32-356.el6.i686 #1
Call Trace:
[<c08476df>] ? panic+0x6e/0x122
[<c084b63c>] ? oops_end+0xbc/0xd0
[<c084b260>] ? do_general_protection+0x0/0x210
[<c084a9b7>] ? error_code+0x73/
-------------
Petr says: "
I've analysed the bug and I think that xen_iret() cannot cope with
mangled DS, in this case zeroed out (null selector/descriptor) by either
xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
entry was invalidated by the reproducer. "
Jan took a look at the preliminary patch and came up a fix that solves
this problem:
"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."
The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.
Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment. In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.
Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ee364eb31 upstream.
A user reported the following oops when a backup process reads
/proc/kcore:
BUG: unable to handle kernel paging request at ffffbb00ff33b000
IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
[...]
Call Trace:
[<ffffffff811b8aaa>] read_kcore+0x17a/0x370
[<ffffffff811ad847>] proc_reg_read+0x77/0xc0
[<ffffffff81151687>] vfs_read+0xc7/0x130
[<ffffffff811517f3>] sys_read+0x53/0xa0
[<ffffffff81449692>] system_call_fastpath+0x16/0x1b
Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.
The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD. If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.
This patch adds the necessary pud_large() check.
Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.coM>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 48856286b6 ]
A buggy or malicious frontend should not be able to confuse netback.
If we spot anything which is not as it should be then shutdown the
device and don't try to continue with the ring in a potentially
hostile state. Well behaved and non-hostile frontends will not be
penalised.
As well as making the existing checks for such errors fatal also add a
new check that ensures that there isn't an insane number of requests
on the ring (i.e. more than would fit in the ring). If the ring
contains garbage then previously is was possible to loop over this
insane number, getting an error each time and therefore not generating
any more pending requests and therefore not exiting the loop in
xen_netbk_tx_build_gops for an externded period.
Also turn various netdev_dbg calls which no precipitate a fatal error
into netdev_err, they are rate limited because the device is shutdown
afterwards.
This fixes at least one known DoS/softlockup of the backend domain.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit daf3ec688e ]
TG3_PHY_AUXCTL_SMDSP_ENABLE/DISABLE macros do a blind write to the phy
auxiliary control register and overwrite the EXT_PKT_LEN (bit 14) resulting
in intermittent crc errors on jumbo frames with some link partners. Change
the code to do a read/modify/write.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c13cb8bb4 ]
When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6caab7b054 ]
If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae62ca7b03 ]
commit 35f9c09fe9 (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.
The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.
But some programs pass an arbitrary high value, and the test fails.
The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.
We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)
Many thanks to Willy for providing very clear bug report, bisection
and test programs.
Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6731d2095b ]
There are transients during normal FRTO procedure during which
the packets_in_flight can go to zero between write_queue state
updates and firing the resulting segments out. As FRTO processing
occurs during that window the check must be more precise to
not match "spuriously" :-). More specificly, e.g., when
packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
branch that set cwnd into zero would not be taken and new segments
might be sent out later.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e5f421211 ]
Commit 9dc274151a (tcp: fix ABC in tcp_slow_start())
uncovered a bug in FRTO code :
tcp_process_frto() is setting snd_cwnd to 0 if the number
of in flight packets is 0.
As Neal pointed out, if no packet is in flight we lost our
chance to disambiguate whether a loss timeout was spurious.
We should assume it was a proper loss.
Reported-by: Pasi Kärkkäinen <pasik@iki.fi>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5c37fe6e2 ]
On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ba542a291 ]
In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f94aabd9f ]
Jamie Parsons reported a problem recently, in which the re-initalization of an
association (The duplicate init case), resulted in a loss of receive window
space. He tracked down the root cause to sctp_outq_teardown, which discarded
all the data on an outq during a re-initalization of the corresponding
association, but never reset the outq->outstanding_data field to zero. I wrote,
and he tested this fix, which does a proper full re-initalization of the outq,
fixing this problem, and hopefully future proofing us from simmilar issues down
the road.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab54ee80aa ]
We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.
So simply rename the iphase typedef to a new name. Fixes this compile error:
In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
from next/arch/s390/include/asm/lowcore.h:12,
from next/arch/s390/include/asm/thread_info.h:30,
from include/linux/thread_info.h:54,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:29,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9665d5d624 ]
When releasing a packet socket, the routine packet_set_ring() is reused
to free rings instead of allocating them. But when calling it for the
first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
which in the second invocation makes it bail out since req->tp_block_nr
is greater zero but req->tp_block_size is zero.
This patch solves the problem by passing a zeroed auto-variable to
packet_set_ring() upon each invocation from packet_release().
As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING
and packet mmap), i.e. the original inclusion of TX ring support into
af_packet, but applies only to sockets with both RX and TX ring
allocated, which is probably why this was unnoticed all the time.
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Cc: Johann Baudy <johann.baudy@gnu-log.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd30e94720 ]
They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.
Note that IPv4 already handles it this way. No neighbor entries are
created for local input.
Tested by myself and customer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 604dfd6efc ]
The return value of pktgen_add_device() is not checked, so
even if we fail to add some device, for example, non-exist one,
we still see "OK:...". This patch fixes it.
After this patch, I got:
# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
-bash: echo: write error: No such device
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped:
Result: ERROR: can not add device non-exist
# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped: eth0
Result: OK: add_device=eth0
(Candidate for -stable)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 794ed393b7 ]
Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.
We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.
It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().
As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.
[<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
[<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
[<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
[<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
[<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
[<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
[<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
[<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
[<ffffffff814ad90e>] ip_finish_output+0x63/0x68
[<ffffffff814ae412>] ip_output+0x61/0x67
[<ffffffff814ab904>] dst_output+0x17/0x1b
[<ffffffff814adb6d>] ip_local_out+0x1e/0x23
[<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
[<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
[<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
[<ffffffff814c3571>] tcp_connect+0x53c/0x587
[<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
[<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
[<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
[<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
[<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
[<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
[<ffffffff814632c6>] sys_connect+0x75/0x98
This bug was introduced in linux-2.6.35, in commit
7fee226ad2 (net: add a noref bit on skb dst)
skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5d0feaff23 ]
This was introduced in commit 6dccd16 "r8169: merge with version
6.001.00 of Realtek's r8169 driver". I did not find the version
6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
this hunk is no longer present.
Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
rev d" claims to have fixed this issue otherwise.
The magic compare mask of 0xfffe000 is dubious as it masks
parts of the Reserved part, and parts of the VLAN tag. But this
does not make much sense as the VLAN tag parts are perfectly
valid there. In matter of fact this seems to be triggered with
any VLAN tagged packet as RxVlanTag bit is matched. I would
suspect 0xfffe0000 was intended to test reserved part only.
Finally, this hunk is evil as it can cause more packets to be
handled than what was NAPI quota causing net/core/dev.c:
net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and
mess up the NAPI state causing device to hang.
As result, any system using VLANs and having high receive
traffic (so that NAPI poll budget limits rtl_rx) would result
in device hang.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d721a1752b ]
If subtracting 12 from l leaves zero we'd do a zero size allocation,
leading to an oops later when we try to set the NUL terminator.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit adbbf69d1a ]
I changed my email because the vyatta.com mail server is now
redirected to brocade.com; and the Brocade mail system
is not friendly to Linux desktop users.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aacde9ee45 upstream.
Since:
commit b23b025fe2
Author: Ben Greear <greearb@candelatech.com>
Date: Fri Feb 4 11:54:17 2011 -0800
mac80211: Optimize scans on current operating channel.
we do not disable PS while going back to operational channel (on
ieee80211_scan_state_suspend) and deffer that until scan finish.
But since we are allowed to send frames, we can send a frame to AP
without PM bit set, so disable PS on AP side. Then when we switch
to off-channel (in ieee80211_scan_state_resume) we do not enable PS.
Hence we are off-channel with PS disabled, frames are not buffered
by AP.
To fix remove offchannel_ps_disable argument and always enable PS when
going off-channel and disable it when going on-channel, like it was
before.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4965f5667f upstream.
Using a recursive call add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep. Convert the recursive calls to an
iterative loop to avoid the problem.
Tested on a machine containing 135 regions. The kernel no longer panicked
with stack overflow.
Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.
Signed-off-by: T Makphaibulchoke <tmac@hp.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48c3375c5f upstream.
This patch (as1640) fixes a memory leak in xhci-hcd. The urb_priv
data structure isn't always deallocated in the handle_tx_event()
routine for non-control transfers. The patch adds a kfree() call so
that all paths end up freeing the memory properly.
This patch should be backported to kernels as old as 2.6.36, that
contain the commit 8e51adccd4 "USB: xHCI:
Introduce urb_priv structure"
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-and-tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 760973d2a7 upstream.
An isochronous TD is comprised of one isochronous TRB chained to zero or
more normal TRBs. Only the isoc TRB has the TBC and TLBPC fields. The
normal TRBs must set those fields to zeroes. The code was setting the
TBC and TLBPC fields for both isoc and normal TRBs. Fix this.
This should be backported to stable kernels as old as 3.0, that contain
the commit b61d378f2d " xhci 1.0: Set
transfer burst last packet count field."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 200e0d994d upstream.
1. Optimize the match rules with new macro for Huawei USB storage devices,
to avoid to load USB storage driver for the modem interface
with Huawei devices.
2. Add to support new switch command for new Huawei USB dongles.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07c7be3d87 upstream.
1. Define a new macro for USB storage match rules:
matching with Vendor ID and interface descriptors.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e619d0415 upstream.
This patch (as1654) fixes a very old bug in ehci-hcd, connected with
scheduling of periodic split transfers. The calculations for
full/low-speed bus usage are all carried out after the correction for
bit-stuffing has been applied, but the values in the max_tt_usecs
array assume it hasn't been. The array should allow for allocation of
up to 90% of the bus capacity, which is 900 us, not 780 us.
The symptom caused by this bug is that any isochronous transfer to a
full-speed device with a maxpacket size larger than about 980 bytes is
always rejected with a -ENOSPC error.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9bae18954 upstream.
There exists a situation when GC can work in background alone without
any other filesystem activity during significant time.
The nilfs_clean_segments() method calls nilfs_segctor_construct() that
updates superblocks in the case of NILFS_SC_SUPER_ROOT and
THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the
nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag.
As a result, the update of superblocks doesn't occurred all this time
and in the case of SPOR superblocks keep very old values of last super
root placement.
SYMPTOMS:
Trying to mount a NILFS2 volume after SPOR in such environment ends with
very long mounting time (it can achieve about several hours in some
cases).
REPRODUCING PATH:
1. It needs to use external USB HDD, disable automount and doesn't
make any additional filesystem activity on the NILFS2 volume.
2. Generate temporary file with size about 100 - 500 GB (for example,
dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of
file defines duration of GC working.
3. Then it needs to delete file.
4. Start GC manually by means of command "nilfs-clean -p 0". When you
start GC by means of such way then, at the end, superblocks is updated
by once. So, for simulation of SPOR, it needs to wait sometime (15 -
40 minutes) and simply switch off USB HDD manually.
5. Switch on USB HDD again and try to mount NILFS2 volume. As a
result, NILFS2 volume will mount during very long time.
REPRODUCIBILITY: 100%
FIX:
This patch adds checking that superblocks need to update and set
THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call.
Reported-by: Sergey Alexandrov <splavgm@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c903f0456b upstream.
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.
In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.
Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f44310b98d upstream.
I get the following warning every day with v3.7, once or
twice a day:
[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
As explained by Linus as well:
|
| Once we've done the "list_add_rcu()" to add it to the
| queue, we can have (another) IPI to the target CPU that can
| now see it and clear the mask.
|
| So by the time we get to actually send the IPI, the mask might
| have been cleared by another IPI.
|
This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:
if (WARN_ONCE(!mask, "empty IPI mask"))
return;
then the problem in commit 83d349f35e ("x86: don't send an IPI to
the empty set of CPU's") will happen again.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a9ab9bdb3 upstream.
The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that string provided by userspace will contain the trailing
'\0'.
Can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on
input subsystem:
$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af
("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)
Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d56268fb10 upstream.
Commit 23caaf19b1 (ALSA: usb-mixer: Add support for Audio Class v2.0)
forgot to adjust the length check for UAC 2.0 feature unit descriptors.
This would make the code abort on encountering a feature unit without
per-channel controls, and thus prevented the driver to work with any
device having such a unit, such as the RME Babyface or Fireface UCX.
Reported-by: Florian Hanisch <fhanisch@uni-potsdam.de>
Tested-by: Matthew Robbetts <wingfeathera@gmail.com>
Tested-by: Michael Beer <beerml@sigma6audio.de>
Cc: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1adb2e2b5f upstream.
When the next beacon is sent, the ath_buf from the previous run is reused.
If getting a new beacon from mac80211 fails, bf->bf_mpdu is not reset, yet
the skb is freed, leading to a double-free on the next beacon tx attempt,
resulting in a system crash.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15653371c6 upstream.
Subhash Jadavani reported this partial backtrace:
Now consider this call stack from MMC block driver (this is on the ARMv7
based board):
[<c001b50c>] (v7_dma_inv_range+0x30/0x48) from [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
[<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
[<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from [<c0017ff8>] (dma_map_sg+0x3c/0x114)
This is caused by incrementing the struct page pointer, and running off
the end of the sparsemem page array. Fix this by incrementing by pfn
instead, and convert the pfn to a struct page.
Suggested-by: James Bottomley <JBottomley@Parallels.com>
Tested-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10b8c7dff5 upstream.
When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee50e135ae upstream.
Errors in CAN protocol (location) are reported in data[3] of the can
frame instead of data[2].
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 891348ca0f upstream.
We found a user code which was raising a divide-by-zero trap. That trap
would lead to XPC connections between system-partitions being torn down
due to the die_chain notifier callouts it received.
This also revealed a different issue where multiple callers into
xpc_die_deactivate() would all attempt to do the disconnect in parallel
which would sometimes lock up but often overwhelm the console on very
large machines as each would print at least one line of output at the
end of the deactivate.
I reviewed all the users of the die_chain notifier and changed the code
to ignore the notifier callouts for reasons which will not actually lead
to a system to continue on to call die().
[akpm@linux-foundation.org: fix ia64]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7328ae184 upstream.
With virtual machines like qemu, it's pretty common to see "too much
work for irq4" messages nowadays. This happens when a bunch of output
is printed on the emulated serial console. This is caused by too low
PASS_LIMIT. When ISR loops more than the limit, it spits the message.
I've been using a kernel with doubled the limit and I couldn't see no
problems. Maybe it's time to get rid of the message now?
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ram Gupta <ram.gupta5@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f9c9cbb60 upstream.
The right dmi version is in SMBIOS if it's zero in DMI region
This issue was originally found from an oracle bug.
One customer noticed system UUID doesn't match between dmidecode & uek2.
- HP ProLiant BL460c G6 :
# cat /sys/devices/virtual/dmi/id/product_uuid
00000000-0000-4C48-3031-4D5030333531
# dmidecode | grep -i uuid
UUID: 00000000-0000-484C-3031-4D5030333531
From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
network byte order.
So we need to get dmi version to distinguish. If version is 0.0, the
real version is taken from the SMBIOS version. This is part of original
kernel comment in code.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1d8e614d7 upstream.
As of version 2.6 of the SMBIOS specification, the first 3 fields of the
UUID are supposed to be little-endian encoded.
Also a minor fix to match variable meaning and mute checkpatch.pl
[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afd5e34b2b upstream.
scsi_register_driver will register a prep_fn() function, which
in turn migh need to use the sd_cdp_pool for DIF.
Which hasn't been initialised at this point, leading to
a crash. So reshuffle the init_sd() and exit_sd() paths
to have the driver registered last.
Signed-off-by: Joel D. Diaz <joeldiaz@us.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f815a0a70 upstream.
This patch (as1644) fixes a race that occurs during startup in
uhci-hcd. If the IRQ line is shared with other devices, it's possible
for the handler routine to be called before the data structures are
fully initialized.
The problem is fixed by adding a check to the IRQ handler routine. If
the initialization hasn't finished yet, the routine will return
immediately.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Don Zickus <dzickus@redhat.com>
Tested-by: "Huang, Adrian (ISS Linux TW)" <adrian.huang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e16721498 upstream.
Right now using pcie_aspm=force will not enable ASPM if the FADT indicates
ASPM is unsupported. However, the semantics of force should probably allow
for this, especially as they did before 3c076351c4 ("PCI: Rework ASPM
disable code")
This patch just skips the clearing of any ASPM setup that the firmware has
carried out on this bus if pcie_aspm=force is being used.
Reference: http://bugs.launchpad.net/bugs/962038
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1bf08ac26 upstream.
If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.
Here's the error:
WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Hardware name: Bochs
Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Call Trace:
[<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810fd189>] ftrace_bug+0x239/0x280
[<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
[<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
[<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
[<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff810c2a23>] sys_init_module+0x73/0x220
[<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
---[ end trace 9ef46351e53bbf80 ]---
ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
actual: cc:bb:d2:4b:e1
A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).
Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.
The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.
Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit 262b6d363f upstream.
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.
Fixes regression from commit bcf50e2775
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Nov 21 22:07:12 2010 +0000
drm/i915: Handle pagefaults in execbuffer user relocations
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 1ee4c55fc9 upstream.
vt6656 has several headers that use the #pragma pack(1) directive to
enable structure packing, but never disable it. The layout of
structures defined in other headers can then depend on which order the
various headers are included in, breaking the One Definition Rule.
In practice this resulted in crashes on x86_64 until the order of header
inclusion was changed for some files in commit 11d404cb56 ('staging:
vt6656: fix headers and add cfg80211.'). But we need a proper fix that
won't be affected by future changes to the order of inclusion.
This removes the #pragma pack(1) directives and adds __packed to the
structure definitions for which packing appears to have been intended.
Reported-and-tested-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 014b9b4ce8 upstream.
When shut down SPI port, it's possible that MRDY has been asserted and a SPI
timer was activated waiting for SRDY assert, in the case, it needs to delete
this timer.
Signed-off-by: Chen Jun <jun.d.chen@intel.com>
Signed-off-by: channing <chao.bi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
commit 2291dff02e upstream.
The driver description files gives these names to the vendor specific
functions on this modem:
Diag VID_19D2&PID_0265&MI_00
NMEA VID_19D2&PID_0265&MI_01
AT cmd VID_19D2&PID_0265&MI_02
Modem VID_19D2&PID_0265&MI_03
Net VID_19D2&PID_0265&MI_04
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99beb2e968 upstream.
The driver description files gives these names to the vendor specific
functions on this modem:
Diagnostics VID_2357&PID_0201&MI_00
NMEA VID_2357&PID_0201&MI_01
Modem VID_2357&PID_0201&MI_03
Networkcard VID_2357&PID_0201&MI_04
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9174adbee4 upstream.
This fixes CVE-2013-0190 / XSA-40
There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path. This can result in the kernel crashing.
In the classic kernel case, the relevant code looked a little like:
popl %eax # Error code from hypervisor
jz 5f
addl $16,%esp
jmp iret_exc # Hypervisor said iret fault
5: addl $16,%esp
# Hypervisor said segment selector fault
Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.
In the PVOPS case, the code looks like:
popl_cfi %eax # Error from the hypervisor
lea 16(%esp),%esp # Add $16 before choosing fault path
CFI_ADJUST_CFA_OFFSET -16
jz 5f
addl $16,%esp # Incorrectly adjust %esp again
jmp iret_exc
It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present. At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.
This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68e5254adb upstream.
xhci_alloc_segments_for_ring() builds a list of xhci_segments and links
the tail to head at the end (forming a ring). When it bails out for OOM
reasons half-way through, it tries to destroy its half-built list with
xhci_free_segments_for_ring(), even though it is not a ring yet. This
causes a null-pointer dereference upon hitting the last element.
Furthermore, one of its callers (xhci_ring_alloc()) mistakenly believes
the output parameters to be valid upon this kind of OOM failure, and
calls xhci_ring_free() on them. Since the (incomplete) list/ring should
already be destroyed in that case, this would lead to a use after free.
This patch fixes those issues by having xhci_alloc_segments_for_ring()
destroy its half-built, non-circular list manually and destroying the
invalid struct xhci_ring in xhci_ring_alloc() with a plain kfree().
This patch should be backported to kernels as old as 2.6.31, that
contains the commit 0ebbab3742 "USB: xhci:
Ring allocation and initialization."
A separate patch will need to be developed for kernels older than 3.4,
since the ring allocation code was refactored in that kernel.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Since segment allocation is done directly in xhci_ring_alloc(), walk
the list starting from ring->first_seg when freeing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea2447f700 upstream.
This patch is to prevent non-USB devices that have RMRRs associated with them from
being placed into the SI Domain during init. This fixes the issue where the RMRR info
for devices being placed in and out of the SI Domain gets lost.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36caff5d79 upstream.
This patch (as1631) fixes a bug that shows up when a config change
fails for a device under an xHCI controller. The controller needs to
be told to disable the endpoints that have been enabled for the new
config. The existing code does this, but before storing the
information about which endpoints were enabled! As a result, any
second attempt to install the new config is doomed to fail because
xhci-hcd will refuse to enable an endpoint that is already enabled.
The patch optimistically initializes the new endpoints' device
structures before asking the device to switch to the new config. If
the request fails then the endpoint information is already stored, so
we can use usb_hcd_alloc_bandwidth() to disable the endpoints with no
trouble. The rest of the error path is slightly more complex now; we
have to disable the new interfaces and call put_device() rather than
simply deallocating them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Matthias Schniedermeyer <ms@citd.de>
CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
[not upstream as the code involved was removed in the 3.3.0 release]
Fix wii_memory_fixups() the following compile error on 3.0.y tree with
wii_defconfig on 3.0.y tree.
CC arch/powerpc/platforms/embedded6xx/wii.o
arch/powerpc/platforms/embedded6xx/wii.c: In function ‘wii_memory_fixups’:
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
cc1: all warnings being treated as errors
make[2]: *** [arch/powerpc/platforms/embedded6xx/wii.o] Error 1
make[1]: *** [arch/powerpc/platforms/embedded6xx] Error 2
make: *** [arch/powerpc/platforms] Error 2
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66bea92c69 upstream.
ext4_da_block_invalidatepages is missing a pagevec_init(),
which means that pvec->cold contains random garbage.
This affects whether the page goes to the front or
back of the LRU when ->cold makes it to
free_hot_cold_page()
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9acc5365d upstream.
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table. So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.
Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.
[ hpa: made a number of stylistic changes, marked arrays as static
const, and made less verbose; use "memblock=debug" for full
verbosity. ]
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed4f20943c upstream.
Converting a 64 Bit TOD format value to nanoseconds means that the value
must be divided by 4.096. In order to achieve that we multiply with 125
and divide by 512.
When used within sched_clock() this triggers an overflow after appr.
417 days. Resulting in a sched_clock() return value that is much smaller
than previously and therefore may cause all sort of weird things in
subsystems that rely on a monotonic sched_clock() behaviour.
To fix this implement a tod_to_ns() helper function which converts TOD
values without overflow and call this function from both places that
open coded the conversion: sched_clock() and kvm_s390_handle_wait().
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a71997a32 upstream.
Ensure that the aux table is properly initialized, even when optional features
are missing. Without this, the FDPIC loader did not work. This was meant to
be included in commit d5ab780305.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34ffb33e09 upstream.
The 'ni_at_a2150' module links to `cfc_write_to_buffer` in the
'comedi_fc' module, so selecting 'COMEDI_NI_AT_A2150' in the kernel config
needs to also select 'COMEDI_FC'.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c43435d772 upstream.
comedi_auto_config() associates a Comedi minor device number with an
auto-configured hardware device and comedi_auto_unconfig() disassociates
it. Currently, these use the hardware device's private data pointer to
point to some allocated storage holding the minor device number. This
is a bit of a waste of the hardware device's private data pointer,
preventing it from being used for something more useful by the low-level
comedi device drivers. For example, it would make more sense if
comedi_usb_auto_config() was passed a pointer to the struct
usb_interface instead of the struct usb_device, but this cannot be done
currently because the low-level comedi drivers already use the private
data pointer in the struct usb_interface for something more useful.
This patch stops the comedi core hijacking the hardware device's private
data pointer. Instead, comedi_auto_config() stores a pointer to the
hardware device's struct device in the struct comedi_device_file_info
associated with the minor device number, and comedi_auto_unconfig()
calls new function comedi_find_board_minor() to recover the minor device
number associated with the hardware device.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e43a028752 upstream.
When remembering the direction of a DCR transaction, we should write
to the same variable that we interpret on later when doing vcpu_run
again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6491d4d028 upstream.
The dma_pte_free_pagetable() function will only free a page table page
if it is asked to free the *entire* 2MiB range that it covers. So if a
page table page was used for one or more small mappings, it's likely to
end up still present in the page tables... but with no valid PTEs.
This was fine when we'd only be repopulating it with 4KiB PTEs anyway
but the same virtual address range can end up being reused for a
*large-page* mapping. And in that case were were trying to insert the
large page into the second-level page table, and getting a complaint
from the sanity check in __domain_mapping() because there was already a
corresponding entry. This was *relatively* harmless; it led to a memory
leak of the old page table page, but no other ill-effects.
Fix it by calling dma_pte_clear_range (hopefully redundant) and
dma_pte_free_pagetable() before setting up the new large page.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Ravi Murty <Ravi.Murty@intel.com>
Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96e5d1d3ad upstream.
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55c1945eda upstream.
A high speed control or bulk endpoint may have bInterval set to zero,
which means it does not NAK. If bInterval is non-zero, it means the
endpoint NAKs at a rate of 2^(bInterval - 1).
The xHCI code to compute the NAK interval does not handle the special
case of zero properly. The current code unconditionally subtracts one
from bInterval and uses it as an exponent. This causes a very large
bInterval to be used, and warning messages like these will be printed:
usb 1-1: ep 0x1 - rounding interval to 32768 microframes, ep desc says 0 microframes
This may cause the xHCI host hardware to reject the Configure Endpoint
command, which means the HS device will be unusable under xHCI ports.
This patch should be backported to kernels as old as 2.6.31, that contain
commit dfa49c4ad1 "USB: xhci - fix math in
xhci_get_endpoint_interval()".
Reported-by: Vincent Pelletier <plr.vincent@gmail.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07e72b95f5 upstream.
Some touchscreens have buggy firmware which claims
remote wakeup to be enabled after a reset. They nevertheless
crash if the feature is cleared by the host.
Add a check for reset resume before checking for
an enabled remote wakeup feature. On compliant
devices the feature must be cleared after a reset anyway.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77c7f072c8 upstream.
John's NEC 0.96 xHCI host controller needs a longer timeout for a warm
reset to complete. The logs show it takes 650ms to complete the warm
reset, so extend the hub reset timeout to 800ms to be on the safe side.
This commit should be backported to kernels as old as 3.2, that contain
the commit 75d7cf72ab "usbcore: refine
warm reset logic".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: John Covici <covici@ccs.covici.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d16638e3b upstream.
If we do have endpoints named like "ep-a" then bEndpointAddress is
counted internally by the gadget framework.
If we do have endpoints named like "ep-1" then bEndpointAddress is
assigned from the digit after "ep-".
If we do have both, then it is likely that after we used up the
"generic" endpoints we will use the digits and thus assign one
bEndpointAddress to multiple endpoints.
This theory can be proofed by using the completely enabled g_multi.
Without this patch, the mass storage won't enumerate and times out
because it shares endpoints with RNDIS.
This patch also adds fills up the endpoints list so we have in total
endpoints 1 to 15 in + out available while some of them are restricted
to certain types like BULK or ISO. Without this change the nokia gadget
won't load because the system does not provide enough (BULK) endpoints
but it did before ep-a - ep-f were removed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 036915a7a4 upstream.
Adding support "PSC Scanning, Magellan 800i" in cdc-acm
Very simple, but very necessary.
Suitable for all versions of the kernel > 2.6
Signed-off-by: Denis N Ladin <denladin@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8cf65dc386 upstream.
Simple fix to add support for Crucible Technologies COMET Caller ID
USB decoder - a device containing FTDI USB/Serial converter chip,
handling 1200bps CallerID messages decoded from the phone line -
adding correct USB PID is sufficient.
Tested to apply cleanly and work flawlessly against 3.6.9, 3.7.0-rc8
and 3.8.0-rc3 on both amd64 and x86 arches.
Signed-off-by: Tomasz Mloduchowski <q@qdot.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ec0085440 upstream.
also known as Alcatel One Touch L100V LTE
The driver description files gives these names to the vendor specific
functions on this modem:
Application1: VID_1BBB&PID_011E&MI_00
Application2: VID_1BBB&PID_011E&MI_01
Modem: VID_1BBB&PID_011E&MI_03
Ethernet: VID_1BBB&PID_011E&MI_04
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94a85b6338 upstream.
In option.c, add some new MEDIATEK PIDs support for MEDIATEK new products. This
is a MEDIATEK inc. release patch.
Signed-off-by: Quentin.Li <snowmanli88@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a56f992cda upstream.
This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().
The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51861d4eeb upstream.
Those rn50 chip are often connected to console remoting hw and load
detection often fails with those. Just don't try to load detect and
report connect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0729eeefd upstream.
Éric Piel reported a kernel oops in the "comedi_test" module. It was a
NULL pointer dereference within `waveform_ai_interrupt()` (actually a
timer function) that sometimes occurred when a running asynchronous
command is cancelled (either by the `COMEDI_CANCEL` ioctl or by closing
the device file).
This seems to be a race between the caller of `waveform_ai_cancel()`
which on return from that function goes and tears down the running
command, and the timer function which uses the command. In particular,
`async->cmd.chanlist` gets freed (and the pointer set to NULL) by
`do_become_nonbusy()` in "comedi_fops.c" but a previously scheduled
`waveform_ai_interrupt()` timer function will dereference that pointer
regardless, leading to the oops.
Fix it by replacing the `del_timer()` call in `waveform_ai_cancel()`
with `del_timer_sync()`.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reported-by: Éric Piel <piel@delmic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d3135af39 upstream.
When a low-level comedi driver auto-configures a device, a `struct
comedi_dev_file_info` is allocated (as well as a `struct
comedi_device`) by `comedi_alloc_board_minor()`. A pointer to the
hardware `struct device` is stored as a cookie in the `struct
comedi_dev_file_info`. When the low-level comedi driver
auto-unconfigures the device, `comedi_auto_unconfig()` uses the cookie
to find the `struct comedi_dev_file_info` so it can detach the comedi
device from the driver, clean it up and free it.
A problem arises if the user manually unconfigures and reconfigures the
comedi device using the `COMEDI_DEVCONFIG` ioctl so that is no longer
associated with the original hardware device. The problem is that the
cookie is not cleared, so that a call to `comedi_auto_unconfig()` from
the low-level driver will still find it, detach it, clean it up and free
it.
Stop this problem occurring by always clearing the `hardware_device`
cookie in the `struct comedi_dev_file_info` whenever the
`COMEDI_DEVCONFIG` ioctl call is successful.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41b645c862 upstream.
Cold reset on the pxa27x currently fails and
pxa2xx_ac97_try_cold_reset: cold reset timeout (GSR=0x44)
appears in the kernel log. Through trial-and-error (the pxa270 developer's
manual is mostly incoherent on the topic of ac97 reset), I got cold reset to
complete by setting the WARM_RST bit in the GCR register (and later noticed that
pxa3xx does this for cold reset as well). Also, a timeout loop is needed to
wait for the reset to complete.
Tested on a palm treo 680 machine.
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Acked-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 115c9b8192 upstream.
Implement a new netlink attribute type IFLA_EXT_MASK. The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.
This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.
Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask. If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.0:
- Adjust context
- Drop the change in do_setlink() that reverts commit f18da14565
('net: RTNETLINK adjusting values of min_ifinfo_dump_size'), which
was never applied here]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7ac8679be upstream.
The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.
Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbf0a1427 upstream.
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f1d06c34f upstream.
On COW, a new hugepage is allocated and charged to the memcg. If the
system is oom or the charge to the memcg fails, however, the fault
handler will return VM_FAULT_OOM which results in an oom kill.
Instead, it's possible to fallback to splitting the hugepage so that the
COW results only in an order-0 page being allocated and charged to the
memcg which has a higher liklihood to succeed. This is expensive
because the hugepage must be split in the page fault handler, but it is
much better than unnecessarily oom killing a process.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb719c59bd upstream.
Incrementing lenExtents even while writing to a hole is bad
for performance as calls to udf_discard_prealloc and
udf_truncate_tail_extent would not return from start if
isize != lenExtents
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a41409c51 upstream, but doesn't
apply, so this version is different for older kernels than 3.7.x
blk_alloc_queue has already done a bdi_init, so do not bdi_init
again in aoeblk_gdalloc. The extra call causes list corruption
in the per-CPU backing dev info stats lists.
Affected users see console WARNINGs about list_del corruption on
percpu_counter_destroy when doing "rmmod aoe" or "aoeflush -a"
when AoE targets have been detected and initialized by the
system.
The patch below applies to v3.6.11, with its v47 aoe driver. It
is expected to apply to all currently maintained stable kernels
except 3.7.y. A related but different fix has been posted for
3.7.y.
References:
RedHat bugzilla ticket with original report
https://bugzilla.redhat.com/show_bug.cgi?id=853064
LKML discussion of bug and fix
http://thread.gmane.org/gmane.linux.kernel/1416336/focus=1416497
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 721e3eba21 upstream.
Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of this c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d096ad0f79 upstream.
When a journal-less ext4 filesystem is mounted on a read-only block
device (blockdev --setro will do), each remount (for other, unrelated,
flags, like suid=>nosuid etc) results in a series of scary messages
from kernel telling about I/O errors on the device.
This is becauese of the following code ext4_remount():
if (sbi->s_journal == NULL)
ext4_commit_super(sb, 1);
at the end of remount procedure, which forces writing (flushing) of
a superblock regardless whenever it is dirty or not, if the filesystem
is readonly or not, and whenever the device itself is readonly or not.
We only need call ext4_commit_super when the file system had been
previously mounted read/write.
Thanks to Eric Sandeen for help in diagnosing this issue.
Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7961c7fa4 upstream.
The following race is possible between start_this_handle() and someone
calling jbd2_journal_flush().
Process A Process B
start_this_handle().
if (journal->j_barrier_count) # false
if (!journal->j_running_transaction) { #true
read_unlock(&journal->j_state_lock);
jbd2_journal_lock_updates()
jbd2_journal_flush()
write_lock(&journal->j_state_lock);
if (journal->j_running_transaction) {
# false
... wait for committing trans ...
write_unlock(&journal->j_state_lock);
...
write_lock(&journal->j_state_lock);
if (!journal->j_running_transaction) { # true
jbd2_get_transaction(journal, new_transaction);
write_unlock(&journal->j_state_lock);
goto repeat; # eventually blocks on j_barrier_count > 0
...
J_ASSERT(!journal->j_running_transaction);
# fails
We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
in exclusive mode.
Reported-by: yjwsignal@empal.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c1ecba8d8 upstream.
The VDCTRL4 register does not provide the MXS SET/CLR/TOGGLE feature.
The write in mxsfb_disable_controller() sets the data_cnt for the LCD
DMA to 0 which obviously means the max. count for the LCD DMA and
leads to overwriting arbitrary memory when the display is unblanked.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Juergen Beisert <jbe@pengutronix.de>
Tested-by: Lauri Hintsala <lauri.hintsala@bluegiga.net>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70e227790d upstream.
The timer appears to run too fast/race on 64 bit systems.
Using msecs_to_jiffies seems to cause a deadlock on 64 bit.
A calculation of (MSecond * HZ) / 1000 appears to run satisfactory.
Change BSSIDInfoCount to u32.
After this patch the driver can be successfully connect on little endian 64/32 bit systems.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7730492855 upstream.
After this patch all BYTE/WORD/DWORD types can be replaced with the appropriate u sizes.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab1dd99631 upstream.
Calling RFbSetPower with uCH zero value will cause out of bound array reference.
This causes 64 bit kernels to oops on boot.
Note: Driver does not function on 64 bit kernels and should be
blacklisted on them.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e910d7ebec upstream.
Abort dm ioctl processing if userspace changes the data_size parameter
after we validated it but before we finished copying the data buffer
from userspace.
The dm ioctl parameters are processed in the following sequence:
1. ctl_ioctl() calls copy_params();
2. copy_params() makes a first copy of the fixed-sized portion of the
userspace parameters into the local variable "tmp";
3. copy_params() then validates tmp.data_size and allocates a new
structure big enough to hold the complete data and copies the whole
userspace buffer there;
4. ctl_ioctl() reads userspace data the second time and copies the whole
buffer into the pointer "param";
5. ctl_ioctl() reads param->data_size without any validation and stores it
in the variable "input_param_size";
6. "input_param_size" is further used as the authoritative size of the
kernel buffer.
The problem is that userspace code could change the contents of user
memory between steps 2 and 4. In particular, the data_size parameter
can be changed to an invalid value after the kernel has validated it.
This lets userspace force the kernel to access invalid kernel memory.
The fix is to ensure that the size has not changed at step 4.
This patch shouldn't have a security impact because CAP_SYS_ADMIN is
required to run this code, but it should be fixed anyway.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9366c1ba13 upstream.
The function rb_check_pages() was added to make sure the ring buffer's
pages were sane. This check is done when the ring buffer size is modified
as well as when the iterator is released (closing the "trace" file),
as that was considered a non fast path and a good place to do a sanity
check.
The problem is that the check does not have any locks around it.
If one process were to read the trace file, and another were to read
the raw binary file, the check could happen while the reader is reading
the file.
The issues with this is that the check requires to clear the HEAD page
before doing the full check and it restores it afterward. But readers
require the HEAD page to exist before it can read the buffer, otherwise
it gives a nasty warning and disables the buffer.
By adding the reader lock around the check, this keeps the race from
happening.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cbba75a56 upstream.
Users of jffs2_do_reserve_space() expect they still held
erase_completion_lock after call to it. But there is a path
where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
The patch fixes it.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6567ed140 upstream.
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24ec19b0ae upstream.
In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.
This patch fixes that.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9fbb62eb6 upstream.
mfd_remove_devices would iterate over all devices sharing a parent with
an mfd device regardless of whether they were allocated by the mfd core
or not. This especially caused problems when the device structure was
not contained within a platform_device, because to_platform_device is
used on each device pointer.
This patch defines a device_type for mfd devices and checks this is
present from mfd_remove_devices_fn before processing the device.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Peter Tyser <ptyser@xes-inc.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f4ad44b26 upstream.
The lockdep warning below is in theory correct but it will be in really weird
rare situation that ends up that deadlock since the tcm fc session is hashed
based the rport id. Nonetheless, the complaining below is about rcu callback
that does the transport_deregister_session() is happening in softirq, where
transport_register_session() that happens earlier is not. This triggers the
lockdep warning below. So, just fix this to make lockdep happy by disabling
the soft irq before calling transport_register_session() in ft_prli.
BTW, this was found in FCoE VN2VN over two VMs, couple of create and destroy
would get this triggered.
v1: was enforcing register to be in softirq context which was not righ. See,
http://www.spinics.net/lists/target-devel/msg03614.html
v2: following comments from Roland&Nick (thanks), it seems we don't have to
do transport_deregister_session() in rcu callback, so move it into ft_sess_free()
but still do kfree() of the corresponding ft_sess struct in rcu callback to
make sure the ft_sess is not freed till the rcu callback.
...
[ 1328.370592] scsi2 : FCoE Driver
[ 1328.383429] fcoe: No FDMI support.
[ 1328.384509] host2: libfc: Link up on port (000000)
[ 1328.934229] host2: Assigned Port ID 00a292
[ 1357.232132] host2: rport 00a393: Remove port
[ 1357.232568] host2: rport 00a393: Port sending LOGO from Ready state
[ 1357.233692] host2: rport 00a393: Delete port
[ 1357.234472] host2: rport 00a393: work event 3
[ 1357.234969] host2: rport 00a393: callback ev 3
[ 1357.235979] host2: rport 00a393: Received a LOGO response closed
[ 1357.236706] host2: rport 00a393: work delete
[ 1357.237481]
[ 1357.237631] =================================
[ 1357.238064] [ INFO: inconsistent lock state ]
[ 1357.238450] 3.7.0-rc7-yikvm+ #3 Tainted: G O
[ 1357.238450] ---------------------------------
[ 1357.238450] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1357.238450] ksoftirqd/0/3 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 1357.238450] (&(&se_tpg->session_lock)->rlock){+.?...}, at: [<ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] {SOFTIRQ-ON-W} state was registered at:
[ 1357.238450] [<ffffffff810834f5>] mark_held_locks+0x6d/0x95
[ 1357.238450] [<ffffffff8108364a>] trace_hardirqs_on_caller+0x12d/0x197
[ 1357.238450] [<ffffffff810836c1>] trace_hardirqs_on+0xd/0xf
[ 1357.238450] [<ffffffff8149caba>] _raw_spin_unlock_irq+0x2d/0x45
[ 1357.238450] [<ffffffffa01e8d10>] __transport_register_session+0xb8/0x122 [target_core_mod]
[ 1357.238450] [<ffffffffa01e8dbe>] transport_register_session+0x44/0x5a [target_core_mod]
[ 1357.238450] [<ffffffffa018e32c>] ft_prli+0x1e3/0x275 [tcm_fc]
[ 1357.238450] [<ffffffffa0160e8d>] fc_rport_recv_req+0x95e/0xdc5 [libfc]
[ 1357.238450] [<ffffffffa015be88>] fc_lport_recv_els_req+0xc4/0xd5 [libfc]
[ 1357.238450] [<ffffffffa015c778>] fc_lport_recv_req+0x12f/0x18f [libfc]
[ 1357.238450] [<ffffffffa015a6d7>] fc_exch_recv+0x8ba/0x981 [libfc]
[ 1357.238450] [<ffffffffa0176d7a>] fcoe_percpu_receive_thread+0x47a/0x4e2 [fcoe]
[ 1357.238450] [<ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] irq event stamp: 275411
[ 1357.238450] hardirqs last enabled at (275410): [<ffffffff810bb6a0>] rcu_process_callbacks+0x229/0x42a
[ 1357.238450] hardirqs last disabled at (275411): [<ffffffff8149c2f7>] _raw_spin_lock_irqsave+0x22/0x8e
[ 1357.238450] softirqs last enabled at (275394): [<ffffffff8103d669>] __do_softirq+0x246/0x26f
[ 1357.238450] softirqs last disabled at (275399): [<ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450]
[ 1357.238450] other info that might help us debug this:
[ 1357.238450] Possible unsafe locking scenario:
[ 1357.238450]
[ 1357.238450] CPU0
[ 1357.238450] ----
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450] <Interrupt>
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450]
[ 1357.238450] *** DEADLOCK ***
[ 1357.238450]
[ 1357.238450] no locks held by ksoftirqd/0/3.
[ 1357.238450]
[ 1357.238450] stack backtrace:
[ 1357.238450] Pid: 3, comm: ksoftirqd/0 Tainted: G O 3.7.0-rc7-yikvm+ #3
[ 1357.238450] Call Trace:
[ 1357.238450] [<ffffffff8149399a>] print_usage_bug+0x1f5/0x206
[ 1357.238450] [<ffffffff8100da59>] ? save_stack_trace+0x2c/0x49
[ 1357.238450] [<ffffffff81082aae>] ? print_irq_inversion_bug.part.14+0x1ae/0x1ae
[ 1357.238450] [<ffffffff81083336>] mark_lock+0x106/0x258
[ 1357.238450] [<ffffffff81084e34>] __lock_acquire+0x2e7/0xe53
[ 1357.238450] [<ffffffff8102903d>] ? pvclock_clocksource_read+0x48/0xb4
[ 1357.238450] [<ffffffff810ba6a3>] ? rcu_process_gp_end+0xc0/0xc9
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff81085ef1>] lock_acquire+0x119/0x143
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff8149c329>] _raw_spin_lock_irqsave+0x54/0x8e
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff810bb6a0>] ? rcu_process_callbacks+0x229/0x42a
[ 1357.238450] [<ffffffffa018ddc5>] ft_sess_rcu_free+0x17/0x24 [tcm_fc]
[ 1357.238450] [<ffffffffa018ddae>] ? ft_sess_free+0x1b/0x1b [tcm_fc]
[ 1357.238450] [<ffffffff810bb6d7>] rcu_process_callbacks+0x260/0x42a
[ 1357.238450] [<ffffffff8103d55d>] __do_softirq+0x13a/0x26f
[ 1357.238450] [<ffffffff8149b34e>] ? __schedule+0x65f/0x68e
[ 1357.238450] [<ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450] [<ffffffff8105c83c>] smpboot_thread_fn+0x1a5/0x1aa
[ 1357.238450] [<ffffffff8105c697>] ? smpboot_unregister_percpu_thread+0x47/0x47
[ 1357.238450] [<ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<ffffffff8149b49d>] ? wait_for_common+0xbb/0x10a
[ 1357.238450] [<ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1357.238450] [<ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] [<ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1417.440099] rport-2:0-0: blocked FC remote port time out: removing rport
Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: Open-FCoE <devel@open-fcoe.org>
Cc: Nicholas A. Bellinger <nab@risingtidesystems.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3100d49d3c upstream.
sata_promise's pdc_hard_reset_port() needs to serialize because it
flips a port-specific bit in controller register that's shared by
all ports. The code takes the ata host lock for this, but that's
broken because an interrupt may arrive on our irq during the hard
reset sequence, and that too will take the ata host lock. With
lockdep enabled a big nasty warning is seen.
Fixed by adding private state to the ata host structure, containing
a second lock used only for serializing the hard reset sequences.
This eliminated the lockdep warnings both on my test rig and on
the original reporter's machine.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Adko Branil <adkobranil@yahoo.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a394aac885 upstream.
When the qla2xxx driver loses access to multiple, remote ports, there is a race
condition which can occur which will keep the request stuck on a scsi request
queue indefinitely.
This bad state occurred do to a race condition with how the FCPORT_UPDATE_NEEDED
bit is set in qla2x00_schedule_rport_del(), and how it is cleared in
qla2x00_do_dpc(). The problem port has its drport pointer set, but it has never
been processed by the driver to inform the fc transport that the port has been
lost. qla2x00_schedule_rport_del() sets drport, and then sets the
FCPORT_UPDATE_NEEDED bit. In qla2x00_do_dpc(), the port lists are walked and
any drport pointer is handled and the fc transport informed of the port loss,
then the FCPORT_UPDATE_NEEDED bit is cleared. This leaves a race where the
dpc thread is processing one port removal, another port removal is marked
with a call to qla2x00_schedule_rport_del(), and the dpc thread clears the
bit for both removals, even though only the first removal was actually
handled. Until another event occurs to set FCPORT_UPDATE_NEEDED, the later
port removal is never finished and qla2xxx stays in a bad state which causes
requests to become stuck on request queues.
This patch updates the driver to test and clear FCPORT_UPDATE_NEEDED
atomically. This ensures the port state changes are processed and not lost.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit beecadea1b upstream.
The macro bit(n) is defined as ((u32)1 << n), and thus it doesn't work
with n >= 32, such as in mvs_94xx_assign_reg_set():
if (i >= 32) {
mvi->sata_reg_set |= bit(i);
...
}
The shift ((u32)1 << n) with n >= 32 also leads to undefined behavior.
The result varies depending on the architecture.
This patch changes bit(n) to do a 64-bit shift. It also simplifies
mv_ffc64() using __ffs64(), since invoking ffz() with ~0 is undefined.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d23734209 upstream.
This patch fixes both the transmit and receive portion of sending
fragmented mutlicast and broadcast packets.
The transmit section was broken because the offset for INTFRAG and
LASTFRAG packets were just miscalculated by IEEE1394_GASP_HDR_SIZE (which
was reserved with skb_push() in fwnet_send_packet).
The receive section was broken because in fwnet_incoming_packet is a call
to fwnet_peer_find_by_node_id(). Called with generation == -1 it will
not find a peer and the partial datagrams are associated to a peer.
[Stefan R: The fix to use context->card->generation is not perfect.
It relies on the IR tasklet which processes packets from the prior bus
generation to run before the self-ID-complete worklet which sets the
current card generation. Alas, there is no simple way of a race-free
implementation. Let's do it this way for now.]
Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c170e0686 upstream.
This reverts commit f74b9d365d.
Turns out reverting commit a240dc7b3c
"ath9k_hw: Updated AR9003 tx gain table for 5GHz" was not enough to
bring the tx power back to normal levels on devices like the
Buffalo WZR-HP-G450H, this one needs to be reverted as well.
This revert improves tx power by ~10 db on that device
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: rmanohar@qca.qualcomm.com
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c060f943d0 upstream.
The current calculation in pfn_to_bitidx assumes that (pfn -
zone->zone_start_pfn) >> pageblock_order will return the same bit for
all pfn in a pageblock. If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.
Consider the following with pageblock order = 10, zone start 2MB:
pfn | pfn - zone start | (pfn - zone start) >> page block order
----------------------------------------------------------------
0x26000 | 0x25e00 | 0x97
0x26100 | 0x25f00 | 0x97
0x26200 | 0x26000 | 0x98
0x26300 | 0x26100 | 0x98
This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block. Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.
For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7964c06d66 upstream.
when run the folloing command under shell, it will return error
sh/$ echo 1 > /proc/sys/vm/compact_memory
sh/$ sh: write error: Bad address
After strace, I found the following log:
...
write(1, "1\n", 2) = 3
write(1, "", 4294967295) = -1 EFAULT (Bad address)
write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
) = 31
This tells system return 3(COMPACT_COMPLETE) after write data to
compact_memory.
The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE)
from sysctl_compaction_handler after compaction_nodes finished.
Signed-off-by: Jason Liu <r64343@freescale.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d99e79ec55 upstream.
The check to whom a device is reserved is done by checking the path
state of the affected channel paths. If it turns out that one path is
flagged as reserved by someone else the whole device is marked as such.
However the meaning of the RESVD_ELSE bit is that the addressed device
is reserved to a different pathgroup (and not reserved to a different
LPAR). If we do this test on a path which is currently not a member of
the pathgroup we could erroneously mark the device as reserved to
someone else.
To fix this collect the reserved state for all potential members of the
pathgroup and only mark the device as reserved if all of those potential
members have the RESVD_ELSE bit set.
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce73ec6db4 upstream.
The locking in update_vsyscall_tz() is not only unnecessary because the vdso
code copies the data unproteced in __kernel_gettimeofday() but also
introduces a hard to reproduce race condition between update_vsyscall()
and update_vsyscall_tz(), which causes user space process to loop
forever in vdso code.
The following patch removes the locking from update_vsyscall_tz().
Locking is not only unnecessary because the vdso code copies the data
unprotected in __kernel_gettimeofday() but also erroneous because updating
the tb_update_count is not atomic and introduces a hard to reproduce race
condition between update_vsyscall() and update_vsyscall_tz(), which further
causes user space process to loop forever in vdso code.
The below scenario describes the race condition,
x==0 Boot CPU other CPU
proc_P: x==0
timer interrupt
update_vsyscall
x==1 x++;sync settimeofday
update_vsyscall_tz
x==2 x++;sync
x==3 sync;x++
sync;x++
proc_P: x==3 (loops until x becomes even)
Because the ++ operator would be implemented as three instructions and not
atomic on powerpc.
A similar change was made for x86 in commit 6c260d5863
("x86: vdso: Remove bogus locking in update_vsyscall_tz")
Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11ee7e99f3 upstream.
If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
the kernel fails when we run at a non zero offset. It turns out
we were incorrectly wrapping some of the relocatable kernel code
with CONFIG_CRASH_DUMP.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab48b03ec9 upstream.
If the restart timer is running due to BUS-OFF and the device is
disconnected an dev_put will decrease the usage counter to -1 thus
blocking the interface removal, resulting in the following dmesg
lines repeating every 10s:
can: notifier: receive list not found for dev can0
can: notifier: receive list not found for dev can0
can: notifier: receive list not found for dev can0
unregister_netdevice: waiting for can0 to become free. Usage count = -1
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53a59fc67f upstream.
Since commit e303297e6c ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.
This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.
The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.
The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather. 10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.
This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.
The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).
BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
Supported: Yes
CPU 56
Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
RIP: 0010: _raw_spin_unlock_irqrestore+0x8/0x10
RSP: 0018:ffff883ec1037af0 EFLAGS: 00000206
RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
FS: 00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
Call Trace:
release_pages+0xc5/0x260
free_pages_and_swap_cache+0x9d/0xc0
tlb_flush_mmu+0x5c/0x80
tlb_finish_mmu+0xe/0x50
exit_mmap+0xbd/0x120
mmput+0x49/0x120
exit_mm+0x122/0x160
do_exit+0x17a/0x430
do_group_exit+0x3d/0xb0
get_signal_to_deliver+0x247/0x480
do_signal+0x71/0x1b0
do_notify_resume+0x98/0xb0
int_signal+0x12/0x17
DWARF2 unwinder stuck at int_signal+0x12/0x17
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f90b68309 upstream.
tm_mon is 0..11, whereas vt8500 expects 1..12 for the month field,
causing invalid date errors for January, and causing the day field to
roll over incorrectly.
The century flag is only handled in vt8500_rtc_read_time, but not set in
vt8500_rtc_set_time. This patch corrects the behaviour of the century
flag.
Signed-off-by: Edgar Toernig <froese@gmx.de>
Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c24bf9b4cc upstream.
The inb/outb macros for CRIS are broken from a number of points of view,
missing () around parameters and they have an unprotected if statement
in them. This was breaking the compile of IPMI on CRIS and thus I was
being annoyed by build regressions, so I fixed them.
Plus I don't think they would have worked at all, since the data values
were missing "&" and the outsl had a "3" instead of a "4" for the size.
From what I can tell, this stuff is not used at all, so this can't be
any more broken than it was before, anyway.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cae49ede00 upstream.
We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
If there wasn't another skb ready to transmit immediately, this led to a
double-free because we'd free it *again* next time we did have a packet to
send.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bf9b7bef8 upstream.
find_vma() is *not* safe when somebody else is removing vmas. Not just
the return value might get bogus just as you are getting it (this instance
doesn't try to dereference the resulting vma), the search itself can get
buggered in rather spectacular ways. IOW, ->mmap_sem really, really is
not optional here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 864aa04cd0 upstream.
When updating the page protection map after calculating the user_pgprot
value, the base protection map is temporarily stored in an unsigned long
type, causing truncation of the protection bits when LPAE is enabled.
This effectively means that calls to mprotect() will corrupt the upper
page attributes, clearing the XN bit unconditionally.
This patch uses pteval_t to store the intermediate protection values,
preserving the upper bits for 64-bit descriptors.
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 354e4aa391 ]
RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]
All TCP stacks MAY implement the following mitigation. TCP stacks
that implement this mitigation MUST add an additional input check to
any incoming segment. The ACK value is considered acceptable only if
it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
SND.NXT). All incoming segments whose ACK value doesn't satisfy the
above condition MUST be discarded and an ACK sent back.
Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd090dfc63 ]
We added support for RFC 5961 in latest kernels but TCP fails
to perform exhaustive check of ACK sequence.
We can update our view of peer tsval from a frame that is
later discarded by tcp_ack()
This makes timestamps enabled sessions vulnerable to injection of
a high tsval : peers start an ACK storm, since the victim
sends a dupack each time it receives an ACK from the other peer.
As tcp_validate_incoming() is called before tcp_ack(), we should
not peform tcp_replace_ts_recent() from it, and let callers do it
at the right time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Romain Francoise <romain@orebokech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e371589917 ]
Followup of commit 0c24604b68 (tcp: implement RFC 5961 4.2)
As reported by Vijay Subramanian, we should send a challenge ACK
instead of a dup ack if a SYN flag is set on a packet received out of
window.
This permits the ratelimiting to work as intended, and to increase
correct SNMP counters.
Suggested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c24604b68 ]
Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.
Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.
Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)
Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 282f23c6ee ]
Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.
Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)
If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.
Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.
Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45959ee7aa upstream.
When gcc inlines a function, it does not mark it with the mcount
prologue, which in turn means that inlined functions are not traced
by the function tracer. But if CONFIG_OPTIMIZE_INLINING is set, then
gcc is allowed not to inline a function that is marked inline.
Depending on the options and the compiler, a function may or may
not be traced by the function tracer, depending on whether gcc
decides to inline a function or not. This has caused several
problems in the pass becaues gcc is not always consistent with
what it decides to inline between different gcc versions.
Some places should not be traced (like paravirt native_* functions)
and these are mostly marked as inline. When gcc decides not to
inline the function, and if that function should not be traced, then
the ftrace function tracer will suddenly break when it use to work
fine. This becomes even harder to debug when different versions of
gcc will not inline that function, making the same kernel and config
work for some gcc versions and not work for others.
By making all functions marked inline to not be traced will remove
the ambiguity that gcc adds when it comes to tracing functions marked
inline. All gcc versions will be consistent with what functions are
traced and having volatile working code will be removed.
Note, only the inline macro when CONFIG_OPTIMIZE_INLINING is set needs
to have notrace added, as the attribute __always_inline will force
the function to be inlined and then not traced.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbf0a1427 upstream.
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04aa530ec0 upstream.
Sankara reported that the genirq core code fails to adjust the
affinity of an interrupt thread in several cases:
1) On request/setup_irq() the call to setup_affinity() happens before
the new action is registered, so the new thread is not notified.
2) For secondary shared interrupts nothing notifies the new thread to
change its affinity.
3) Interrupts which have the IRQ_NO_BALANCE flag set are not moving
the thread either.
Fix this by setting the thread affinity flag right on thread creation
time. This ensures that under all circumstances the thread moves to
the right place. Requires a check in irq_thread_check_affinity for an
existing affinity mask (CONFIG_CPU_MASK_OFFSTACK=y)
Reported-and-tested-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1209041738200.2754@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e25fbe380c upstream.
The following null pointer check is broken.
*option = match_strdup(args);
return !option;
The pointer `option' must be non-null, and thus `!option' is always false.
Use `!*option' instead.
The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
cut-and-paste code.").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5f50b0c29 upstream.
If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f018458b3 upstream.
It is almost always wrong for NFS to call drop_nlink() after removing a
file. What we really want is to mark the inode's attributes for
revalidation, and we want to ensure that the VFS drops it if we're
reasonably sure that this is the final unlink().
Do the former using the usual cache validity flags, and the latter
by testing if inode->i_nlink == 1, and clearing it in that case.
This also fixes the following warning reported by Neil Brown and
Jeff Layton (among others).
[634155.004438] WARNING:
at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442]
Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel
snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm:
bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace:
[634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
[634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f
[634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
[634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40
[634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
[634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230
[634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230
[634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90
[634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100
[634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
[634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f259613a1e upstream.
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get
an error between setting server->destory (to nfs_destroy_server), and
calling nfs_start_lockd (which will set server->nlm_host).
If this happens, nfs_clone_server will call nfs_free_server which
will call nfs_destroy_server and thence nlmclnt_done(NULL). This
causes the NULL to be dereferenced.
So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL.
The other guards there are irrelevant as nlm_host can only be non-NULL
if one of these flags are set - so remove those tests. (Thanks to Trond
for this suggestion).
This is suitable for any stable kernel since 2.6.25.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f5f64cf0c upstream.
At one point acpi_device_set_id() checks if acpi_device_hid(device)
returns NULL, but that never happens, so system bus devices with an
empty list of PNP IDs are given the dummy HID ("device") instead of
the "system bus HID" ("LNXSYBUS"). Fix the code to use the right
check.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5f165418c upstream.
The commit [88a8516a: ALSA: usbaudio: implement USB autosuspend] added
the support of autopm for USB MIDI output, but it didn't take the MIDI
input into account.
This patch adds the following for fixing the autopm:
- Manage the URB start at the first MIDI input stream open, instead of
the time of instance creation
- Move autopm code to the common substream_open()
- Make snd_usbmidi_input_start/_stop() more robust and add the running
state check
Reviewd-by: Clemens Ladisch <clemens@ladisch.de>
Tested-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2a07f40db upstream.
Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing. Very nasty. Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse. "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af35 "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.
mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect. Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).
I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL. I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad4b3fb7ff upstream.
Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
(PageHead) is true, for tail pages. If this is indeed the intended
behavior, which I doubt because it breaks cache cleaning on some ARM
systems, then the nomenclature is highly problematic.
This patch makes sure PageHead is only true for head pages and PageTail
is only true for tail pages, and neither is true for non-compound pages.
[ This buglet seems ancient - seems to have been introduced back in Apr
2008 in commit 6a1e7f777f: "pageflags: convert to the use of new
macros". And the reason nobody noticed is because the PageHead()
tests are almost all about just sanity-checking, and only used on
pages that are actual page heads. The fact that the old code returned
true for tail pages too was thus not really noticeable. - Linus ]
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <Will.Deacon@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b92b1b89a3 upstream.
Virtio devices may attempt to add descriptors to a virtqueue from atomic
context using GFP_ATOMIC allocation. This is problematic because such
allocations can fall outside of the lowmem mapping, causing virt_to_phys
to report bogus physical addresses which are subsequently passed to
userspace via the buffers for the virtual device.
This patch masks out __GFP_HIGH and __GFP_HIGHMEM from the requested
flags when allocating descriptors for a virtqueue. If an atomic
allocation is requested and later fails, we will return -ENOSPC which
will be handled by the driver.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b66c598401 upstream.
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.
After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cdc87c5a30 upstream.
TEST_ALPHA() is broken and always returns 0.
[akpm@linux-foundation.org: return false for '@' as well, per Bjorn]
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit be364c8c0f ]
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e196c0e579 ]
Race between bonding_store_slaves_active() and slave manipulation
functions. The bond_for_each_slave use in bonding_store_slaves_active()
is not protected by any synchronization mechanism.
NULL pointer dereference is easy to reach.
Fixed by acquiring the bond->lock for the slave walk.
v2: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00ca0de02f upstream.
When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.
If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.
This issue was found with the following testcase:
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b6ca2a6fe upstream.
Writing to dscr_default in sysfs doesn't actually change the DSCR -
we rely on a context switch on each CPU to do the work. There is no
guarantee we will get a context switch in a reasonable amount of time
so fire off an IPI to force an immediate change.
This issue was found with the following test case:
http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50ce5c0683 upstream.
This patch (as1636) is a partial workaround for a hardware bug
affecting OHCI controllers by NVIDIA at least, maybe others too. When
the controller retires a Transfer Descriptor, it is supposed to add
the TD onto the Done Queue. But sometimes this doesn't happen, with
the result that ohci-hcd never realizes the corresponding transfer has
finished. Symptoms can vary; a typical result is that USB audio stops
working after a while.
The patch works around the problem by recognizing that TDs are always
processed in order. Therefore, if a later TD is found on the Done
Queue than all the earlier TDs for the same endpoint must be finished
as well.
Unfortunately this won't solve the problem in cases where the missing
TD is the last one in the endpoint's queue. A complete fix would
require a signficant amount of change to the driver.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6b5e88c0e upstream.
During resume from system suspend the 'data' field of
struct pnp_dev in pnpacpi_set_resources() may be a stale pointer,
due to removal of the associated ACPI device node object in the
previous suspend-resume cycle. This happens, for example, if a
dockable machine is booted in the docking station and then suspended
and resumed and suspended again. If that happens,
pnpacpi_build_resource_template() called from pnpacpi_set_resources()
attempts to use that pointer and crashes.
However, pnpacpi_set_resources() actually checks the device's ACPI
handle, attempts to find the ACPI device node object attached to it
and returns an error code if that fails, so in fact it knows what the
correct value of dev->data should be. Use this observation to update
dev->data with the correct value if necessary and dump a call trace
if that's the case (once).
We still need to fix the root cause of this issue, but preventing
systems from crashing because of it is an improvement too.
Reported-and-tested-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=51071
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4000e62615 upstream.
Add a quirk to correctly report battery capacity on 2010 and 2011
Lenovo Thinkpad models.
The affected models that I tested (x201, t410, t410s, and x220)
exhibit a problem where, when battery capacity reporting unit is mAh,
the values being reported are wrong. Pre-2010 and 2012 models appear
to always report in mWh and are thus unaffected. Also, in mid-2012
Lenovo issued a BIOS update for the 2011 models that fixes the issue
(tested on x220 with a post-1.29 BIOS). No such update is available
for the 2010 models, so those still need this patch.
Problem description: for some reason, the affected Thinkpads switch
the reporting unit between mAh and mWh; generally, mAh is used when a
laptop is plugged in and mWh when it's unplugged, although a
suspend/resume or rmmod/modprobe is needed for the switch to take
effect. The values reported in mAh are *always* wrong. This does
not appear to be a kernel regression; I believe that the values were
never reported correctly. I tested back to kernel 2.6.34, with
multiple machines and BIOS versions.
Simply plugging a laptop into mains before turning it on is enough to
reproduce the problem. Here's a sample /proc/acpi/battery/BAT0/info
from Thinkpad x220 (before a BIOS update) with a 4-cell battery:
present: yes
design capacity: 2886 mAh
last full capacity: 2909 mAh
battery technology: rechargeable
design voltage: 14800 mV
design capacity warning: 145 mAh
design capacity low: 13 mAh
cycle count: 0
capacity granularity 1: 1 mAh
capacity granularity 2: 1 mAh
model number: 42T4899
serial number: 21064
battery type: LION
OEM info: SANYO
Once the laptop switches the unit to mWh (unplug from mains, suspend,
resume), the output changes to:
present: yes
design capacity: 28860 mWh
last full capacity: 29090 mWh
battery technology: rechargeable
design voltage: 14800 mV
design capacity warning: 1454 mWh
design capacity low: 200 mWh
cycle count: 0
capacity granularity 1: 1 mWh
capacity granularity 2: 1 mWh
model number: 42T4899
serial number: 21064
battery type: LION
OEM info: SANYO
Can you see how the values for "design capacity", etc., differ by a
factor of 10 instead of 14.8 (the design voltage of this battery)?
On the battery itself it says: 14.8V, 1.95Ah, 29Wh, so clearly the
values reported in mWh are correct and the ones in mAh are not.
My guess is that this problem has been around ever since those
machines were released, but because the most common Thinkpad
batteries are rated at 10.8V, the error (8%) is small enough that it
simply hasn't been noticed or at least nobody could be bothered to
look into it.
My patch works around the problem by adjusting the incorrectly
reported mAh values by "10000 / design_voltage". The patch also has
code to figure out if it should be activated or not. It only
activates on Lenovo Thinkpads, only when the unit is mAh, and, as an
extra precaution, only when the battery capacity reported through
ACPI does not match what is reported through DMI (I've never
encountered a machine where the first two conditions would be true
but the last would not, but better safe than sorry).
I've been using this patch for close to a year on several systems
without any problems.
References: https://bugzilla.kernel.org/show_bug.cgi?id=41062
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7e14b375b upstream.
The Newport AGILIS model AG-UC8 compact piezo motor controller
(http://search.newport.com/?q=*&x2=sku&q2=AG-UC8)
is yet another device using an FTDI USB-to-serial chip. It works
fine with the ftdi_sio driver when adding
options ftdi-sio product=0x3000 vendor=0x104d
to modprobe.d. udevadm reports "Newport" as the manufacturer,
and "Agilis" as the product name.
Signed-off-by: Martin Teichmann <lkb.teichmann@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f36446cf9b upstream.
The Huawei E173 will normally appear as 12d1:1436 in Linux. But
the modem has another mode with different device ID and a slightly
different set of descriptors. This is the mode used by Windows like
this:
3Modem: USB\VID_12D1&PID_140C&MI_00\6&3A1D2012&0&0000
Networkcard: USB\VID_12D1&PID_140C&MI_01\6&3A1D2012&0&0001
Appli.Inter: USB\VID_12D1&PID_140C&MI_02\6&3A1D2012&0&0002
PC UI Inter: USB\VID_12D1&PID_140C&MI_03\6&3A1D2012&0&0003
All interfaces have the same ff/ff/ff class codes in this mode.
Blacklisting the network interface to allow it to be picked up by
the network driver.
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18a2f371f5 upstream.
This fixes a regression in 3.7-rc, which has since gone into stable.
Commit 00442ad04a ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().
Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a,
alloc_pages_vma() has kept refcount in balance, so now no problem.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Not needed in 3.8 or newer as this driver is removed there. - gregkh]
We get this from user space and nothing has been done to ensure that
these strings are NUL terminated.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 387870f2d6 upstream.
dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.
This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.
Reported-by: Soeren Moch <smoch@web.de>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Soeren Moch <smoch@web.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc4b514f27 upstream.
8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue timer on
0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
megaraid - it allocated work_struct, casted it to delayed_work and
then pass that into queue_delayed_work().
Previously, this was okay because 0 @delay short-circuited to
queue_work() before doing anything with delayed_work. 8852aac25e
moved 0 @delay test into __queue_delayed_work() after sanity check on
delayed_work making megaraid trigger BUG_ON().
Although megaraid is already fixed by c1d390d8e6 ("megaraid: fix
BUG_ON() from incorrect use of delayed work"), this patch converts
BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
abusers, if there are more, trigger warning but don't crash the
machine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e69b742a67 upstream.
gcc (rightfully) complains that we are accessing beyond the
end of the fpr array (we do, to access the fpscr).
The only sane thing to do (whether anything in that code can be
called remotely sane is debatable) is to special case fpscr and
handle it as a separate statement.
I initially tried to do it it by making the array access conditional
to index < PT_FPSCR and using a 3rd else leg but for some reason gcc
was unable to understand it and still spewed the warning.
So I ended up with something a tad more intricated but it seems to
build on 32-bit and on 64-bit with and without VSX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39141ddfb6 upstream.
After commit 846a136881 ("ARM: vfp: fix
saving d16-d31 vfp registers on v6+ kernels"), the OMAP 2430SDP board
started crashing during boot with omap2plus_defconfig:
[ 3.875122] mmcblk0: mmc0:e624 SD04G 3.69 GiB
[ 3.915954] mmcblk0: p1
[ 4.086639] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[ 4.093719] Modules linked in:
[ 4.096954] CPU: 0 Not tainted (3.6.0-02232-g759e00b #570)
[ 4.103149] PC is at vfp_reload_hw+0x1c/0x44
[ 4.107666] LR is at __und_usr_fault_32+0x0/0x8
It turns out that the context save/restore fix unmasked a latent bug
in commit 5aaf254409 ("ARM: 6203/1: Make
VFPv3 usable on ARMv6"). When CONFIG_VFPv3 is set, but the kernel is
booted on a pre-VFPv3 core, the code attempts to save and restore the
d16-d31 VFP registers. These are only present on non-D16 VFPv3+, so
this results in an undefined instruction exception. The code didn't
crash before commit 846a136 because the save and restore code was
only touching d0-d15, present on all VFP.
Fix by implementing a request from Russell King to add a new HWCAP
flag that affirmatively indicates the presence of the d16-d31
registers:
http://marc.info/?l=linux-arm-kernel&m=135013547905283&w=2
and some feedback from Måns to clarify the name of the HWCAP flag.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@linaro.org>
Cc: Måns Rullgård <mans.rullgard@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d93592807 upstream.
Sometimes, warnings about ioctls to partition happen often enough that they
form majority of the warnings in the kernel log and users complain. In some
cases warnings are about ioctls such as SG_IO so it's not good to get rid of
the warnings completely as they can ease debugging of userspace problems
when ioctl is refused.
Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future if ever. So lets just
stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
CC: James Bottomley <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a51d4ed01e upstream.
This board is incorrectly detected as having an LVDS connector,
resulting in the VGA output (the only available output on the board)
showing the console only in the top-left 1024x768 pixels, and an extra
LVDS connector appearing in X.
It's a desktop Mini-ITX board using an Atom D525 CPU with an NM10
chipset.
I've had this board for about a year, but this is the first time I
noticed the issue because I've been running it headless for most of its
life.
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-3.0 commit 42ab5316 (ipv4: fix redirect handling) was
backport of mainline commit 9cc20b26 from 3.2-rc3 where hh
member of struct dst_entry was already gone.
However, in 3.0 we still have it and we have to clean it as
well, otherwise it keeps pointing to the cleaned up (and
unusable) hh_cache entry and packets cannot be sent out.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d356cf5a74 upstream.
PMU interrupts start at IRQ_DOVE_PMU_START, not IRQ_DOVE_PMU_START + 1.
Fix the condition. (It may have been less likely to occur had the code
been written "if (irq >= IRQ_DOVE_PMU_START" which imho is the easier
to understand notation, and matches the normal way of thinking about
these things.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d3df93542 upstream.
Fix the acknowledgement of PMU interrupts on Dove: some Dove hardware
has not been sensibly designed so that interrupts can be handled in a
race free manner. The PMU is one such instance.
The pending (aka 'cause') register is a bunch of RW bits, meaning that
these bits can be both cleared and set by software (confirmed on the
Armada-510 on the cubox.)
Hardware sets the appropriate bit when an interrupt is asserted, and
software is required to clear the bits which are to be processed. If
we write ~(1 << bit), then we end up asserting every other interrupt
except the one we're processing. So, we need to do a read-modify-write
cycle to clear the asserted bit.
However, any interrupts which occur in the middle of this cycle will
also be written back as zero, which will also clear the new interrupts.
The upshot of this is: there is _no_ way to safely clear down interrupts
in this register (and other similarly behaving interrupt pending
registers on this device.) The patch below at least stops us creating
new interrupts.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52965cc012 upstream.
Some bcm5974 trackpads have a physical button beneath the physical surface.
This patch sets the property bit so user space applications can detect the
trackpad type and act accordingly.
Signed-off-by: Jussi Pakkanen <jussi.pakkanen@canonical.com>
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29e9bf1841 upstream.
Thermal throttle and power limit events are not defined as MCE errors in x86
architecture and should not generate MCE errors in mcelog.
Current kernel generates fake software defined MCE errors for these events.
This may confuse users because they may think the machine has real MCE errors
while actually only thermal throttle or power limit events happen.
To make it worse, buggy firmware on some platforms may falsely generate
the events. Therefore, kernel reports MCE errors which users think as real
hardware errors. Although the firmware bugs should be fixed, on the other hand,
kernel should not report MCE errors either.
So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count,
package_power_limit_count, core_throttle_count, and package_throttle_count) in
/sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters
for each event on each CPU. Please note that all CPU's on one package report
duplicate counters. It's user application's responsibity to retrieve a package
level counter for one package.
This patch doesn't report package level power limit, core level power limit, and
package level thermal throttle events in mcelog. When the events happen, only
report them in respective counters in sysfs.
Since core level thermal throttle has been legacy code in kernel for a while and
users accepted it as MCE error in mcelog, core level thermal throttle is still
reported in mcelog. In the mean time, the event is counted in a counter in sysfs
as well.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Borislav Petkov <bp@amd64.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ffeb9b0e6 upstream.
In get_sample_period(), unsigned long is not enough:
watchdog_thresh * 2 * (NSEC_PER_SEC / 5)
case1:
watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800
case2:
set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000
In case2, we need use u64 to express the sample period. Otherwise,
changing the threshold thru proc often can not be successful.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5260e458f5 upstream.
Make sure generic close is called at close.
The driver relies on the generic write implementation but did not call
generic close.
Note that the call to kill the read urb is not redundant, as mct_u232
uses an interrupt urb from the second port as the read urb and that
generic close therefore fails to kill it.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b03e66a6be upstream.
If kdump is triggered with pending IO, controller may not respond causing
kdump to fail.
http://marc.info/?l=linux-ide&m=133032255424658&w=2
During error recovery ata_do_dev_read_id never completes due hang
in mmio_insw.
ata_do_dev_read_id
ata_sff_data_xfer
ioread16_rep
mmio_insw
if DMA start bit is cleared before reset, PIO command is successful
and kdump succeeds.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6fdd8e5d0 upstream.
The delayed work function int_in_work() may call usb_reset_device()
and thus, indirectly, the driver's pre_reset method. Trying to
cancel the work synchronously in that situation would deadlock.
Fix by avoiding cancel_work_sync() in the pre_reset method.
If the reset was NOT initiated by int_in_work() this might cause
int_in_work() to run after the post_reset method, with urb_int_in
already resubmitted, so handle that case gracefully.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fae2ae2a90 upstream.
If a signal handler is executed on altstack and another signal comes,
we will end up with rt_sigreturn() on return from the second handler
getting -EPERM from do_sigaltstack(). It's perfectly OK, since we
are not asking to change the settings; in fact, they couldn't have been
changed during the second handler execution exactly because we'd been
on altstack all along. 64bit sigreturn on sparc treats any error from
do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics
we are using on other architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25389bb207 upstream.
Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.
Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81b401100c upstream.
Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45171002b0 upstream.
The Intel 82855PM host bridge / Mobility FireGL 9000 RV250 combination
in an (outdated) ThinkPad T41 needs AGPMode 1 for suspend/resume (under
KMS, that is). So add a quirk for it.
(Change R250 to RV250 in comment for preceding quirk too.)
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b78a4932f5 upstream.
The check whether the IBSS is active and can be removed should be
performed before deinitializing the fields used for the check/search.
Otherwise, the configured BSS will not be found and removed properly.
To make it more clear for the future, rename sdata->u.ibss to the
local pointer ifibss which is used within the checks.
This behaviour was introduced by
f3209bea11
("mac80211: fix IBSS teardown race")
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Ignacy Gawedzki <i@lri.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa10990e02 upstream.
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.
[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8c32a5c98 upstream.
Request based dm attempts to re-run the request queue off the
request completion path. If used with a driver that potentially does
end_io from its request_fn, we could deadlock trying to recurse
back into request dispatch. Fix this by punting the request queue
run to kblockd.
Tested to fix a quickly reproducible deadlock in such a scenario.
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 949a05d034 upstream.
On Thu, 2012-11-01 at 16:45 -0700, Michel Lespinasse wrote:
> Looking at the arch/parisc/kernel/sys_parisc.c implementation of
> get_shared_area(), I do have a concern though. The function basically
> ignores the pgoff argument, so that if one creates a shared mapping of
> pages 0-N of a file, and then a separate shared mapping of pages 1-N
> of that same file, both will have the same cache offset for their
> starting address.
>
> This looks like this would create obvious aliasing issues. Am I
> misreading this ? I can't understand how this could work good enough
> to be undetected, so there must be something I'm missing here ???
This turns out to be correct and we need to pay attention to the pgoff as
well as the address when creating the virtual address for the area.
Fortunately, the bug is rarely triggered as most applications which use pgoff
tend to use large values (git being the primary one, and it uses pgoff in
multiples of 16MB) which are larger than our cache coherency modulus, so the
problem isn't often seen in practise.
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e99ddfde6a upstream.
Commit 88a8516a21 (ALSA: usbaudio: implement USB autosuspend) added
autosuspend code to all files making up the snd-usb-audio driver.
However, midi.c is part of snd-usb-lib and is also used by other
drivers, not all of which support autosuspend. Thus, calls to
usb_autopm_get_interface() could fail, and this unexpected error would
result in the MIDI output being completely unusable.
Make it work by ignoring the error that is expected with drivers that do
not support autosuspend.
Reported-by: Colin Fletcher <colin.m.fletcher@googlemail.com>
Reported-by: Devin Venable <venable.devin@gmail.com>
Reported-by: Dr Nick Bailey <nicholas.bailey@glasgow.ac.uk>
Reported-by: Jannis Achstetter <jannis_achstetter@web.de>
Reported-by: Rui Nuno Capela <rncbc@rncbc.org>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49bd665c54 upstream.
SATA MICROCODE DOWNALOAD fails on isci driver. After receiving Register
Device to Host (FIS 0x34) frame Initiator resets phy.
In the frame handler routine response (FIS 0x34) was copied into wrong
buffer and upper layer did not receive any answer which resulted in
timeout and reset.
This patch corrects this bug.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd321acddc upstream.
When host_sleep_config command fails we should return error to
MMC core to indicate the failure for our device.
The misspelled variable is also removed as it's redundant.
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 450faacc62 upstream.
Documentation/networking/ifenslave.c: In function ‘if_getconfig’:
Documentation/networking/ifenslave.c:508:14: warning: variable ‘mtu’ set but not used [-Wunused-but-set-variable]
Documentation/networking/ifenslave.c:508:6: warning: variable ‘metric’ set but not used [-Wunused-but-set-variable]
The purpose of this function is to simply print out the values
it probes, so...
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c718a54649 upstream.
Fix several -Wuninitialized compiler warnings by changing the
order of getting modedb in riva_update_default_var() to set
first the fallback and then the prefered timing.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cecd353a02 upstream.
Set CommandMailbox with memset before use it. Fix for:
drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’:
arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’
may be used uninitialized in this function [-Wuninitialized]
drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’
was declared here
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bca505f109 upstream.
Fixed compiler warning:
comparison between ‘DAC960_V2_IOCTL_Opcode_T’ and ‘enum <anonymous>’
Renamed enum, added a new enum for SCSI_10.CommandOpcode in
DAC960_V2_ProcessCompletedCommand().
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47ea91b405 upstream.
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window. This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.
__find_resource() looks for gaps in resource allocation among the
children resource windows. When it encounters the last child window it
blindly tries the range next to one allocated to the last child. Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation. This leads to a conflicting
window allocation.
Michal Ludvig reported this issue seen on his platform. The following
patch fixes the problem and has been verified by Michal. I believe this
bug has been there for ages. It got exposed by git commit 2bbc694227
("PCI : ability to relocate assigned pci-resources")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4ac9fea01 upstream.
During debug of one SRIOV enabled hotplug device, we found found that
add_size is not passed properly.
The device has devices under two level bridges:
+-[0000:80]-+-00.0-[81-8f]--
| +-01.0-[90-9f]--
| +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0 Oracle Corporation Device
| | \-03.0-[a3]--+-00.0 Oracle Corporation Device
Which means later the parent bridge will not try to add a big enough range:
[ 557.455077] pci 0000:a0:00.0: BAR 14: assigned [mem 0xf9000000-0xf93fffff]
[ 557.461974] pci 0000:a0:00.0: BAR 15: assigned [mem 0xf6000000-0xf61fffff pref]
[ 557.469340] pci 0000:a1:02.0: BAR 14: assigned [mem 0xf9000000-0xf91fffff]
[ 557.476231] pci 0000:a1:02.0: BAR 15: assigned [mem 0xf6000000-0xf60fffff pref]
[ 557.483582] pci 0000:a1:03.0: BAR 14: assigned [mem 0xf9200000-0xf93fffff]
[ 557.490468] pci 0000:a1:03.0: BAR 15: assigned [mem 0xf6100000-0xf61fffff pref]
[ 557.497833] pci 0000:a1:03.0: BAR 14: can't assign mem (size 0x200000)
[ 557.504378] pci 0000:a1:03.0: failed to add optional resources res=[mem 0xf9200000-0xf93fffff]
[ 557.513026] pci 0000:a1:02.0: BAR 14: can't assign mem (size 0x200000)
[ 557.519578] pci 0000:a1:02.0: failed to add optional resources res=[mem 0xf9000000-0xf91fffff]
It turns out we did not calculate size1 properly.
static resource_size_t calculate_memsize(resource_size_t size,
resource_size_t min_size,
resource_size_t size1,
resource_size_t old_size,
resource_size_t align)
{
if (size < min_size)
size = min_size;
if (old_size == 1 )
old_size = 0;
if (size < old_size)
size = old_size;
size = ALIGN(size + size1, align);
return size;
}
We should not pass add_size with min_size in calculate_memsize since
that will make add_size not contribute final add_size.
So just pass add_size with size1 to calculate_memsize().
With this change, we should have chance to remove extra addon in
pci_reassign_resource.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbc694227 upstream.
Currently pci-bridges are allocated enough resources to satisfy their immediate
requirements. Any additional resource-requests fail if additional free space,
contiguous to the one already allocated, is not available. This behavior is not
reasonable since sufficient contiguous resources, that can satisfy the request,
are available at a different location.
This patch provides the ability to expand and relocate a allocated resource.
v2: Changelog: Fixed size calculation in pci_reassign_resource()
v3: Changelog : Split this patch. The resource.c changes are already
upstream. All the pci driver changes are in here.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361d94a338 upstream.
Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7af1168693 upstream.
Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e06ef2e8 upstream.
In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bb3e1fc47 upstream.
When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 399f11c3d8 upstream.
Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception. This works for most cases, but causes
a hang on the following test case:
Client Server
------ ------
Open file over NFS v4.1
Write to file
Expire client
Try to lock file
The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery. However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled. The
simplest solution is to wait for session recovery to run before retrying
the lock.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9193983f4 upstream.
The overlay on the i830M has a peculiar failure mode: It works the
first time around after boot-up, but consistenly hangs the second time
it's used.
Chris Wilson has dug out a nice errata:
"1.5.12 Clock Gating Disable for Display Register
Address Offset: 06200h–06203h
"Bit 3
Ovrunit Clock Gating Disable.
0 = Clock gating controlled by unit enabling logic
1 = Disable clock gating function
DevALM Errata ALM049: Overlay Clock Gating Must be Disabled: Overlay
& L2 Cache clock gating must be disabled in order to prevent device
hangs when turning off overlay.SW must turn off Ovrunit clock gating
(6200h) and L2 Cache clock gating (C8h)."
Now I've nowhere found that 0xc8 register and hence couldn't apply the
l2 cache workaround. But I've remembered that part of the magic that
the OVERLAY_ON/OFF commands are supposed to do is to rearrange cache
allocations so that the overlay scaler has some scratch space.
And while pondering how that could explain the hang the 2nd time we
enable the overlay, I've remembered that the old ums overlay code did
_not_ issue the OVERLAY_OFF cmd.
And indeed, disabling the OFF cmd results in the overlay working
flawlessly, so I guess we can workaround the lack of the above
workaround by simply never disabling the overlay engine once it's
enabled.
Note that we have the first part of the above w/a already implemented
in i830_init_clock_gating - leave that as-is to avoid surprises.
v2: Add a comment in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47827
Tested-by: Rhys <rhyspuk@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2:
- Adjust context
- s/intel_ring_emit(ring, /OUT_RING(/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 069ddcda37 upstream.
When the eCryptfs mount options do not include '-o acl', but the lower
filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
flipped on in the eCryptfs super block flags. This flag is what the VFS
checks in do_last() when deciding if the current umask should be applied
to a newly created inode's mode or not. When a default POSIX ACL mask is
set on a directory, the current umask is incorrectly applied to new
inodes created in the directory. This patch ignores the MS_POSIXACL flag
passed into ecryptfs_mount() and sets the flag on the eCryptfs super
block depending on the flag's presence on the lower super block.
Additionally, it is incorrect to allow a writeable eCryptfs mount on top
of a read-only lower mount. This missing check did not allow writes to
the read-only lower mount because permissions checks are still performed
on the lower filesystem's objects but it is best to simply not allow a
rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw
mount is valid and still allowed.
https://launchpad.net/bugs/1009207
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Stefan Beller <stefanbeller@googlemail.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0658a3366d upstream.
The use of kfree(serial) in error cases of usb_serial_probe
was invalid - usb_serial structure allocated in create_serial()
gets reference of usb_device that needs to be put, so we need
to use usb_serial_put() instead of simple kfree().
Signed-off-by: Jan Safrata <jan.nikitenko@gmail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Cc: Richard Retanubun <richardretanubun@ruggedcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38fe36a248 upstream.
ICMP tuples have id in src and type/code in dst.
So comparing src.u.all with dst.u.all will always fail here
and ip_xfrm_me_harder() is called for every ICMP packet,
even if there was no NAT.
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64f509ce71 upstream.
Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1e0d8b70f upstream.
The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.
This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aee77e4acc upstream.
The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
a box where this prevents the interface from using the gigabit line to its full
potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.
The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
(XID 0c900880). TSO is enabled.
I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
'7'=unlimited). This chart shows the results:
http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png
Interesting points:
- With the current DMA burst limit (1024):
- at the default MTU=1500 I get only 842 Mbit/s.
- when going from small MTU, the performance rises monotonically with
increasing MTU only up to a peak at MTU=1076 (908 MBit/s). Then there's
a sudden drop to 762 MBit/s from which the throughput rises monotonically
again with further MTU increases.
- With a smaller DMA burst limit (512):
- there's a similar peak at MTU=1076 and another one at MTU=564.
- With unlimited DMA burst:
- at the default MTU=1500 I get nice 940 Mbit/s.
- the throughput rises monotonically with increasing MTU with no strange
peaks.
Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
limit plus 52. Why 52? Because:
20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52
The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
the smaller burst limit, or if any other versions have similar requirements.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a652208e0b ]
Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c9f79be29 ]
(1<<optname) is undefined behavior in C with a negative optname or
optname larger than 31. In those cases the result of the shift is
not necessarily zero (e.g., on x86).
This patch simplifies the code with a switch statement on optname.
It also allows the compiler to generate better code (e.g., using a
64-bit mask).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34fa78b59c upstream.
The sigaddset/sigdelset/sigismember functions that are implemented with
bitfield insn cannot allow the sigset argument to be placed in a data
register since the sigset is wider than 32 bits. Remove the "d"
constraint from the asm statements.
The effect of the bug is that sending RT signals does not work, the signal
number is truncated modulo 32.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d55c4c613f upstream.
When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a28ad42a4a upstream.
This is a bugfix for a problem with the following symptoms:
1. A power cut happens
2. After reboot, we try to mount UBIFS
3. Mount fails with "No space left on device" error message
UBIFS complains like this:
UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB
The root cause of this problem is that when we mount, not all LEBs are
categorized. Only those which were read are. However, the
'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were
categorized and 'c->freeable_cnt' is valid, which is a false assumption.
This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()'
to always fall back to LPT scanning if no freeable LEBs were found.
This problem was reported by few people in the past, but Brent Taylor
was able to reproduce it and send me a flash image which cannot be mounted,
which made it easy to hunt the bug. Kudos to Brent.
Reported-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 445632ad6d upstream.
DAPM shutdown incorrectly uses "list" field of codec struct while
iterating over probed components (codec_dev_list). "list" field
refers to codecs registered in the system, "card_list" field is
used for probed components.
Signed-off-by: Misael Lopez Cruz <misael.lopez@ti.com>
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55c6f4cb6e upstream.
When MCLK is supplied externally and BCLK and LRC are configured as outputs
(codec is master), the PLL values are only calculated correctly on the first
transmission. On subsequent transmissions, at differenct sample rates, the
wrong PLL values are used. Test for f_opclk instead of f_pllout to determine
if the PLL values are needed.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9efade1b3e upstream.
cryptd_queue_worker attempts to prevent simultaneous accesses to crypto
workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
However cryptd_enqueue_request might be called from softirq context,
so add local_bh_disable/local_bh_enable to prevent data corruption and
panics.
Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2
v2:
- Disable software interrupts instead of hardware interrupts
Reported-by: Gurucharan Shetty <gurucharan.shetty@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0a8cc58e6 upstream.
In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds reference to variable on kswapd()'s stack.
In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.
Signed-off-by: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10e44239f6 upstream.
The recent change for USB-audio disconnection race fixes introduced a
mutex deadlock again. There is a circular dependency between
chip->shutdown_rwsem and pcm->open_mutex, depicted like below, when a
device is opened during the disconnection operation:
A. snd_usb_audio_disconnect() ->
card.c::register_mutex ->
chip->shutdown_rwsem (write) ->
snd_card_disconnect() ->
pcm.c::register_mutex ->
pcm->open_mutex
B. snd_pcm_open() ->
pcm->open_mutex ->
snd_usb_pcm_open() ->
chip->shutdown_rwsem (read)
Since the chip->shutdown_rwsem protection in the case A is required
only for turning on the chip->shutdown flag and it doesn't have to be
taken for the whole operation, we can reduce its window in
snd_usb_audio_disconnect().
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e7abe2556 upstream.
When unbinding a device so that I could pass it through to a KVM VM, I
got the lockdep report below. It looks like a legitimate lock
ordering problem:
- domain_context_mapping_one() takes iommu->lock and calls
iommu_support_dev_iotlb(), which takes device_domain_lock (inside
iommu->lock).
- domain_remove_one_dev_info() starts by taking device_domain_lock
then takes iommu->lock inside it (near the end of the function).
So this is the classic AB-BA deadlock. It looks like a safe fix is to
simply release device_domain_lock a bit earlier, since as far as I can
tell, it doesn't protect any of the stuff accessed at the end of
domain_remove_one_dev_info() anyway.
BTW, the use of device_domain_lock looks a bit unsafe to me... it's
at least not obvious to me why we aren't vulnerable to the race below:
iommu_support_dev_iotlb()
domain_remove_dev_info()
lock device_domain_lock
find info
unlock device_domain_lock
lock device_domain_lock
find same info
unlock device_domain_lock
free_devinfo_mem(info)
do stuff with info after it's free
However I don't understand the locking here well enough to know if
this is a real problem, let alone what the best fix is.
Anyway here's the full lockdep output that prompted all of this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39.1+ #1
-------------------------------------------------------
bash/13954 is trying to acquire lock:
(&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
but task is already holding lock:
(device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (device_domain_lock){-.-...}:
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
[<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
[<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
[<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
[<ffffffff81cab204>] pci_iommu_init+0x16/0x41
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
[<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10
-> #0 (&(&iommu->lock)->rlock){......}:
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
6 locks held by bash/13954:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
#1: (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
#2: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
#3: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
#4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
#5: (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
stack backtrace:
Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
Call Trace:
[<ffffffff810993a7>] print_circular_bug+0xf7/0x100
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ce377afd1 upstream.
Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix warning about unused variable introduced by commit e681b66f2e
("USB: mos7840: remove invalid disconnect handling") upstream.
A subsequent fix which removed the disconnect function got rid of the
warning but that one was only backported to v3.6.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6e0e543f7 upstream.
Like in the case of native hdmi, which is fixed already in
commit adf00b26d1
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Tue Sep 25 13:23:34 2012 -0300
drm/i915: make sure we write all the DIP data bytes
we need to clear the entire sdvo buffer to avoid upsetting the
display.
Since infoframe buffer writing is now a bit more elaborate, extract it
into it's own function. This will be useful if we ever get around to
properly update the ELD for sdvo. Also #define proper names for the
two buffer indexes with fixed usage.
v2: Cite the right commit above, spotted by Paulo Zanoni.
v3: I'm too stupid to paste the right commit.
v4: Ben Hutchings noticed that I've failed to handle an underflow in
my loop logic, breaking it for i >= length + 8. Since I've just lost C
programmer license, use his solution. Also, make the frustrated 0-base
buffer size a notch more clear.
Reported-and-tested-by: Jürg Billeter <j@bitron.ch>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81014b9d0b upstream.
At least the worst offenders:
- SDVO specifies that the encoder should compute the ecc. Testing also
shows that we must not send the ecc field, so copy the dip_infoframe
struct to a temporay place and avoid the ecc field. This way the avi
infoframe is exactly 17 bytes long, which agrees with what the spec
mandates as a minimal storage capacity (with the ecc field it would
be 18 bytes).
- Only 17 when sending the avi infoframe. The SDVO spec explicitly
says that sending more data than what the device announces results
in undefined behaviour.
- Add __attribute__((packed)) to the avi and spd infoframes, for
otherwise they're wrongly aligned. Noticed because the avi infoframe
ended up being 18 bytes large instead of 17. We haven't noticed this
yet because we don't use the uint16_t fields yet (which are the only
ones that would be wrongly aligned).
This regression has been introduce by
3c17fe4b8f is the first bad commit
commit 3c17fe4b8f
Author: David Härdeman <david@hardeman.nu>
Date: Fri Sep 24 21:44:32 2010 +0200
i915: enable AVI infoframe for intel_hdmi.c [v4]
Patch tested on my g33 with a sdvo hdmi adaptor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Tested-by: Peter Ross <pross@xvid.org> (G35 SDVO-HDMI)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59fa624519 upstream.
Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
The detailed problem analysis shows:
Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Succeeds due to the check for F's userspace TID field == 0
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes T2 via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains real ownership of the
pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T3 --> observes inconsistent state
This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.
So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:
if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.
AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.
Now the proposed change
- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+ if (unlikely(ownerdied ||
+ !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.
What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state->mutex). Probelm solved!
Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.
Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.
lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.
So the above scenario changes to:
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Finds pi_state and blocks on pi_state->rt_mutex
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes it via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains ownership of the pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.
Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.
As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)
Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 60713a0ca7 ]
As documented in RFC4861 (Neighbor Discovery for IP version 6) 7.2.6.,
unsolicited neighbour advertisements should be sent to the all-nodes
multicast address.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 789336360e ]
When creating an L2TPv3 Ethernet session, if register_netdev() should fail for
any reason (for example, automatic naming for "l2tpeth%d" interfaces hits the
32k-interface limit), the netdev is freed in the error path. However, the
l2tp_eth_sess structure's dev pointer is left uncleared, and this results in
l2tp_eth_delete() then attempting to unregister the same netdev later in the
session teardown. This results in an oops.
To avoid this, clear the session dev pointer in the error path.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8f363b77ee ]
Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.
The division by zero occur in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)
Steps to Reproduce:
1. Register tcp_illinois:
# sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
# watch -d ss -i
3. Establish new TCP conn to machine
Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.
This is only related to reading stats. The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params(). Thus, simply fix tcp_illinois_info().
Function tcp_illinois_info() / get_info() is called without
socket lock. Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable. Simply reuse info.tcpv_rttcnt,
as its already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
its called with the socket lock.
Cc: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 39707c2a3b ]
Driver anchors the tx urbs and defers the urb submission if
a transmit request comes when the interface is suspended.
Anchoring urb increments the urb reference count. These
deferred urbs are later accessed by calling usb_get_from_anchor()
for submission during interface resume. usb_get_from_anchor()
unanchors the urb but urb reference count remains same.
This causes the urb reference count to remain non-zero
after usb_free_urb() gets called and urb never gets freed.
Hence call usb_put_urb() after anchoring the urb to properly
balance the reference count for these deferred urbs. Also,
unanchor these deferred urbs during disconnect, to free them
up.
Signed-off-by: Hemant Kumar <hemantk@codeaurora.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14edd87dc6 ]
Commit a02e4b7dae4551(Demark default hoplimit as zero) only changes the
hoplimit checking condition and default value in ip6_dst_hoplimit, not
zeros all hoplimit default value.
Keep the zeroing ip6_template_metrics[RTAX_HOPLIMIT - 1] to force it as
const, cause as a37e6e344910(net: force dst_default_metrics to const
section)
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3374c42aa ]
tcp_ioctl() tries to take into account if tcp socket received a FIN
to report correct number bytes in receive queue.
But its flaky because if the application ate the last skb,
we return 1 instead of 0.
Correct way to detect that FIN was received is to test SOCK_DONE.
Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d772ac557 ]
On some suspend/resume operations involving wimax device, we have
noticed some intermittent memory corruptions in netlink code.
Stéphane Marchesin tracked this corruption in netlink_update_listeners()
and suggested a patch.
It appears netlink_release() should use kfree_rcu() instead of kfree()
for the listeners structure as it may be used by other cpus using RCU
protection.
netlink_release() must set to NULL the listeners pointer when
it is about to be freed.
Also have to protect netlink_update_listeners() and
netlink_has_listeners() if listeners is NULL.
Add a nl_deref_protected() lockdep helper to properly document which
locks protects us.
Reported-by: Jonathan Kliegman <kliegs@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stéphane Marchesin <marcheu@google.com>
Cc: Sam Leffler <sleffler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0914f7961b upstream.
When disconnect callback is called, each component should wake up
sleepers and check card->shutdown flag for avoiding the endless sleep
blocking the proper resource release.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0830dbd4e upstream.
For more strict protection for wild disconnections, a refcount is
introduced to the card instance, and let it up/down when an object is
referred via snd_lookup_*() in the open ops.
The free-after-last-close check is also changed to check this refcount
instead of the empty list, too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34f3c89fda upstream.
Replace mutex with rwsem for codec->shutdown protection so that
concurrent accesses are allowed.
Also add the protection to snd_usb_autosuspend() and
snd_usb_autoresume(), too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 978520b75f upstream.
Close some races at disconnection of a USB audio device by adding the
chip->shutdown_mutex and chip->shutdown check at appropriate places.
The spots to put bandaids are:
- PCM prepare, hw_params and hw_free
- where the usb device is accessed for communication or get speed, in
mixer.c and others; the device speed is now cached in subs->speed
instead of accessing to chip->dev
The accesses in PCM open and close don't need the mutex protection
because these are already handled in the core PCM disconnection code.
The autosuspend/autoresume codes are still uncovered by this patch
because of possible mutex deadlocks. They'll be covered by the
upcoming change to rwsem.
Also the mixer codes are untouched, too. These will be fixed in
another patch, too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b0573c07f upstream.
Fix races at PCM disconnection:
- while a PCM device is being opened or closed
- while the PCM state is being changed without lock in prepare,
hw_params, hw_free ops
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3300fb4f88 upstream.
Don't assume bank 0 is selected at device probe time. This may not be
the case. Force bank selection at first register access to guarantee
that we read the right registers upon driver loading.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d96b10639 upstream.
The DNS resolver's use of the sunrpc cache involves a 'ttl' number
(relative) rather that a timeout (absolute). This confused me when
I wrote
commit c5b29f885a
"sunrpc: use seconds since boot in expiry cache"
and I managed to break it. The effect is that any TTL is interpreted
as 0, and nothing useful gets into the cache.
This patch removes the use of get_expiry() - which really expects an
expiry time - and uses get_uint() instead, treating the int correctly
as a ttl.
This fixes a regression that has been present since 2.6.37, causing
certain NFS accesses in certain environments to incorrectly fail.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a007c4c3e9 upstream.
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b1bc308f4 upstream.
If the state recovery machinery is triggered by the call to
nfs4_async_handle_error() then we can deadlock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97a5486826 upstream.
Since commit c7f404b ('vfs: new superblock methods to override
/proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
device name reported back to userland.
nfs_path() always generates a trailing slash when the given dentry is
the root of an NFS mount, but userland may expect the original device
name to be returned verbatim (as it used to be). Make this
canonicalisation optional and change the callers accordingly.
[jrnieder@gmail.com: use flag instead of bool argument]
Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu>
Reference: http://bugs.debian.org/669314
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acce94e68a upstream.
In very busy v3 environment, rpc.mountd can respond to the NULL
procedure but not the MNT procedure in a timely manner causing
the MNT procedure to time out. The problem is the mount system
call returns EIO which causes the mount to fail, instead of
ETIMEDOUT, which would cause the mount to be retried.
This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
the rpc_call_sync() call in nfs_mount() which causes
ETIMEDOUT to be returned on timed out connections.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit badecb001a upstream.
The 'ssid' field of the cfg80211_ibss_params is a u8 pointer and
its length is likely to be less than IEEE80211_MAX_SSID_LEN most
of the time.
This patch fixes the ssid copy in ieee80211_ibss_join() by using
the SSID length to prevent it from reading beyond the string.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[rewrapped commit message, small rewording]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a4f1a5808 upstream.
Due to pskb_may_pull() checking the skb length, all
non-management frames are checked on input whether
their 802.11 header is fully present. Also add that
check for management frames and remove a check that
is now duplicate. This prevents accessing skb data
beyond the frame end.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7fbf70ee9 upstream.
Per IEEE Std. 802.11-2012, Sec 8.2.4.4.1, the sequence Control field is
not present in control frames. We noticed this problem when processing
Block Ack Requests.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Javier Lopez <jlopex@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7dd111e8ee upstream.
The mesh header can have address extension by a 4th
or a 5th and 6th address, but never both. Drop such
frames in 802.11 -> 802.3 conversion along with any
frames that have the wrong extension.
Reviewed-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4a9fafc77 upstream.
No driver initializes chan->max_antenna_gain to something sensible, and
the only place where it is being used right now is inside ath9k. This
leads to ath9k potentially using less tx power than it can use, which can
decrease performance/range in some rare cases.
Rather than going through every single driver, this patch initializes
chan->orig_mag in wiphy_register(), ignoring whatever value the driver
left in there. If a driver for some reason wishes to limit it independent
from regulatory rulesets, it can do so internally.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d0f9dfb31 upstream.
If the call to core_dev_release_virtual_lun0() fails, then nothing
sets ret to anything other than 0, so even though everything is
torn down and freed, target_core_init_configfs() will seem to succeed
and the module will be loaded. Fix this by passing the return value
on up the chain.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf7e1abe43 upstream.
Some hardware has correct (!= 0xff) value of tssi_bounds[4] in the
EEPROM, but step is equal to 0xff. This results on ridiculous delta
calculations and completely broke TX power settings.
Reported-and-tested-by: Pavel Lucik <pavel.lucik@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c6e30936a upstream.
bf->bf_next is only while buffers are chained as part of an A-MPDU
in the tx queue. When a tid queue is flushed (e.g. on tearing down
an aggregation session), frames can be enqueued again as normal
transmission, without bf_next being cleared. This can lead to the
old pointer being dereferenced again later.
This patch might fix crashes and "Failed to stop TX DMA!" messages.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef5d437f71 upstream.
On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit. Thus when a page is written to via buffered
write, HW dirty bit gets set and when we later map and unmap the page,
page_remove_rmap() finds the dirty bit and calls set_page_dirty().
Dirtying of a page which shouldn't be dirty can cause all sorts of
problems to filesystems. The bug we observed in practice is that
buffers from the page get freed, so when the page gets later marked as
dirty and writeback writes it, XFS crashes due to an assertion
BUG_ON(!PagePrivate(page)) in page_buffers() called from
xfs_count_page_state().
Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then
page unmapped.
Fix the issue by ignoring s390 HW dirty bit for page cache pages of
mappings with mapping_cap_account_dirty(). This is safe because for
such mappings when a page gets marked as writeable in PTE it is also
marked dirty in do_wp_page() or do_page_fault(). When the dirty bit is
cleared by clear_page_dirty_for_io(), the page gets writeprotected in
page_mkclean(). So pagecache page is writeable if and only if it is
dirty.
Thanks to Hugh Dickins for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned
up variant of the patch.
The patch has survived about two hours of running fsx-linux on tmpfs
while heavily swapping and several days of running on out build machines
where the original problem was triggered.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6365201d8 upstream.
The X86_32-only disable_hlt/enable_hlt mechanism was used by the
32-bit floppy driver. Its effect was to replace the use of the
HLT instruction inside default_idle() with cpu_relax() - essentially
it turned off the use of HLT.
This workaround was commented in the code as:
"disable hlt during certain critical i/o operations"
"This halt magic was a workaround for ancient floppy DMA
wreckage. It should be safe to remove."
H. Peter Anvin additionally adds:
"To the best of my knowledge, no-hlt only existed because of
flaky power distributions on 386/486 systems which were sold to
run DOS. Since DOS did no power management of any kind,
including HLT, the power draw was fairly uniform; when exposed
to the much hhigher noise levels you got when Linux used HLT
caused some of these systems to fail.
They were by far in the minority even back then."
Alan Cox further says:
"Also for the Cyrix 5510 which tended to go castors up if a HLT
occurred during a DMA cycle and on a few other boxes HLT during
DMA tended to go astray.
Do we care ? I doubt it. The 5510 was pretty obscure, the 5520
fixed it, the 5530 is probably the oldest still in any kind of
use."
So, let's finally drop this.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephen Hemminger <shemminger@vyatta.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative instruction patching could be
used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaeb61a97b upstream.
`pc236_detach()` is called by the comedi core if it attempted to attach
a device and failed. `pc236_detach()` calls `pc236_intr_disable()` if
the comedi device private data pointer (`devpriv`) is non-null. This
test is insufficient as `pc236_intr_disable()` accesses hardware
registers and the attach routine may have failed before it has saved
their I/O base addresses.
Fix it by checking `dev->iobase` is non-zero before calling
`pc236_intr_disable()` as that means the I/O base addresses have been
saved and the hardware registers can be accessed. It also implies the
comedi device private data pointer is valid, so there is no need to
check it.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4045f72bcf upstream.
This patch fix corruption which can manifest itself by following crash
when switching on rfkill switch with rt2x00 driver:
https://bugzilla.redhat.com/attachment.cgi?id=615362
Pointer key->u.ccmp.tfm of group key get corrupted in:
ieee80211_rx_h_michael_mic_verify():
/* update IV in key information to be able to detect replays */
rx->key->u.tkip.rx[rx->security_idx].iv32 = rx->tkip_iv32;
rx->key->u.tkip.rx[rx->security_idx].iv16 = rx->tkip_iv16;
because rt2x00 always set RX_FLAG_MMIC_STRIPPED, even if key is not TKIP.
We already check type of the key in different path in
ieee80211_rx_h_michael_mic_verify() function, so adding additional
check here is reasonable.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7840487cd6 upstream.
The i2c core driver will turn the platform device ID to busnum
When using platfrom device ID as -1, it means dynamically assigned
the busnum. When writing code, we need to make sure the busnum,
and call i2c_register_board_info(int busnum, ...) to register device
if using -1, we do not know the value of busnum
In order to solve this issue, set the platform device ID as a fix number
Here using 0 to match the busnum used in i2c_regsiter_board_info()
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 910a578f7e upstream.
We copy head count to a 16 bit field, this works by chance on LE but on
BE guest gets 0. Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43a09f7fb0 upstream.
The command cancellation code doesn't check whether find_trb_seg()
couldn't find the segment that contains the TRB to be canceled. This
could cause a NULL pointer deference later in the function when next_trb
is called. It's unlikely to happen unless something is wrong with the
command ring pointers, so add some debugging in case it happens.
This patch should be backported to stable kernels as old as 3.0, that
contain the commit b63f4053cc "xHCI:
handle command after aborting the command ring".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e681b66f2e upstream.
Remove private zombie flag used to signal disconnect and to prevent
control urb from being submitted from interrupt urb completion handler.
The control urb will not be re-submitted as both the control urb and the
interrupt urb is killed on disconnect.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28c3ae9a8c upstream.
The private int_urb is never allocated so the submission from the
control completion handler will always fail. Remove this odd piece of
broken code.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3eb55cc4ed upstream.
The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 084817d793 upstream.
Move interface data allocation to attach so that it is deallocated on
errors in usb-serial probe.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7bc505166 upstream.
I found a memory leak in sierra_release() (well sierra_probe() I guess)
that looses 8 bytes each time the driver releases a device.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea0dbebffe upstream.
Make sure to allocate the control-message buffer dynamically as some
platforms cannot do DMA from stack.
Note that only the first byte of the old buffer was used.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c323dc023b upstream.
BIOS vendors keep changing the BIOS versions. Only match the beginning
of the string to match all Lucid tablets with board name M11JB.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66081a7251 upstream.
The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.
Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bc1e68ed6 upstream.
The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().
All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9d2bb2ee5 upstream.
This reverts commit 55420c24a0.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f878b657ce upstream.
Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.
Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f40b90972 upstream.
When booting a secondary CPU, the primary CPU hands two sets of page
tables via the secondary_data struct:
(1) swapper_pg_dir: a normal, cacheable, shared (if SMP) mapping
of the kernel image (i.e. the tables used by init_mm).
(2) idmap_pgd: an uncached mapping of the .idmap.text ELF
section.
The idmap is generally used when enabling and disabling the MMU, which
includes early CPU boot. In this case, the secondary CPU switches to
swapper as soon as it enters C code:
struct mm_struct *mm = &init_mm;
unsigned int cpu = smp_processor_id();
/*
* All kernel threads share the same mm context; grab a
* reference and switch to it.
*/
atomic_inc(&mm->mm_count);
current->active_mm = mm;
cpumask_set_cpu(cpu, mm_cpumask(mm));
cpu_switch_mm(mm->pgd, mm);
This causes a problem on ARMv7, where the identity mapping is treated as
strongly-ordered leading to architecturally UNPREDICTABLE behaviour of
exclusive accesses, such as those used by atomic_inc.
This patch re-orders the secondary_start_kernel function so that we
switch to swapper before performing any exclusive accesses.
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: David McKay <david.mckay@st.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eedce141cd upstream.
The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values. Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.
That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits. This means that the API counts from the least significant bits
(LSB from now on) to the MSB. The LSB in the first long is bit 0, then.
The same works for the lookup functions.
The genalloc code uses longs for the bitmap, as it should. In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member. When allocating the struct, genalloc should
reserve enough space for the bitmap. This should be a proper number of
longs that can fit the amount of bits in the bitmap.
However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs. 9 bytes, for
example, could be allocated for 70 bits.
This is a problem in itself if the Least Significat Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.
This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated. In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated. And that's what causes a BUG when
gen_pool_destroy is called and check for any set bits.
What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes. Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.
Hower, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.
So, a simple module as this (running 3.6.0), will cause a crash when
rmmod'ed.
[root@phantom-lp2 foo]# cat foo.c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/genalloc.h>
MODULE_LICENSE("GPL");
MODULE_VERSION("0.1");
static struct gen_pool *foo_pool;
static __init int foo_init(void)
{
int ret;
foo_pool = gen_pool_create(10, -1);
if (!foo_pool)
return -ENOMEM;
ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
if (ret) {
gen_pool_destroy(foo_pool);
return ret;
}
return 0;
}
static __exit void foo_exit(void)
{
gen_pool_destroy(foo_pool);
}
module_init(foo_init);
module_exit(foo_exit);
[root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
CONFIG_SLOB=y
[root@phantom-lp2 foo]# insmod ./foo.ko
[root@phantom-lp2 foo]# rmmod foo
------------[ cut here ]------------
kernel BUG at lib/genalloc.c:243!
cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
sp: c0000000bb0e7be0
msr: 8000000000029032
current = 0xc0000000bb0e0000
paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01
pid = 13044, comm = rmmod
kernel BUG at lib/genalloc.c:243!
[c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
[c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
[c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
--- Exception: c00 (System Call) at 000000800753d1a0
SP (fffd0b0e640) is in userspace
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20f1de659b upstream.
Fix possible overflow of the buffer used for expanding environment
variables when building file list.
In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.
$ cat usr/crash.list
file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
$ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
*** buffer overflow detected ***: ./usr/gen_init_cpio terminated
This also replaces the space-indenting with tabs.
Patch based on existing fix extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63f4053cc upstream.
According to xHCI spec section 4.6.1.1 and section 4.6.1.2,
after aborting a command on the command ring, xHC will
generate a command completion event with its completion
code set to Command Ring Stopped at least. If a command is
currently executing at the time of aborting a command, xHC
also generate a command completion event with its completion
code set to Command Abort. When the command ring is stopped,
software may remove, add, or rearrage Command Descriptors.
To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci. When the command ring is stopped,
software will find the command trbs described by command
descriptors in cancel_cmd_list and modify it to No Op
command. If software can't find the matched trbs, we can
think it had been finished.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Note from Sarah: The TRB_TYPE_LINK_LE32 macro is not in the 3.0 stable
kernel, so I added it to this patch.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e4468b9a0 upstream.
The patch is used to cancel command when the command isn't
acknowledged and a timeout occurs.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b92cc66c04 upstream.
Software have to abort command ring and cancel command
when a command is failed or hang. Otherwise, the command
ring will hang up and can't handle the others. An example
of a command that may hang is the Address Device Command,
because waiting for a SET_ADDRESS request to be acknowledged
by a USB device is outside of the xHC's ability to control.
To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci.
Sarah: Fixed missing newline on "Have the command ring been stopped?"
debugging statement.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c181bc5b5d upstream.
Adding cmd_ring_state for command ring. It helps to verify
the current command ring state for controlling the command
ring operations.
This patch should be backported to kernels as old as 3.0. The commit
7ed603ecf8 "xhci: Add an assertion to
check for virt_dev=0 bug." papers over the NULL pointer dereference that
I now believe is related to a timed out Set Address command. This (and
the four patches that follow it) contain the real fix that also allows
VIA USB 3.0 hubs to consistently re-enumerate during the plug/unplug
stress tests.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2856cc2e4d ]
On a 2-node machine with 256GB of ram we get 512 lines of
console output, which is just too much.
This mimicks Yinghai Lu's x86 commit c2b91e2eec
(x86_64/mm: check and print vmemmap allocation continuous) except that
we aren't ever going to get contiguous block pointers in between calls
so just print when the virtual address or node changes.
This decreases the output by an order of 16.
Also demote this to KERN_DEBUG.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a27032eee8 ]
There are multiple errors in how sys_sparc64_personality() handles
personality flags stored in top three bytes.
- directly comparing current->personality against PER_LINUX32 doesn't work
in cases when any of the personality flags stored in the top three bytes
are used.
- directly forcefully setting personality to PER_LINUX32 or PER_LINUX
discards any flags stored in the top three bytes
Fix the first one by properly using personality() macro to compare only
PER_MASK bytes.
Fix the second one by setting only the bits that should be set, instead of
overwriting the whole value.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e793d8c674 ]
There was a serious disconnect in the logic happening in
sparc_pmu_disable_event() vs. sparc_pmu_enable_event().
Event disable is implemented by programming a NOP event into the PCR.
However, event enable was not reversing this operation. Instead, it
was setting the User/Priv/Hypervisor trace enable bits.
That's not sparc_pmu_enable_event()'s job, that's what
sparc_pmu_enable() and sparc_pmu_disable() do .
The intent of sparc_pmu_enable_event() is clear, since it first clear
out the event type encoding field. So fix this by OR'ing in the event
encoding rather than the trace enable bits.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08280e6c4c ]
If the MM is not active, only report the top-level PC. Do not try to
access the address space.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55c2770e41 ]
we want syscall_trace_leave() called on exit from any syscall;
skipping its call in case we'd done force_successful_syscall_return()
is broken...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c67525849 ]
After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here. What's died that's died and correct
handling normal unbound connections is obviously a priority.
Comment to comment:
> This has few benefits:
> 1. tcp_v6_send_reset already did that.
It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.
Actually, the problem appears to be even more serious than guaranteed
loss of resets. As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5175a5e76b ]
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.
RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.
Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bcf603f6d upstream.
On CougarPoint and PantherPoint PCH chips, the timing generator may fail
to start after DP training completes. This is due to a bug in the
FDI autotraining detect logic (which will stall the timing generator and
re-enable it once training completes), so disable it to avoid silent DP
mode setting failures.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>
commit a595c1ce4c upstream.
We weren't checking whether the resource was in use before calling
res_free(), so applications which called STREAMOFF on a v4l2 device that
wasn't already streaming would cause a BUG() to be hit (MythTV).
Reported-by: Larry Finger <larry.finger@lwfinger.net>
Reported-by: Jay Harbeston <jharbestonus@gmail.com>
Signed-off-by: Devin Heitmueller <dheitmueller@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
commit 168bfeef7b upstream.
If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.
As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.
Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2
It will be rewritten more robustly soonish.
Reported-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f5320d597 upstream.
notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.
# mkdir /cgroup/cpu/SRC
# mkdir /cgroup/cpu/DST
#
# echo 1 >/cgroup/cpu/SRC/notify_on_release
# echo 1 >/cgroup/cpu/DST/notify_on_release
#
# sleep 300 &
[1] 8629
#
# echo 8629 >/cgroup/cpu/SRC/tasks
# echo 8629 >/cgroup/cpu/DST/tasks
-> notify_on_release for /SRC must be triggered at this point,
but it isn't.
This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 301a29da6e upstream.
The current code assumes that CSIZE is 0000060, which appears to be
wrong on some arches (such as powerpc).
Signed-off-by: Nicolas Boullis <nboullis@debian.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5211187f7 upstream.
If the write endpoint is interrupt type, usb_sndintpipe() should
be passed to usb_fill_int_urb() instead of usb_sndbulkpipe().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a349e23d1c upstream.
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal. The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).
The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).
If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR). This may cause the application to crash or have incorrect
behaviour.
handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax. For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.
xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.
Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().
There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bbbbe779a upstream.
On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.
[ hpa: this should be done not just for > 4 GB but for everything above the legacy
region (1 MB), at the very least. That, however, turns out to require significant
restructuring. That work is well underway, but is not suitable for rc/stable. ]
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31fd84b95e upstream.
The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:
kernel/sys.c: In function 'override_release':
kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdc858a466 upstream.
The sharpsl_pcmcia_ops structure gets passed into
sa11xx_drv_pcmcia_probe, where it gets accessed at run-time,
unlike all other pcmcia drivers that pass their structures
into platform_device_add_data, which makes a copy.
This means the gcc warning is valid and the structure
must not be marked as __initdata.
Without this patch, building collie_defconfig results in:
drivers/pcmcia/pxa2xx_sharpsl.c:22:31: fatal error: mach-pxa/hardware.h: No such file or directory
compilation terminated.
make[3]: *** [drivers/pcmcia/pxa2xx_sharpsl.o] Error 1
make[2]: *** [drivers/pcmcia] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pavel Machek <pavel@suse.cz>
Cc: linux-pcmcia@lists.infradead.org
Cc: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts 12d63702c5 which was commit
303a7ce920 upstream.
Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
commit cd0b16c1c3 upstream.
If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.
Failure to do so, will trigger the BUG_ON() in encode_nlm_stat...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Larry McVoy <lm@bitmover.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 627072b06c upstream.
The tile tool chain uses the .eh_frame information for backtracing.
The vmlinux build drops any .eh_frame sections at link time, but when
present in kernel modules, it causes a module load failure due to the
presence of unsupported pc-relative relocations. When compiling to
use compiler feedback support, the compiler by default omits .eh_frame
information, so we don't see this problem. But when not using feedback,
we need to explicitly suppress the .eh_frame.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 2101aa5bb0 which is
commit 60ea8226cb upstream.
To quote Ben:
This is not needed, as there is no QUEUE_FLAG_BYPASS in 3.0.y.
To quote Tejun:
I don't think it will break anything as it simply changes
assignment to |= to avoid overwriting existing flags. That
said, any patch can break anything, so if possible it would be
better to drop for 3.0.y.
So I'll revert this to be safe.
Cc: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10f571d091 upstream.
Add chip details for E-mu 1010 PCIe card. It has the same
chip as found in E-mu 1010b but it uses different PCI id.
Signed-off-by: Maxim Kachur <mcdebugger@duganet.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68766a2edc upstream.
In case we detect a problem and bail out, we fail to set "ret" to a
nonzero value, and udf_load_logicalvol will mistakenly report success.
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abce9ac292 upstream.
tpm_write calls tpm_transmit without checking the return value and
assigns the return value unconditionally to chip->pending_data, even if
it's an error value.
This causes three bugs.
So if we write to /dev/tpm0 with a tpm_param_size bigger than
TPM_BUFSIZE=0x1000 (e.g. 0x100a)
and a bufsize also bigger than TPM_BUFSIZE (e.g. 0x100a)
tpm_transmit returns -E2BIG which is assigned to chip->pending_data as
-7, but tpm_write returns that TPM_BUFSIZE bytes have been successfully
been written to the TPM, altough this is not true (bug #1).
As we did write more than than TPM_BUFSIZE bytes but tpm_write reports
that only TPM_BUFSIZE bytes have been written the vfs tries to write
the remaining bytes (in this case 10 bytes) to the tpm device driver via
tpm_write which then blocks at
/* cannot perform a write until the read has cleared
either via tpm_read or a user_read_timer timeout */
while (atomic_read(&chip->data_pending) != 0)
msleep(TPM_TIMEOUT);
for 60 seconds, since data_pending is -7 and nobody is able to
read it (since tpm_read luckily checks if data_pending is greater than
0) (#bug 2).
After that the remaining bytes are written to the TPM which are
interpreted by the tpm as a normal command. (bug #3)
So if the last bytes of the command stream happen to be a e.g.
tpm_force_clear this gets accidentally sent to the TPM.
This patch fixes all three bugs, by propagating the error code of
tpm_write and returning -E2BIG if the input buffer is too big,
since the response from the tpm for a truncated value is bogus anyway.
Moreover it returns -EBUSY to userspace if there is a response ready to be
read.
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49d859d78c upstream.
If the CPU declares that RDRAND is available, go through a guranteed
reseed sequence, and make sure that it is actually working (producing
data.) If it does not, disable the CPU feature flag.
Allow RDRAND to be disabled on the command line (as opposed to at
compile time) for a user who has special requirements with regards to
random numbers.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628c6246d4 upstream.
Architectural inlines to get random ints and longs using the RDRAND
instruction.
Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally an high bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware. This enables RDRAND to be used directly, bypassing the
kernel random number pool.
For technical documentation, see:
http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/
In this patch, this is *only* used for the nonblocking random number
pool. RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random. The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.
Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.
The CONFIG_EXPERT compile-time option can be used to disable this use
of RDRAND.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Originally-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09e05d4805 upstream.
ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.
The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.
Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.
We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.
Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0829184711 upstream.
radeon_i2c_fini() walks thru the list of I2C bus recs rdev->i2c_bus[]
to destroy each of them.
radeon_ext_tmds_enc_destroy() however also has code to destroy it's
associated I2C bus rec which has been obtained by radeon_i2c_lookup()
and is therefore also in the i2c_bus[] list.
This causes a double free resulting in a kernel panic when unloading
the radeon driver.
Removing destroy code from radeon_ext_tmds_enc_destroy() fixes this
problem.
agd5f: fix compiler warning
Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82e6bfe2fb upstream.
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.
Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matchings due to use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2614f86490 upstream.
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:
- the passed expectation is not inserted in the expectation table
and its timer was not initialized, since we have refreshed one
matching/existing expectation.
- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
timer is in some undefined state just after the allocation,
until it is appropriately initialized.
This can be a problem for the SIP helper during the expectation
addition:
...
if (nf_ct_expect_related(rtp_exp) == 0) {
if (nf_ct_expect_related(rtcp_exp) != 0)
nf_ct_unexpect_related(rtp_exp);
...
Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
spin_lock_bh(&nf_conntrack_lock);
if (del_timer(&exp->timeout)) {
nf_ct_unlink_expect(exp);
nf_ct_expect_put(exp);
}
spin_unlock_bh(&nf_conntrack_lock);
Note that del_timer always returns false if the timer has been
initialized. However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.
To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.
Thanks to Patrick McHardy for participating in the discussion of
this patch.
I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f22eb25cf5 upstream.
Via-headers are parsed beginning at the first character after the Via-address.
When the address is translated first and its length decreases, the offset to
start parsing at is incorrect and header parameters might be missed.
Update the offset after translating the Via-address to fix this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f509c689a upstream.
We're hitting bug while trying to reinsert an already existing
expectation:
kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
[<ffffffff812d423a>] ? in4_pton+0x72/0x131
[<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
[<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
[<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
[<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
[<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07153c6ec0 upstream.
It was reported that the Linux kernel sometimes logs:
klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392
ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.
The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.
The patch closes netfilter bugzilla id 771.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b423f6a40 upstream.
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 283283c4da upstream.
After commit 39f618b4fd (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5aa8b57200 upstream.
For IPv6, sizeof(struct ipv6hdr) = 40, thus the following
expression will result negative:
datalen = pkt_dev->cur_pkt_size - 14 -
sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
pkt_dev->pkt_overhead;
And, the check "if (datalen < sizeof(struct pktgen_hdr))" will be
passed as "datalen" is promoted to unsigned, therefore will cause
a crash later.
This is a quick fix by checking if "datalen" is negative. The following
patch will increase the default value of 'min_pkt_size' for IPv6.
This bug should exist for a long time, so Cc -stable too.
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26cff4e2aa upstream.
Adding two (or more) timers with large values for "expires" (they have
to reside within tv5 in the same list) leads to endless looping
between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL
is one and jiffies are crossing the value 1 << 18. The bug was
introduced between 2.6.11 and 2.6.12 (and survived for quite some
time).
This patch ensures that when cascade() is called timers within tv5 are
not added endlessly to their own list again, instead they are added to
the next lower tv level tv4 (as expected).
Signed-off-by: Christian Hildner <christian.hildner@siemens.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 012a121184 upstream.
As detailed in the thread titled "viafb PLL/clock tweaking causes XO-1.5
instability," enabling or disabling the IGA1/IGA2 clocks causes occasional
stability problems during suspend/resume cycles on this platform.
This is rather odd, as the documentation suggests that clocks have two
states (on/off) and the default (stable) configuration is configured to
enable the clock only when it is needed. However, explicitly enabling *or*
disabling the clock triggers this system instability, suggesting that there
is a 3rd state at play here.
Leaving the clock enable/disable registers alone solves this problem.
This fixes spurious reboots during suspend/resume behaviour introduced by
commit b692a63a.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8c4321f3d upstream.
Line 0 and 1 were both written to line 0 (on the display) and all subsequent
lines had an offset of -1. The result was that the last line on the display
was never overwritten by writes to /dev/fbN.
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Acked-by: Bernie Thompson <bernie@plugable.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c99af3752b upstream.
Cloudlinux have a product called lve that includes a kernel module. This
was previously GPLed but is now under a proprietary license, but the
module continues to declare MODULE_LICENSE("GPL") and makes use of some
EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Alex Lyashkov <umka@cloudlinux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49999ab27e upstream.
In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING
mount pending flag is not cleared.
One effect of this is when using the "browse" option, directory entry
attributes show up with all "?"s due to the incorrect callback and
subsequent failure return (when in fact no callback should be made).
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60ea8226cb upstream.
A queue newly allocated with blk_alloc_queue_node() has only
QUEUE_FLAG_BYPASS set. For request-based drivers,
blk_init_allocated_queue() is called and q->queue_flags is overwritten
with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
initial bypass is still in effect.
In blk_init_allocated_queue(), or QUEUE_FLAG_DEFAULT to q->queue_flags
instead of overwriting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd0608e71e upstream.
The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a7bbda5b1 upstream.
We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.
This fixes the case when an user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:
kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270
.. snip..
Call Trace:
[<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
[<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
[<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a519fc7a70 upstream.
Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 790198f74c upstream.
Fix two bugs of the /dev/fw* character device concerning the
FW_CDEV_IOC_GET_INFO ioctl with nonzero fw_cdev_get_info.bus_reset.
(Practically all /dev/fw* clients issue this ioctl right after opening
the device.)
Both bugs are caused by sizeof(struct fw_cdev_event_bus_reset) being 36
without natural alignment and 40 with natural alignment.
1) Memory corruption, affecting i386 userland on amd64 kernel:
Userland reserves a 36 bytes large buffer, kernel writes 40 bytes.
This has been first found and reported against libraw1394 if
compiled with gcc 4.7 which happens to order libraw1394's stack such
that the bug became visible as data corruption.
2) Information leak, affecting all kernel architectures except i386:
4 bytes of random kernel stack data were leaked to userspace.
Hence limit the respective copy_to_user() to the 32-bit aligned size of
struct fw_cdev_event_bus_reset.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7253b85cc6 upstream.
arm: Add ARM ERRATA 775420 workaround
Workaround for the 775420 Cortex-A9 (r2p2, r2p6,r2p8,r2p10,r3p0) erratum.
In case a date cache maintenance operation aborts with MMU exception, it
might cause the processor to deadlock. This workaround puts DSB before
executing ISB if an abort may occur on cache maintenance.
Based on work by Kouei Abe and feedback from Catalin Marinas.
Signed-off-by: Kouei Abe <kouei.abe.cp@rms.renesas.com>
[ horms@verge.net.au: Changed to implementation
suggested by catalin.marinas@arm.com ]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35c2a7f490 upstream.
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():
BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
[<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
[<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
[<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.
But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0a996eeed upstream.
This fault was detected using the kgdb test suite on boot and it
crashes recursively due to the fact that CONFIG_KPROBES on mips adds
an extra die notifier in the page fault handler. The crash signature
looks like this:
kgdbts:RUN bad memory access test
KGDB: re-enter exception: ALL breakpoints killed
Call Trace:
[<807b7548>] dump_stack+0x20/0x54
[<807b7548>] dump_stack+0x20/0x54
The fix for now is to have kgdb return immediately if the fault type
is DIE_PAGE_FAULT and allow the kprobe code to decide what is supposed
to happen.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a520d52e99 upstream.
The Linux EC driver includes a mechanism to detect GPE storms,
and switch from interrupt-mode to polling mode. However, polling
mode sometimes doesn't work, so the workaround is problematic.
Also, different systems seem to need the threshold for detecting
the GPE storm at different levels.
ACPI_EC_STORM_THRESHOLD was initially 20 when it's created, and
was changed to 8 in 2.6.28 commit 06cf7d3c7 "ACPI: EC: lower interrupt storm
threshold" to fix kernel bug 11892 by forcing the laptop in that bug to
work in polling mode. However in bug 45151, it works fine in interrupt
mode if we lift the threshold back to 20.
This patch makes the threshold a module parameter so that user has a
flexible option to debug/workaround this issue.
The default is unchanged.
This is also a preparation patch to fix specific systems:
https://bugzilla.kernel.org/show_bug.cgi?id=45151
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 303a7ce920 upstream.
Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 846a136881 upstream.
Michael Olbrich reported that his test program fails when built with
-O2 -mcpu=cortex-a8 -mfpu=neon, and a kernel which supports v6 and v7
CPUs:
volatile int x = 2;
volatile int64_t y = 2;
int main() {
volatile int a = 0;
volatile int64_t b = 0;
while (1) {
a = (a + x) % (1 << 30);
b = (b + y) % (1 << 30);
assert(a == b);
}
}
and two instances are run. When built for just v7 CPUs, this program
works fine. It uses the "vadd.i64 d19, d18, d16" VFP instruction.
It appears that we do not save the high-16 double VFP registers across
context switches when the kernel is built for v6 CPUs. Fix that.
Tested-By: Michael Olbrich <m.olbrich@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d3d688da8 upstream.
Unloading the omap2 nand driver missed to release the memory region which will
result in not being able to request it again if one want to load the driver
later on.
This patch fixes following error when loading omap2 module after unloading:
---8<---
~ $ rmmod omap2
~ $ modprobe omap2
[ 37.420928] omap2-nand: probe of omap2-nand.0 failed with error -16
~ $
--->8---
This error was introduced in 67ce04bf27 which
was the first commit of this driver.
Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb0a13a134 upstream.
If override size is too big, the module was actually loaded instead of
failing, because retval was not set.
This lead to memory corruption with the use of the freed structs nandsim
and nand_chip.
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1f55c680e upstream.
Update driver autcpu12-nvram.c so it compiles; map_read32/map_write32
no longer exist in the kernel so the driver is totally broken.
Additionally, map_info name passed to simple_map_init is incorrect.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d35be8bab9 upstream.
In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.
However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.
In order to achieve that, do the following:
1. Don't modify cpusets during suspend/resume. At all.
In particular, don't move the tasks from one cpuset to another, and
don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
during the CPU hotplug operations that are carried out in the
suspend/resume path.
2. However, cpusets and sched domains are related. We just want to avoid
altering cpusets alone. So, to keep the sched domains updated, build
a single sched domain (containing all active cpus) during each of the
CPU hotplug operations carried out in s/r path, effectively ignoring
the cpusets' cpus_allowed masks.
(Since userspace is frozen while doing all this, it will go unnoticed.)
3. During the last CPU online operation during resume, build the sched
domains by looking up the (unaltered) cpusets' cpus_allowed masks.
That will bring back the system to the same original state as it was in
before suspend.
Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00442ad04a upstream.
Commit cc9a6c8776 ("cpuset: mm: reduce large amounts of memory barrier
related damage v3") introduced a potential memory corruption.
shmem_alloc_page() uses a pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has
MPOL_F_SHARED. Therefore, when a cpuset update race occurs,
alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
reference count and frees the policy prematurely.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63f74ca21f upstream.
When shared_policy_replace() fails to allocate new->policy is not freed
correctly by mpol_set_shared_policy(). The problem is that shared
mempolicy code directly call kmem_cache_free() in multiple places where
it is easy to make a mistake.
This patch creates an sp_free wrapper function and uses it. The bug was
introduced pre-git age (IOW, before 2.6.12-rc2).
[mgorman@suse.de: Editted changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22d127a39 upstream.
shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot
be dereferenced if sp->lock is not held and 2) another thread can modify
sp_node between spin_unlock for allocating a new sp node and next
spin_lock. The bug was introduced before 2.6.12-rc2.
Kosaki's original patch for this problem was to allocate an sp node and
policy within shared_policy_replace and initialise it when the lock is
reacquired. I was not keen on this approach because it partially
duplicates sp_alloc(). As the paths were sp->lock is taken are not that
performance critical this patch converts sp->lock to sp->mutex so it can
sleep when calling sp_alloc().
[kosaki.motohiro@jp.fujitsu.com: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 869833f2c5 upstream.
Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled
=============================================================================
BUG numa_policy (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
__slab_alloc+0x3d3/0x445
kmem_cache_alloc+0x29d/0x2b0
mpol_new+0xa3/0x140
sys_mbind+0x142/0x620
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
__slab_free+0x2e/0x1de
kmem_cache_free+0x25a/0x260
__mpol_put+0x27/0x30
remove_vma+0x68/0x90
exit_mmap+0x118/0x140
mmput+0x73/0x110
exit_mm+0x108/0x130
do_exit+0x162/0xb90
do_group_exit+0x4f/0xc0
sys_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
The problem is that the structure is being prematurely freed due to a
reference count imbalance. In the following case mbind(addr, len) should
replace the memory policies of both vma1 and vma2 and thus they will
become to share the same mempolicy and the new mempolicy will have the
MPOL_F_SHARED flag.
+-------------------+-------------------+
| vma1 | vma2(shmem) |
+-------------------+-------------------+
| |
addr addr+len
alloc_pages_vma() uses get_vma_policy() and mpol_cond_put() pair for
maintaining the mempolicy reference count. The current rule is that
get_vma_policy() only increments refcount for shmem VMA and
mpol_conf_put() only decrements refcount if the policy has
MPOL_F_SHARED.
In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED!
The reference count will be decreased even though was not increased
whenever alloc_page_vma() is called. This has been broken since commit
[52cd3b07: mempolicy: rework mempolicy Reference Counting] in 2008.
There is another serious bug with the sharing of memory policies.
Currently, mempolicy rebind logic (it is called from cpuset rebinding)
ignores a refcount of mempolicy and override it forcibly. Thus, any
mempolicy sharing may cause mempolicy corruption. The bug was
introduced by commit [68860ec1: cpusets: automatic numa mempolicy
rebinding].
Ideally, the shared policy handling would be rewritten to either
properly handle COW of the policy structures or at least reference count
MPOL_F_SHARED based exclusively on information within the policy.
However, this patch takes the easier approach of disabling any policy
sharing between VMAs. Each new range allocated with sp_alloc will
allocate a new policy, set the reference count to 1 and drop the
reference count of the old policy. This increases the memory footprint
but is not expected to be a major problem as mbind() is unlikely to be
used for fine-grained ranges. It is also inefficient because it means
we allocate a new policy even in cases where mbind_range() could use the
new_policy passed to it. However, it is more straight-forward and the
change should be invisible to the user.
[mgorman@suse.de: Edited changelog]
Reported-by: Dave Jones <davej@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d34694c1a upstream.
Commit 05f144a0d5 ("mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages") removed vma->vm_policy updates code but it is
the purpose of mbind_range(). Now, mbind_range() is virtually a no-op
and while it does not allow memory corruption it is not the right fix.
This patch is a revert.
[mgorman@suse.de: Edited changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad1be8d345 upstream.
When register_netdev fails, the init'ed NAPIs by netif_napi_add must be
deleted with netif_napi_del, and also when driver unloads, it should
delete the NAPI before unregistering netdevice using unregister_netdev.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 477206a018 upstream.
The r8169 may get stuck or show bad behaviour after activating TSO :
the net_device is not stopped when it has no more TX descriptors.
This problem comes from TX_BUFS_AVAIL which may reach -1 when all
transmit descriptors are in use. The patch simply tries to keep positive
values.
Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
Zotac 890GXITX.
Signed-off-by: Julien Ducourthial <jducourt@free.fr>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a15cd2ff4 upstream.
With runtime PM, if the ethernet cable is disconnected, the device is
transitioned to D3 state to conserve energy. If the system is shutdown
in this state, any register accesses in rtl_shutdown are dropped on
the floor. As the device was programmed by .runtime_suspend() to wake
on link changes, it is thus brought back up as soon as the link recovers.
Resuming every suspended device through the driver core would slow things
down and it is not clear how many devices really need it now.
Original report and D0 transition patch by Sameer Nanda. Patch has been
changed to comply with advices by Rafael J. Wysocki and the PM folks.
Reported-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 811fd3010c upstream.
Realtek has specified that the post 8168c gigabit chips and the post
8105e fast ethernet chips recover automatically from a Rx FIFO overflow.
The driver does not need to clear the RxFIFOOver bit of IntrStatus and
it should rather avoid messing it.
The implementation deserves some explanation:
1. events outside of the intr_event bit mask are now ignored. It enforces
a no-processing policy for the events that either should not be there
or should be ignored.
2. RxFIFOOver was already ignored in rtl_cfg_infos[RTL_CFG_1] for the
whole 8168 line of chips with two exceptions:
- RTL_GIGA_MAC_VER_22 since b5ba6d12bd
("use RxFIFO overflow workaround for 8168c chipset.").
This one should now be correctly handled.
- RTL_GIGA_MAC_VER_11 (8168b) which requires a different Rx FIFO
overflow processing.
Though it does not conform to Realtek suggestion above, the updated
driver includes no change for RTL_GIGA_MAC_VER_12 and RTL_GIGA_MAC_VER_17.
Both are 8168b. RTL_GIGA_MAC_VER_12 is common and a bit old so I'd rather
wait for experimental evidence that the change suggested by Realtek really
helps or does not hurt in unexpected ways.
Removed case statements in rtl8169_interrupt are only 8168 relevant.
3. RxFIFOOver is masked for post 8105e 810x chips, namely the sole 8105e
(RTL_GIGA_MAC_VER_30) itself.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10953db8e1 upstream
The link down would occur when reseting PHY. And it would take about 2 ~ 5
seconds from link down to link up. If the delay of pm_schedule_suspend is
not long enough, the device would enter runtime_suspend before link up.
After link up, the device would wake up and reset PHY again. Then, you
would find the driver keep in a loop of runtime_suspend and rumtime_resume.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit deb9d93c89 upstream.
8168d and above allow jumbo frames beyond 8k. Bump the received
packet length check before enabling jumbo frames on these chipsets.
Frame length indication covers bits 0..13 of the first Rx descriptor
32 bits for the 8169 and 8168. I only have authoritative documentation
for the allowed use of the extra (13) bit with the 8169 and 8168c.
Realtek's drivers use the same mask for the 816x and the fast ethernet
only 810x.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d58d46b5d8 upstream.
- fix features : jumbo frames and checksumming can not be used at the
same time.
- introduce hw_jumbo_{enable / disable} helpers. Their content has been
creatively extracted from Realtek's own drivers. As an illustration,
it would be nice to know how/if the MaxTxPacketSize register operates
when the device can work with a 9k jumbo frame as its documentation
(8168c) can not be applied beyond ~7k.
- rtl_tx_performance_tweak is moved forward. No change.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a10d206ef1 upstream.
Each grace period is supposed to have at least one callback waiting
for that grace period to complete. However, if CONFIG_NO_HZ=n, an
extra callback-free grace period is no big problem -- it will chew up
a tiny bit of CPU time, but it will complete normally. In contrast,
CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
sleep indefinitely, in turn indefinitely delaying completion of the
callback-free grace period. Given that nothing is waiting on this grace
period, this is also not a problem.
That is, unless RCU CPU stall warnings are also enabled, as they are
in recent kernels. In this case, if a CPU wakes up after at least one
minute of inactivity, an RCU CPU stall warning will result. The reason
that no one noticed until quite recently is that most systems have enough
OS noise that they will never remain absolutely idle for a full minute.
But there are some embedded systems with cut-down userspace configurations
that consistently get into this situation.
All this begs the question of exactly how a callback-free grace period
gets started in the first place. This can happen due to the fact that
CPUs do not necessarily agree on which grace period is in progress.
If a CPU still believes that the grace period that just completed is
still ongoing, it will believe that it has callbacks that need to wait for
another grace period, never mind the fact that the grace period that they
were waiting for just completed. This CPU can therefore erroneously
decide to start a new grace period. Note that this can happen in
TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system: Deadlock
considerations mean that the CPU that detected the end of the grace
period is not necessarily officially informed of this fact for some time.
Once this CPU notices that the earlier grace period completed, it will
invoke its callbacks. It then won't have any callbacks left. If no
other CPU has any callbacks, we now have a callback-free grace period.
This commit therefore makes CPUs check more carefully before starting a
new grace period. This new check relies on an array of tail pointers
into each CPU's list of callbacks. If the CPU is up to date on which
grace periods have completed, it checks to see if any callbacks follow
the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
follow the RCU_WAIT_TAIL segment. The reason that this works is that
the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
as soon as the CPU is officially notified that the old grace period
has ended.
This change is to cpu_needs_another_gp(), which is called in a number
of places. The only one that really matters is in rcu_start_gp(), where
the root rcu_node structure's ->lock is held, which prevents any
other CPU from starting or completing a grace period, so that the
comparison that determines whether the CPU is missing the completion
of a grace period is stable.
Reported-by: Becky Bruce <bgillbruce@gmail.com>
Reported-by: Subodh Nijsure <snijsure@grid-net.com>
Reported-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e3b3b105a upstream.
SI asics store voltage information differently so we
don't have a way to deal with it properly yet.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c10514394e upstream.
While going through Ubuntu bugs, I discovered this patch being
posted and a confirmation that the patch works as expected.
Finding out how the hw volume really works would be preferrable
to just disabling the broken one, but this would be better than
nothing.
Credit: sndfnsdfin (qawsnews)
BugLink: https://bugs.launchpad.net/bugs/559939
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4f1e48bd1 upstream.
When the loopback timer handler is running, calling del_timer() (for STOP
trigger) will not wait for the handler to complete before deactivating the
timer. The timer gets rescheduled in the handler as usual. Then a subsequent
START trigger will try to start the timer using add_timer() with a timer pending
leading to a kernel panic.
Serialize the calls to add_timer() and del_timer() using a spin lock to avoid
this.
Signed-off-by: Omair Mohammed Abdullah <omair.m.abdullah@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 027ef6c878 upstream.
In many places !pmd_present has been converted to pmd_none. For pmds
that's equivalent and pmd_none is quicker so using pmd_none is better.
However (unless we delete pmd_present) we should provide an accurate
pmd_present too. This will avoid the risk of code thinking the pmd is non
present because it's under __split_huge_page_map, see the pmd_mknotpresent
there and the comment above it.
If the page has been mprotected as PROT_NONE, it would also lead to a
pmd_present false negative in the same way as the race with
split_huge_page.
Because the PSE bit stays on at all times (both during split_huge_page and
when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
but checking the PROTNONE bit too is still good to remember pmd_present
must always keep PROT_NONE into account.
This explains a not reproducible BUG_ON that was seldom reported on the
lists.
The same issue is in pmd_large, it would go wrong with both PROT_NONE and
if it races with split_huge_page.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec4d9f626d upstream.
In fuzzing with trinity, lockdep protested "possible irq lock inversion
dependency detected" when isolate_lru_page() reenabled interrupts while
still holding the supposedly irq-safe tree_lock:
invalidate_inode_pages2
invalidate_complete_page2
spin_lock_irq(&mapping->tree_lock)
clear_page_mlock
isolate_lru_page
spin_unlock_irq(&zone->lru_lock)
isolate_lru_page() is correct to enable interrupts unconditionally:
invalidate_complete_page2() is incorrect to call clear_page_mlock() while
holding tree_lock, which is supposed to nest inside lru_lock.
Both truncate_complete_page() and invalidate_complete_page() call
clear_page_mlock() before taking tree_lock to remove page from radix_tree.
I guess invalidate_complete_page2() preferred to test PageDirty (again)
under tree_lock before committing to the munlock; but since the page has
already been unmapped, its state is already somewhat inconsistent, and no
worse if clear_page_mlock() moved up.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b71fc079b5 upstream.
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a08f447fa upstream.
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f066055a34 upstream.
Proper block swap for inodes with full journaling enabled is
truly non obvious task. In order to be on a safe side let's
explicitly disable it for now.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1965f66e7d upstream.
For bridges with "secondary > subordinate", i.e., invalid bus number
apertures, we don't enumerate anything behind the bridge unless the
user specified "pci=assign-busses".
This patch makes us automatically try to reassign the downstream bus
numbers in this case (just for that bridge, not for all bridges as
"pci=assign-busses" does).
We don't discover all the devices on the Intel DP43BF motherboard
without this change (or "pci=assign-busses") because its BIOS configures
a bridge as:
pci 0000:00:1e.0: PCI bridge to [bus 20-08] (subtractive decode)
[bhelgaas: changelog, change message to dev_info]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18412
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=625754
Reported-by: Brian C. Huffman <bhuffman@graze.net>
Reported-by: VL <vl.homutov@gmail.com>
Tested-by: VL <vl.homutov@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
commit d436de8ce2 upstream.
__scsi_remove_device (e.g. due to dev_loss_tmo) calls
zfcp_scsi_slave_destroy which in turn sends a close LUN FSF request to
the adapter. After 30 seconds without response,
zfcp_erp_timeout_handler kicks the ERP thread failing the close LUN
ERP action. zfcp_erp_wait in zfcp_erp_lun_shutdown_wait and thus
zfcp_scsi_slave_destroy returns and then scsi_device is no longer
valid. Sometime later the response to the close LUN FSF request may
finally come in. However, commit
b62a8d9b45
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit"
introduced a number of attempts to unconditionally access struct
zfcp_scsi_dev through struct scsi_device causing a use-after-free.
This leads to an Oops due to kernel page fault in one of:
zfcp_fsf_abort_fcp_command_handler, zfcp_fsf_open_lun_handler,
zfcp_fsf_close_lun_handler, zfcp_fsf_req_trace,
zfcp_fsf_fcp_handler_common.
Move dereferencing of zfcp private data zfcp_scsi_dev allocated in
scsi_device via scsi_transport_reserve_device after the check for
potentially aborted FSF request and thus no longer valid scsi_device.
Only then assign sdev_to_zfcp(sdev) to the local auto variable struct
zfcp_scsi_dev *zfcp_sdev.
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d99b601b63 upstream.
Upstream commit f3450c7b91
"[SCSI] zfcp: Replace local reference counting with common kref"
accidentally dropped a reference count check before tearing down
zfcp_ports that are potentially in use by zfcp_units.
Even remote ports in use can be removed causing
unreachable garbage objects zfcp_ports with zfcp_units.
Thus units won't come back even after a manual port_rescan.
The kref of zfcp_port->dev.kobj is already used by the driver core.
We cannot re-use it to track the number of zfcp_units.
Re-introduce our own counter for units per port
and check on port_remove.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca579c9f13 upstream.
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. Replace port->adapter->scsi_host by
adapter->scsi_host.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Oversight in upsteam commit of v2.6.37
a1ca48319a
"[SCSI] zfcp: Move ACL/CFDC code to zfcp_cfdc.c"
which merged the content of zfcp_erp_port_access_changed().
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb45214960 upstream.
If the mapping of FCP device bus ID and corresponding subchannel
is modified while the Linux image is suspended, the resume of FCP
devices can fail. During resume, zfcp gets callbacks from cio regarding
the modified subchannels but they can be arbitrarily mixed with the
restore/resume callback. Since the cio callbacks would trigger
adapter recovery, zfcp could wakeup before the resume callback.
Therefore, ignore the cio callbacks regarding subchannels while
being suspended. We can safely do so, since zfcp does not deal itself
with subchannels. For problem determination purposes, we still trace the
ignored callback events.
The following kernel messages could be seen on resume:
kernel: <WWPN>: parent <FCP device bus ID> should not be sleeping
As part of adapter reopen recovery, zfcp performs auto port scanning
which can erroneously try to register new remote ports with
scsi_transport_fc and the device core code complains about the parent
(adapter) still sleeping.
kernel: zfcp.3dff9c: <FCP device bus ID>:\
Setting up the QDIO connection to the FCP adapter failed
<last kernel message repeated 3 more times>
kernel: zfcp.574d43: <FCP device bus ID>:\
ERP cannot recover an error on the FCP device
In such cases, the adapter gave up recovery and remained blocked along
with its child objects: remote ports and LUNs/scsi devices. Even the
adapter shutdown as part of giving up recovery failed because the ccw
device state remained disconnected. Later, the corresponding remote
ports ran into dev_loss_tmo. As a result, the LUNs were erroneously
not available again after resume.
Even a manually triggered adapter recovery (e.g. sysfs attribute
failed, or device offline/online via sysfs) could not recover the
adapter due to the remaining disconnected state of the corresponding
ccw device.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0100998dbf upstream.
Duplicate fssrh_2 from a54ca0f62f
"[SCSI] zfcp: Redesign of the debug tracing for HBA records."
complicates distinction of generic status read response from
local link up.
Duplicate fsscth1 from 2c55b750a8
"[SCSI] zfcp: Redesign of the debug tracing for SAN records."
complicates distinction of good common transport response from
invalid port handle.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf9ecf4b63 ]
On the earliest TSO capable devices, TSO was accomplished through
firmware. The TSO cannot coexist with ASF management firmware though.
The tg3 driver determines whether or not ASF is enabled by calling
tg3_get_eeprom_hw_cfg(), which checks a particular bit of NIC memory.
Commit dabc5c670d, entitled "tg3: Move
TSO_CAPABLE assignment", accidentally moved the code that determines
TSO capabilities earlier than the call to tg3_get_eeprom_hw_cfg(). As a
consequence, the driver was attempting to determine TSO capabilities
before it had all the data it needed to make the decision.
This patch fixes the problem by revisiting and reevaluating the decision
after tg3_get_eeprom_hw_cfg() is called.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8babe8cc65 ]
In order for the network layer to see that AoE requires
no checksumming in a generic way, the packets must be
marked as requiring no checksum, so we make this requirement
explicit with the assertion.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0d680e577 ]
A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE). Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.
commit f01a5236bd
Author: Jesse Gross <jesse@nicira.com>
Date: Sun Jan 9 06:23:31 2011 +0000
net offloading: Generalize netif_get_vlan_features().
The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming. Calling it for a
protocol that requires no checksum is inappropriate.
The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets. Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:
http://www.spinics.net/lists/linux-mm/msg15184.html
The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:
https://lkml.org/lkml/2012/8/28/140
The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0cc88a762 ]
While investigating l2tp bug, I hit a bug in eth_type_trans(),
because not enough bytes were pulled in skb head.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96af69ea2a ]
mip6_mh_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b05c4b50e ]
icmpv6_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Also, if icmpv6 header cannot be found, do not deliver the packet,
as we do in IPv4.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab43ed8b74 ]
icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e10986d1d ]
Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()
Fix is to make sure socket is a SOCK_STREAM one.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6862234238 ]
In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.
For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.
This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)
The patch is co-written with Eric Dumazet <edumazet@google.com>
Signed-off-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b018d57ff ]
When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface in case
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering net_device for the reference count is not decreased to
zero which should have been done with dev_put().
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c3a5bdae2 ]
SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via
skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of
sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up
by __sctp_write_space().
Buffer space charged via sctp_set_owner_w() is released in sctp_wfree()
which calls __sctp_write_space() directly.
Buffer space charged via skb_set_owner_w() is released via sock_wfree()
which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set.
sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets.
Therefore if sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is
interrupted by a signal.
This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ...
Charging for the data twice does not make sense in the first place, it
leads to overcharging sndbuf by a factor 2. Therefore this patch only
charges a single byte in wmem_alloc when transmitting an SCTP packet to
ensure that the socket stays alive until the packet has been released.
This means that control chunks are no longer accounted for in wmem_alloc
which I believe is not a problem as skb->truesize will typically lead
to overcharging anyway and thus compensates for any control overhead.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15c041759b ]
If recv() syscall is called for a TCP socket so that
- IOAT DMA is used
- MSG_WAITALL flag is used
- requested length is bigger than sk_rcvbuf
- enough data has already arrived to bring rcv_wnd to zero
then when tcp_recvmsg() gets to calling sk_wait_data(), receive
window can be still zero while sk_async_wait_queue exhausts
enough space to keep it zero. As this queue isn't cleaned until
the tcp_service_net_dma() call, sk_wait_data() cannot receive
any data and blocks forever.
If zero receive window and non-empty sk_async_wait_queue is
detected before calling sk_wait_data(), process the queue first.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6825a26c2d ]
as we hold dst_entry before we call __ip6_del_rt,
so we should alse call dst_release not only return
-ENOENT when the rt6_info is ip6_null_entry.
and we already hold the dst entry, so I think it's
safe to call dst_release out of the write-read lock.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5316cf9a51 ]
skb_reset_mac_len() relies on the value of the skb->network_header pointer,
therefore we must wait for such pointer to be recalculated before computing
the new mac_len value.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2120c52da6 ]
I discovered I couldn't get sierra_net to work on a powerpc. Turns out
the firmware attribute check assumes the system is little endian and
hence fails because the attributes is a 16 bit value.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7126195697 ]
If the old timestamps of a class, say cl, are stale when the class
becomes active, then QFQ may assign to cl a much higher start time
than the maximum value allowed. This may happen when QFQ assigns to
the start time of cl the finish time of a group whose classes are
characterized by a higher value of the ratio
max_class_pkt/weight_of_the_class with respect to that of
cl. Inserting a class with a too high start time into the bucket list
corrupts the data structure and may eventually lead to crashes.
This patch limits the maximum start time assigned to a class.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e4d1aa40e3 ]
Add a check if pdev->bus->self == NULL (root bus). When attaching
a netxen NIC to a VM it can be on the root bus and the guest would
crash in netxen_mask_aer_correctable() because of a NULL pointer
dereference if CONFIG_PCIEAER is present.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b836ddde1 ]
Commit 36a1211970 (netprio_cgroup.h:
dont include module.h from other includes) made the following build
error on ixp4xx_hss pop up:
CC [M] drivers/net/wan/ixp4xx_hss.o
drivers/net/wan/ixp4xx_hss.c:1412:20: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1413:25: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1414:21: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1415:19: error: expected ';', ',' or ')'
before string constant
make[8]: *** [drivers/net/wan/ixp4xx_hss.o] Error 1
This was previously hidden because ixp4xx_hss includes linux/hdlc.h which
includes linux/netdevice.h which includes linux/netprio_cgroup.h which
used to include linux/module.h. The real issue was actually present since
the initial commit that added this driver since it uses macros from
linux/module.h without including this file.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ffb5ba9001 ]
chan->count is used by rx channel. If the desc count is not updated by
the clean up loop in cpdma_chan_stop, the value written to the rxfree
register in cpdma_chan_start will be incorrect.
Signed-off-by: Tao Hou <hotforest@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ecd7918745 ]
The current code fails to ensure that the netlink message actually
contains as many bytes as the header indicates. If a user creates a new
state or updates an existing one but does not supply the bytes for the
whole ESN replay window, the kernel copies random heap bytes into the
replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL
netlink attribute. This leads to following issues:
1. The replay window has random bits set confusing the replay handling
code later on.
2. A malicious user could use this flaw to leak up to ~3.5kB of heap
memory when she has access to the XFRM netlink interface (requires
CAP_NET_ADMIN).
Known users of the ESN replay window are strongSwan and Steffen's
iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
uses the interface with a bitmap supplied while the former does not.
strongSwan is therefore prone to run into issue 1.
To fix both issues without breaking existing userland allow using the
XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
fully specified one. For the former case we initialize the in-kernel
bitmap with zero, for the latter we copy the user supplied bitmap. For
state updates the full bitmap must be supplied.
To prevent overflows in the bitmap length calculation the maximum size
of bmp_len is limited to 128 by this patch -- resulting in a maximum
replay window of 4096 packets. This should be sufficient for all real
life scenarios (RFC 4303 recommends a default replay window size of 64).
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Martin Willi <martin@revosec.ch>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1f86840f89 ]
The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.
Initial version of the patch by Brad Spengler.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b789836f4 ]
The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f778a63671 ]
The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c87308bde ]
copy_to_user_auth() fails to initialize the remainder of alg_name and
therefore discloses up to 54 bytes of heap memory via netlink to
userland.
Use strncpy() instead of strcpy() to fill the trailing bytes of alg_name
with null bytes.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 433a195480 ]
if xfrm_policy_get_afinfo returns 0, it has already released the read
lock, xfrm_policy_put_afinfo should not be called again.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c254637225 ]
When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 864745d291 ]
When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.
This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3b59df46a4 ]
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.
If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.
We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.
To do this, we introduce a recheck function that does this
check in the ESN case.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 959d1af8cf upstream.
WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution. When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.
Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb. Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theretically possible for clear_pending to permeate
upwards and happen before work->entry update.
Add an explicit smp_wmb() before work_clear_pending().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f6d93aa9d upstream.
The ACARD driver calls udelay() with a value > 2000, which leads to to
the following compilation error on ARM:
ERROR: "__bad_udelay" [drivers/scsi/atp870u.ko] undefined!
make[1]: *** [__modpost] Error 1
This is because udelay is defined on ARM, roughly speaking, as
#define udelay(n) ((n) > 2000 ? __bad_udelay() : \
__const_udelay((n) * ((2199023U*HZ)>>11)))
The argument to __const_udelay is the number of jiffies to wait divided
by 4, but this does not work unless the multiplication does not
overflow, and that is what the build error is designed to prevent. The
intended behavior can be achieved by using mdelay to call udelay
multiple times in a loop.
[jrnieder@gmail.com: adding context]
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f96972f2dc upstream.
As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus(). Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.
This fixes one reboot issue seen on imx6q (Cortex-A9 Quad). The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones. Otherwise, the secondary core running the restart
routine will fail to come to online after reboot.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfb117b3e5 upstream.
Check whether we evaluated _ADR successfully. Previously we ignored
failure, so we would have used garbage data from the stack as the device
and function number.
We return AE_OK so that we ignore only this slot and continue looking
for other slots.
Found by Coverity (CID 113981).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[bwh: Backported to 2.6.32/3.0: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc54ab7295 upstream.
The _OSC method may exist in module level code,
so it must be called after ACPI_FULL_INITIALIZATION
On some new platforms with Zero-Power-Optical-Disk-Drive (ZPODD)
support, this fix is necessary to save power.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b961180ef upstream.
ite_dev::rdev is currently initialised in ite_probe() after
rc_register_device() returns. If a newly registered device is opened
quickly enough, we may enable interrupts and try to use ite_dev::rdev
before it has been initialised. Move it up to the earliest point we
can, right after calling rc_allocate_device().
Reported-and-tested-by: YunQiang Su <wzssyqa@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c353acba28 upstream.
The call if_changed mechanism does not work when the command contains
backslashes. This basically is an issue with lzo and bzip2 compressed
kernels. The compressed binaries do not contain the uncompressed image
size, so these use size_append to append the size. This results in
backslashes in the executed command. With this if_changed always
detects a change in the command and rebuilds the compressed image even
if nothing has changed.
Fix this by escaping backslashes in make-cmd
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e47f8976d8 upstream.
A quote from SPC-4: "While in the unavailable primary target port
asymmetric access state, the device server shall support those of
the following commands that it supports while in the active/optimized
state: [ ... ] d) SET TARGET PORT GROUPS; [ ... ]". Hence enable
sending STPG to a target port group that is in the unavailable state.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc3f02a795 upstream.
John reports:
BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
[..]
Call Trace:
[<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
[<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
[<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
[<ffffffff81421e35>] sas_port_delete+0x25/0x160
[<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
Don't restart lookup of more stargets in the multi-target case, just
arrange to traverse the list once, on the assumption that new targets
are always added at the end. There is no guarantee that the target will
change state in scsi_target_reap() so we can end up spinning if we
restart.
Acked-by: Jack Wang <jack_wang@usish.com>
LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com>
Reported-by: John Drescher <drescherjm@gmail.com>
Tested-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: Dan Williams <djbw@fb.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be768912a4 upstream.
git commit c8adf9a3e8
"PCI: pre-allocate additional resources to devices only after
successful allocation of essential resources."
fails to take into consideration the optional-resources needed by children
devices while calculating the optional-resource needed by the bridge.
This can be a problem on some setup. For example, if a hotplug bridge has 8
children hotplug bridges, the bridge should have enough resources to accomodate
the hotplug requirements for each of its children hotplug bridges. Currently
this is not the case.
This patch fixes the problem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d70a74ffd upstream.
The oem parameter image embedded in the efi variable is at an offset
from the start of the variable. However, in the failure path we try to
free the 'orom' pointer which is only valid when the paramaters are
being read from the legacy option-rom space.
Since failure to load the oem parameters is unlikely and we keep the
memory around in the success case just defer all de-allocation to devm.
Reported-by: Don Morris <don.morris@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b796d06d5 upstream.
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bea1e22df4 upstream.
Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)
Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().
In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.
Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21e89afd32 upstream.
It turns out Smart Array logical drives do not support target
reset and when the target reset fails, the logical drive will
be taken off line. Symptoms look like this:
hpsa 0000:03:00.0: Abort request on C1:B0:T0:L0
hpsa 0000:03:00.0: resetting device 1:0:0:0
hpsa 0000:03:00.0: cp ffff880037c56000 is reported invalid (probably means target device no longer present)
hpsa 0000:03:00.0: resetting device failed.
sd 1:0:0:0: Device offlined - not ready after error recovery
sd 1:0:0:0: rejecting I/O to offline device
EXT3-fs error (device sdb1): read_block_bitmap:
LUN reset is supported though, and is what we should be using.
Target reset is also disruptive in shared SAS situations,
for example, an external MSA1210m which does support target
reset attached to Smart Arrays in multiple hosts -- a target
reset from one host is disruptive to other hosts as all LUNs
on the target will be reset and will abort all outstanding i/os
back to all the attached hosts. So we should use LUN reset,
not target reset.
Tested this with Smart Array logical drives and with tape drives.
Not sure how this bug survived since 2009, except it must be very
rare for a Smart Array to require more than 30s to complete a request.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 225c56960f upstream.
The length field in the host config packet is only 16-bit long, so
passing it 0x10000 (64K which is our standard PAGE_SIZE) doesn't
work and result in an empty config from the server.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abb3e01103 upstream.
Currently UBI fails in autoresize when it is in R/O mode (e.g., because the
underlying MTD device is R/O). This patch fixes the issue - we just skip
autoresize and print a warning.
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88ed2a6061 upstream.
Uplink (TX) network data will go through gsm_dlci_data_output_framed
there is a bug where if memory allocation fails, the skb which
has already been pulled off the list will be lost.
In addition TX skbs were being processed in LIFO order
Fixed the memory leak, and changed to FIFO order processing
Signed-off-by: Russ Gorby <russ.gorby@intel.com>
Tested-by: Kappel, LaurentX <laurentx.kappel@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6e097dfdf upstream.
The Intel XHCI specification says that after clearing the run/stop bit
the controller may take up to 16ms to halt. We've seen a device take
14ms, which with the current timeout of 10ms causes the kernel to
abort the suspend. Increasing the timeout to the recommended value
fixes the problem.
This patch should be backported to kernels as old as 2.6.37, that
contain the commit 5535b1d5f8 "USB: xHCI:
PCI power management implementation".
Signed-off-by: Michael Spang <spang@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f34f9d186d upstream.
In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
memory for info->fields, it frees already allocated stuff and returns
error to its caller, fill_note_info. Which in turn returns error to its
caller, elf_core_dump. Which jumps to cleanup label and calls
free_note_info, which will happily try to free all info->fields again.
BOOM.
This is the fix.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e44708f75 upstream.
There were some locking holes in the management of the MUX's
message queue for 2 code paths:
1) gsmld_write_wakeup
2) receipt of CMD_FCON flow-control message
In both cases gsm_data_kick is called w/o locking so it can collide
with other other instances of gsm_data_kick (pulling messages tx_tail)
or potentially other instances of __gsm_data_queu (adding messages to tx_head)
Changed to take the tx_lock in these 2 cases
Signed-off-by: Russ Gorby <russ.gorby@intel.com>
Tested-by: Yin, Fengwei <fengwei.yin@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80fab3b244 upstream.
When a device with an isochronous endpoint is behind a hub plugged into
the Intel Panther Point xHCI host controller, and the driver submits
multiple frames per URB, the xHCI driver will set the Block Event
Interrupt (BEI) flag on all but the last TD for the URB. This causes
the host controller to place an event on the event ring, but not send an
interrupt. When the last TD for the URB completes, BEI is cleared, and
we get an interrupt for the whole URB.
However, under a Panther Point xHCI host controller, if the parent hub
is unplugged when one or more events from transfers with BEI set are on
the event ring, a port status change event is placed on the event ring,
but no interrupt is generated. This means URBs stop completing, and the
USB device disconnect is not noticed. Something like a USB headset will
cause mplayer to hang when the device is disconnected.
If another transfer is sent (such as running `sudo lsusb -v`), the next
transfer event seems to "unstick" the event ring, the xHCI driver gets
an interrupt, and the disconnect is reported to the USB core.
The fix is not to use the BEI flag under the Panther Point xHCI host.
This will impact power consumption and system responsiveness, because
the xHCI driver will receive an interrupt for every frame in all
isochronous URBs instead of once per URB.
Intel chipset developers confirm that this bug will be hit if the BEI
flag is used on any endpoint, not just ones that are behind a hub.
This patch should be backported to kernels as old as 3.0, that contain
the commit 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7083909023 upstream.
Some of the EFI variable attributes are missing from print out from
/sys/firmware/efi/vars/*/attributes. This patch adds those in. It also
updates code to use pre-defined constants for masking current value
of attributes.
Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26e8220adb upstream.
Apparently the same card model has two IDs, so this patch
complements the commit 39aced68d6
adding the missing one.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5dd553b9f upstream.
This works around a few glitches in the ST version of the PL011
serial driver when using very high baud rates, as we do in the
Ux500: 3, 3.25, 4 and 4.05 Mbps.
Problem Observed/rootcause:
When using high baud-rates, and the baudrate*8 is getting close to
the provided clock frequency (so a division factor close to 1), when
using bursts of characters (so they are abutted), then it seems as if
there is not enough time to detect the beginning of the start-bit which
is a timing reference for the entire character, and thus the sampling
moment of character bits is moving towards the end of each bit, instead
of the middle.
Fix:
Increase slightly the RX baud rate of the UART above the theoretical
baudrate by 5%. This will definitely give more margin time to the
UART_RX to correctly sample the data at the middle of the bit period.
Also fix the ages old copy-paste error in the very stressed comment,
it's referencing the registers used in the PL010 driver rather than
the PL011 ones.
Signed-off-by: Guillaume Jaunet <guillaume.jaunet@stericsson.com>
Signed-off-by: Christophe Arnal <christophe.arnal@stericsson.com>
Signed-off-by: Matthias Locher <matthias.locher@stericsson.com>
Signed-off-by: Rajanikanth HV <rajanikanth.hv@stericsson.com>
Cc: Bibek Basu <bibek.basu@stericsson.com>
Cc: Par-Gunnar Hjalmdahl <par-gunnar.hjalmdahl@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee8b593aff upstream.
If a user provides a buffer larger than a tty->write_buf chunk and
passes '\r' at the end of the buffer, we touch an out-of-bound memory.
Add a check there to prevent this.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samo Pogacnik <samo_pogacnik@t-2.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9490e93c1 upstream.
Change the BUG_ON to WARN_ON and return in case of tty->read_buf==NULL. We want to track a
couple of long standing reports of this but at the same time we can avoid killing the box.
Signed-off-by: Stanislav Kozina <skozina@redhat.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8cad4c89e upstream.
When `do_cmd_ioctl()` allocates memory for the kernel copy of a channel
list, it frees any previously allocated channel list in
`async->cmd.chanlist` and replaces it with the new one. However, if the
device is ever removed (or "detached") the cleanup code in
`cleanup_device()` in "drivers.c" does not free this memory so it is
lost.
A sensible place to free the kernel copy of the channel list is in
`do_become_nonbusy()` as at that point the comedi asynchronous command
associated with the channel list is no longer valid. Free the channel
list in `do_become_nonbusy()` instead of `do_cmd_ioctl()` and clear the
pointer to prevent it being freed more than once.
Note that `cleanup_device()` could be called at an inappropriate time
while the comedi device is open, but that's a separate bug not related
to this this patch.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d06e3df28 upstream.
`parse_insn()` is dereferencing the user-space pointer `insn->data`
directly when handling the `INSN_INTTRIG` comedi instruction. It
shouldn't be using `insn->data` at all; it should be using the separate
`data` pointer passed to the function. Fix it.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1878957b4 upstream.
Correct a direct dereference of I/O memory to use an appropriate I/O
memory access function. Note that the pointer being dereferenced is not
currently tagged with `__iomem` but I plan to correct that for 3.7.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b655c2c478 upstream.
`s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which
is a pointer to user memory. It should be dereferencing the separate
`data` parameter that points to a copy of the data in kernel memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40fe4f8967 upstream.
softsynth_read() reads a character at a time from the init string;
when it finds the null terminator it sets the initialized flag but
then repeats the last character.
Additionally, if the read() buffer is not big enough for the init
string, the next read() will start reading from the beginning again.
So the caller may never progress to reading anything else.
Replace the simple initialized flag with the current position in
the init string, carried over between calls. Switch to reading
real data once this reaches the null terminator.
(This assumes that the length of the init string can't change, which
seems to be the case. Really, the string and position belong together
in a per-file private struct.)
Tested-by: Samuel Thibault <sthibault@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c638eb2872 upstream.
The three Pantech devices UML190 (106c:3716), UML290 (106c:3718) and
P4200 (106c:3721) all use the same subclasses to identify vendor
specific functions. Replace the existing device specific entries
with generic vendor matching, adding support for the P4200.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba1cbad93d upstream.
The access beyond the end of device BUG_ON that was introduced to
dm_request_fn via commit 29e4013de7 ("dm: implement
REQ_FLUSH/FUA support for request-based dm") was an overly
drastic (but simple) response to this situation.
I have received a report that this BUG_ON was hit and now think
it would be better to use dm_kill_unmapped_request() to fail the clone
and original request with -EIO.
map_request() will assign the valid target returned by
dm_table_find_target to tio->ti. But when the target
isn't valid tio->ti is never assigned (because map_request isn't
called); so add a check for tio->ti != NULL to dm_done().
Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8110e16d42 upstream.
IBM reported a deadlock in select_parent(). This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.
There are two cases when the traversal needs to be restarted:
1) concurrent d_move(); this can only happen when not already locked,
since taking rename_lock protects against concurrent d_move().
2) racing with final d_put() on child just at the moment of ascending
to parent; rename_lock doesn't protect against this rare race, so it
can happen when already locked.
Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held. This patch fixes all three callers of
try_to_ascend().
IBM reported that the deadlock is gone with this patch.
[ I rewrote the patch to be smaller and just do the "goto again" if the
lock was already held, but credit goes to Miklos for the real work.
- Linus ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a76d7bd96d upstream.
The open-coded mutex implementation for ARMv6+ cores suffers from a
severe lack of barriers, so in the uncontended case we don't actually
protect any accesses performed during the critical section.
Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
code but optimised to remove a branch instruction, as the mutex fastpath
was previously inlined. Now that this is executed out-of-line, we can
reuse the atomic access code for the locking (in fact, we use the xchg
code as this produces shorter critical sections).
This patch uses the generic xchg based implementation for mutexes on
ARMv6+, which introduces barriers to the lock/unlock operations and also
has the benefit of removing a fair amount of inline assembly code.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Shan Kang <kangshan0910@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d00dc2611 upstream.
This patch (as1607) fixes a race that can occur if a USB host
controller is removed while a process is reading the
/sys/kernel/debug/usb/devices file.
The usb_device_read() routine uses the bus->root_hub pointer to
determine whether or not the root hub is registered. The is not a
valid test, because the pointer is set before the root hub gets
registered and remains set even after the root hub is unregistered and
deallocated. As a result, usb_device_read() or usb_device_dump() can
access freed memory, causing an oops.
The patch changes the test to use the hcd->rh_registered flag, which
does get set and cleared at the appropriate times. It also makes sure
to hold the usb_bus_list_lock mutex while setting the flag, so that
usb_device_read() will become aware of new root hubs as soon as they
are registered.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a129a7c845 upstream.
When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.
Fix this by always forcing vm86 to look like ring 3.
[ Backport to 3.0 notes:
Things changed there slightly:
- move mce_get_rip() up. It fills up m->cs and m->ip values which
are evaluated in mce_severity(). Therefore move it up right before
the mce_severity call. This seem to be another bug in 3.0?
- Place the backport (fix m->cs in V86 case) to where m->cs gets
filled which is mce_get_rip() in 3.0
]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5740f4b2c upstream.
try_to_wake_up() has a problem which may change status from TASK_DEAD to
TASK_RUNNING in race condition with SMI or guest environment of virtual
machine. As a result, exited task is scheduled() again and panic occurs.
Here is the sequence how it occurs:
----------------------------------+-----------------------------
|
CPU A | CPU B
----------------------------------+-----------------------------
TASK A calls exit()....
do_exit()
exit_mm()
down_read(mm->mmap_sem);
rwsem_down_failed_common()
set TASK_UNINTERRUPTIBLE
set waiter.task <= task A
list_add to sem->wait_list
:
raw_spin_unlock_irq()
(I/O interruption occured)
__rwsem_do_wake(mmap_sem)
list_del(&waiter->list);
waiter->task = NULL
wake_up_process(task A)
try_to_wake_up()
(task is still
TASK_UNINTERRUPTIBLE)
p->on_rq is still 1.)
ttwu_do_wakeup()
(*A)
:
(I/O interruption handler finished)
if (!waiter.task)
schedule() is not called
due to waiter.task is NULL.
tsk->state = TASK_RUNNING
:
check_preempt_curr();
:
task->state = TASK_DEAD
(*B)
<--- set TASK_RUNNING (*C)
schedule()
(exit task is running again)
BUG_ON() is called!
--------------------------------------------------------
The execution time between (*A) and (*B) is usually very short,
because the interruption is disabled, and setting TASK_RUNNING at (*C)
must be executed before setting TASK_DEAD.
HOWEVER, if SMI is interrupted between (*A) and (*B),
(*C) is able to execute AFTER setting TASK_DEAD!
Then, exited task is scheduled again, and BUG_ON() is called....
If the system works on guest system of virtual machine, the time
between (*A) and (*B) may be also long due to scheduling of hypervisor,
and same phenomenon can occur.
By this patch, do_exit() waits for releasing task->pi_lock which is used
in try_to_wake_up(). It guarantees the task becomes TASK_DEAD after
waking up.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 067aa4815a upstream.
Commit 178db7d3, "spi: Fix device unregistration when unregistering
the bus master", changed spi device initialization of dev.parent pointer
to be the master's device pointer instead of his parent.
This introduced a bug in spi-fsl-spi, since its usage of spi device
pointer was not updated accordingly. This was later fixed by commit
5039a86, "spi/mpc83xx: fix NULL pdata dereference bug", but it missed
another spot on fsl_spi_cs_control function where we also need to update
usage of spi device pointer. This change address that.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Alfredo Capella <alfredo.capella@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5039a86973 upstream.
Commit 178db7d3, "spi: Fix device unregistration when unregistering
the bus master", changed device initialization to be children of the
bus master, not children of the bus masters parent device. The pdata
pointer used in fsl_spi_chipselect must updated to reflect the changed
initialization.
Signed-off-by: Kenth Eriksson <kenth.eriksson@transmode.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Alfredo Capella <alfredo.capella@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78b495c39a upstream
UBI was mistakingly using 'kfree()' instead of 'kmem_cache_free()' when
freeing "attach eraseblock" structures in vtbl.c. Thankfully, this happened
only when we were doing auto-format, so many systems were unaffected. However,
there are still many users affected.
It is strange, but the system did not crash and nothing bad happened when
the SLUB memory allocator was used. However, in case of SLOB we observed an
crash right away.
This problem was introduced in 2.6.39 by commit
"6c1e875 UBI: add slab cache for ubi_scan_leb objects"
Reported-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4188bba0e9 upstream.
The driver should not try to switch to 1.8V when the SD 3.0 host
controller does not have any UHS capabilities bits set (SDR50, DDR50
or SDR104). See page 72 of "SD Specifications Part A2 SD Host
Controller Simplified Specification Version 3.00" under
"1.8V Signaling Enable". Instead of setting SDR12 and SDR25 in the host
capabilities data structure for all V3.0 host controllers, only set them
if SDR104, SDR50 or DDR50 is set in the host capabilities register. This
will prevent the switch to 1.8V later.
Signed-off-by: Al Cooper <acooper@gmail.com>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Acked-by: Philip Rakity <prakity@marvell.com>
Acked-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2815f68da upstream.
Here is Essential conditions to indicate Version 3.00 Card
(SD_SPEC=2 and SD_SPEC3=1) :
(1) The card shall support CMD6
(2) The card shall support CMD8
(3) The card shall support CMD42
(4) User area capacity shall be up to 2GB (SDSC) or 32GB (SDHC)
User area capacity shall be more than or equal to 32GB and
up to 2TB (SDXC)
(5) Speed Class shall be supported (SDHC or SDXC)
So even if SD card doesn't support any of the newly defined
UHS-I bus speed mode, it can advertise itself as SD3.0 cards
as long as it supports all the essential conditions of
SD3.0 cards. Given this, these type of cards should atleast
run in High Speed mode @50MHZ if it supports HS.
But current initialization sequence for SD3.0 cards is
such that these non-UHS-I SD3.0 cards runs in Default
Speed mode @25MHz.
This patch makes sure that these non-UHS-I SD3.0 cards run
in High Speed Mode @50MHz.
Tested this patch with SanDisk Extreme SDHC 8GB Class 10 card.
Reported-by: "Hiremath, Vaibhav" <hvaibhav@ti.com>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
commit cc37f75a9f upstream.
A Squashfs filesystem containing nothing but an empty directory,
although unusual and ultimately pointless, is still valid.
The directory_table >= next_table sanity check rejects these
filesystems as invalid because the directory_table is empty and
equal to next_table.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38bd2a1ac7 upstream.
Parity Setting value is reverse.
E.G. In case of setting ODD parity, EVEN value is set.
This patch inverts "if" condition.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9539dfb7ac upstream.
Rx Error interrupt(E.G. parity error) is not enabled.
So, when parity error occurs, error interrupt is not occurred.
As a result, the received data is not dropped.
This patch adds enable/disable rx error interrupt code.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 720bb6436f upstream.
For some reason, when the lirc daemon learns that a usb remote control
has been unplugged, it wants to read the sysfs attributes of the
disappearing device. This is useful for uncovering transient
inconsistencies, but less so for keeping the system running when such
inconsistencies exist.
Under some circumstances (like every time I unplug my dvb stick from
my laptop), lirc catches an rc_dev whose raw event handler has been
removed (presumably by ir_raw_event_unregister), and proceeds to
interrogate the raw protocols supported by the NULL pointer.
This patch avoids the NULL dereference, and ignores the issue of how
this state of affairs came about in the first place.
Version 2 incorporates changes recommended by Mauro Carvalho Chehab
(-ENODEV instead of -EINVAL, and a signed-off-by).
Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cee58483cf upstream
Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526c ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.
Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.
This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.
Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e8b14526c upstream
Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger then the
year 2262).
Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.
So this patch improves the sanity check timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.
Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bec4596b4e upstream.
drop_monitor calls several sleeping functions while in atomic context.
BUG: sleeping function called from invalid context at mm/slub.c:943
in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
Call Trace:
[<ffffffff810697ca>] __might_sleep+0xca/0xf0
[<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
[<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
[<ffffffff815343fb>] __alloc_skb+0x4b/0x230
[<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
[<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
[<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
[<ffffffff810568e0>] process_one_work+0x130/0x4c0
[<ffffffff81058249>] worker_thread+0x159/0x360
[<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
[<ffffffff8105d403>] kthread+0x93/0xa0
[<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
[<ffffffff816be6d0>] ? gs_change+0xb/0xb
Rework the logic to call the sleeping functions in right context.
Use standard timer/workqueue api to let system chose any cpu to perform
the allocation and netlink send.
Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
use mod_timer() to wait 1/10 second before next try.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fdcfa1284 upstream.
I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error. drop monitor maintains a per cpu structure,
that gets initalized from a single cpu. Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself . Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu. If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.
This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from. In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu
Tested successfully by myself.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3885ca785a upstream.
Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections. Specifically, its possible to replace data->skb while its
being written. This patch corrects that by making data->skb an rcu protected
variable. That will prevent it from being overwritten while a tracepoint is
modifying it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cde2e9a651 upstream.
Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
[ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[ 38.352582] Call Trace:
[ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50
[ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b71ca6bce upstream.
For one, the driver device pointer needs to be filled in, or the lirc core
will refuse to load the driver. And we really need to wire up all the
platform_device bits. This has been tested via the lirc sourceforge tree
and verified to work, been sitting there for months, finally getting
around to sending it. :\
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8323f26ce3 upstream.
Stefan reported a crash on a kernel before a3e5d1091c ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.
Looking at all that I found a lack of serialization and plain
wrong comments.
The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. Its not pretty,
but I can't really see another way given how all the cgroup
stuff works.
Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4686c71a9 upstream.
Commit d640113fe8 introduced a regression on SMP
systems where the processor core with ACPI id zero is disabled
(typically should be the case because of hyperthreading).
The regression got spread through stable kernels.
On 3.0.X it got introduced via 3.0.18.
Such platforms may be rare, but do exist.
Look out for a disabled processor with acpi_id 0 in dmesg:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] disabled)
This problem has been observed on a:
HP Proliant BL280c G6 blade
This patch restricts the introduced workaround to platforms
with nr_cpu_ids <= 1.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c531077f40 upstream.
When using my Seagate FreeAgent GoFlex eSATAp external disk enclosure,
interface errors are always seen until 1.5Gbps is negotiated [1]. This
occurs using any disk in the enclosure, and when the disk is connected
directly with a generic passive eSATAp cable, we see stable 3Gbps
operation as expected.
Blacklist 3Gbps mode to avoid dataloss and the ~30s delay bus reset
and renegotiation incurs.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06b6a1cf6e upstream.
Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.
And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.
How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.
1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c
2 run ./rds_server on one console
3 run ./rds_client on another console
4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor
/***************** rds_client.c ********************/
int main(void)
{
int sock_fd;
struct sockaddr_in serverAddr;
struct sockaddr_in toAddr;
char recvBuffer[128] = "data from client";
struct msghdr msg;
struct iovec iov;
sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if (sock_fd < 0) {
perror("create socket error\n");
exit(1);
}
memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4001);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind() error\n");
close(sock_fd);
exit(1);
}
memset(&toAddr, 0, sizeof(toAddr));
toAddr.sin_family = AF_INET;
toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
toAddr.sin_port = htons(4000);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendto() error\n");
close(sock_fd);
exit(1);
}
printf("client send data:%s\n", recvBuffer);
memset(recvBuffer, '\0', 128);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("receive data from server:%s\n", recvBuffer);
close(sock_fd);
return 0;
}
/***************** rds_server.c ********************/
int main(void)
{
struct sockaddr_in fromAddr;
int sock_fd;
struct sockaddr_in serverAddr;
unsigned int addrLen;
char recvBuffer[128];
struct msghdr msg;
struct iovec iov;
sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if(sock_fd < 0) {
perror("create socket error\n");
exit(0);
}
memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4000);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind error\n");
close(sock_fd);
exit(1);
}
printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;
/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}
close(sock_fd);
return 0;
}
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Fixed upstream by commits 2955b47d2c and
a4683487f9 from Dan Williams, but they are much
more intrusive than this tiny fix, according to Andrew - gregkh]
This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.
void async_synchronize_full(void)
{
do {
async_synchronize_cookie(next_cookie);
} while (!list_empty(&async_running) || !
list_empty(&async_pending));
}
async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.
However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.
It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.
Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 734b65417b upstream.
This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.
With thanks to Eric Dumazet for catching a bug.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fa6535faf upstream.
As pointed out by Gustavo and Marcel, all Apple-specific Broadcom
devices seen so far have the same interface class, subclass and
protocol numbers. This patch adds an entry which matches all of them,
using the new USB_VENDOR_AND_INTERFACE_INFO() macro.
In particular, this patch adds support for the MacBook Pro Retina
(05ac:8286), which is not in the present list.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Shea Levy <shea@shealevy.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92c385f46b upstream.
Many Broadcom devices has a vendor specific devices class, with this rule
we match all existent and future controllers with this behavior.
We also remove old rules to that matches product id for Broadcom devices.
Tested-by: John Hommel <john.hommel@hp.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96e65306b8 upstream.
The compiler may compile the following code into TWO write/modify
instructions.
worker->flags &= ~WORKER_UNBOUND;
worker->flags |= WORKER_REBIND;
so the other CPU may temporarily see worker->flags which doesn't have
either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
prematurely.
Fix it by using single explicit assignment via ACCESS_ONCE().
Because idle workers have another WORKER_NOT_RUNNING flag, this bug
doesn't exist for them; however, update it to use the same pattern for
consistency.
tj: Applied the change to idle workers too and updated comments and
patch description a bit.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b98b601672 upstream.
Clear Audio Enable bit to trigger unsolicated event to notify Audio
Driver part the HDMI hot plug change. The patch fixed the bug when
remove HDMI cable the bit was not cleared correctly.
In intel_hdmi_dpms(), if intel_hdmi->has_audio been true, the "Audio enable bit" will
be set to trigger unsolicated event to notify Alsa driver the change.
intel_hdmi->has_audio will be reset to false from intel_hdmi_detect() after
remove the hdmi cable, here's debug log:
[ 187.494153] [drm:output_poll_execute], [CONNECTOR:17:HDMI-A-1] status updated from 1 to 2
[ 187.525349] [drm:intel_hdmi_detect], HDMI: has_audio = 0
so when comes back to intel_hdmi_dpms(), the "Audio enable bit" will not be cleared. And this
cause the eld infomation and pin presence doesnot update accordingly in alsa driver side.
This patch will also trigger unsolicated event to alsa driver to notify the hot plug event:
[ 187.853159] ALSA sound/pci/hda/patch_hdmi.c:772 HDMI hot plug event: Codec=3 Pin=5 Presence_Detect=0 ELD_Valid=1
[ 187.853268] ALSA sound/pci/hda/patch_hdmi.c:990 HDMI status: Codec=3 Pin=5 Presence_Detect=0 ELD_Valid=0
Signed-off-by: Wang Xingchao <xingchao.wang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3766054fff upstream.
There are some new video switch keys that used by newer machines.
0xA0 - SDSP HDMI only
0xA1 - SDSP LCD + HDMI
0xA2 - SDSP CRT + HDMI
0xA3 - SDSP TV + HDMI
But in Linux, there is no suitable userspace application to handle this,
so, mapping them all to KEY_SWITCHVIDEOMODE.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52e9b39d9a upstream.
There is a more recent APU stepping with a new PCI ID
shipping in the same board by Fujitsu which needs the
same quirk to correctly mark the back plane connectors.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8636a2717 upstream.
So we've had a fair few reports of fbcon handover breakage between
efi/vesafb and i915 surface recently, so I dedicated a couple of
days to finding the problem.
Essentially the last thing we saw was the conflicting framebuffer
message and that was all.
So after much tracing with direct netconsole writes (printks
under console_lock not so useful), I think I found the race.
Thread A (driver load) Thread B (timer thread)
unbind_con_driver -> |
bind_con_driver -> |
vc->vc_sw->con_deinit -> |
fbcon_deinit -> |
console_lock() |
| |
| fbcon_flashcursor timer fires
| console_lock() <- blocked for A
|
|
fbcon_del_cursor_timer ->
del_timer_sync
(BOOM)
Of course because all of this is under the console lock,
we never see anything, also since we also just unbound the active
console guess what we never see anything.
Hopefully this fixes the problem for anyone seeing vesafb->kms
driver handoff.
v1.1: add comment suggestion from Alan.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7838f994b4 upstream.
On many of our larger systems, CPU 0 has had all of its IRQ resources
consumed before XPC loads. Worst cases on machines with multiple 10
GigE cards and multiple IB cards have depleted the entire first socket
of IRQs.
This patch makes selecting the node upon which IRQs are allocated (as
well as all the other GRU Message Queue structures) specifiable as a
module load param and has a default behavior of searching all nodes/cpus
for an available resources.
[akpm@linux-foundation.org: fix build: include cpu.h and module.h]
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58a34de7b1 upstream.
The power.deferred_resume can only be set if the runtime PM status
of device is RPM_SUSPENDING and it should be cleared after its
status has been changed, regardless of whether or not the runtime
suspend has been successful. However, it only is cleared on
suspend failure, while it may remain set on successful suspend and
is happily leaked to rpm_resume() executed in that case.
That shouldn't happen, so if power.deferred_resume is set in
rpm_suspend() after the status has been changed to RPM_SUSPENDED,
clear it before calling rpm_resume(). Then, it doesn't need to be
cleared before changing the status to RPM_SUSPENDING any more,
because it's always cleared after the status has been changed to
either RPM_SUSPENDED (on success) or RPM_ACTIVE (on failure).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f321c26c0 upstream.
For devices whose power.no_callbacks flag is set, rpm_resume()
should return 1 if the device's parent is already active, so that
the callers of pm_runtime_get() don't think that they have to wait
for the device to resume (asynchronously) in that case (the core
won't queue up an asynchronous resume in that case, so there's
nothing to wait for anyway).
Modify the code accordingly (and make sure that an idle notification
will be queued up on success, even if 1 is to be returned).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7dbfb315b2 upstream.
Correct the offset by subtracting 20 from tm_hour before taking the
modulo 12.
[ "Why 20?" I hear you ask. Or at least I did.
Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
included in the RS5C348_HOURS_MASK define. So it's really subtracting
out that bit to get "hour+12". But then because it does things modulo
12, it needs to add the 12 in again afterwards anyway.
This code is confused. It would be much clearer if RS5C348_HOURS_MASK
just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
need to do the silly subtract either.
Whatever. It's all just math, the end result is the same. - Linus ]
Reported-by: James Nute <newten82@gmail.com>
Tested-by: James Nute <newten82@gmail.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bce9c46bf upstream.
ARM recently moved to asm-generic/mutex-xchg.h for its mutex
implementation after the previous implementation was found to be missing
some crucial memory barriers. However, this has revealed some problems
running hackbench on SMP platforms due to the way in which the
MUTEX_SPIN_ON_OWNER code operates.
The symptoms are that a bunch of hackbench tasks are left waiting on an
unlocked mutex and therefore never get woken up to claim it. This boils
down to the following sequence of events:
Task A Task B Task C Lock value
0 1
1 lock() 0
2 lock() 0
3 spin(A) 0
4 unlock() 1
5 lock() 0
6 cmpxchg(1,0) 0
7 contended() -1
8 lock() 0
9 spin(C) 0
10 unlock() 1
11 cmpxchg(1,0) 0
12 unlock() 1
At this point, the lock is unlocked, but Task B is in an uninterruptible
sleep with nobody to wake it up.
This patch fixes the problem by ensuring we put the lock into the
contended state if we fail to acquire it on the fastpath, ensuring that
any blocked waiters are woken up when the mutex is released.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50d0206fca upstream.
This patch fixes a particularly nasty bug that was revealed by the ring
expansion patches. The bug has been present since the very beginning of
the xHCI driver history, and could have caused general protection faults
from bad memory accesses.
The first thing to note is that a Set TR Dequeue Pointer command can
move the dequeue pointer to a link TRB, if the canceled or stalled
transfer TD ended just before a link TRB. The function to increment the
dequeue pointer, inc_deq, was written before cancellation and stall
support was added. It assumed that the dequeue pointer could never
point to a link TRB. It would unconditionally increment the dequeue
pointer at the start of the function, check if the pointer was now on a
link TRB, and move it to the top of the next segment if so.
This means that if a Set TR Dequeue Point command moved the dequeue
pointer to a link TRB, a subsequent call to inc_deq() would move the
pointer off the segment and into la-la-land. It would then read from
that memory to determine if it was a link TRB. Other functions would
often call inc_deq() until the dequeue pointer matched some other
pointer, which means this function would quite happily read all of
system memory before wrapping around to the right pointer value.
Often, there would be another endpoint segment from a different ring
allocated from the same DMA pool, which would be contiguous to the
segment inc_deq just stepped off of. inc_deq would eventually find the
link TRB in that segment, and blindly move the dequeue pointer back to
the top of the correct ring segment.
The only reason the original code worked at all is because there was
only one ring segment. With the ring expansion patches, the dequeue
pointer would eventually wrap into place, but the dequeue segment would
be out-of-sync. On the second TD after the dequeue pointer was moved to
a link TRB, trb_in_td() would fail (because the dequeue pointer and
dequeue segment were out-of-sync), and this message would appear:
ERROR Transfer event TRB DMA ptr not part of current TD
This fixes bugzilla entry 4333 (option-based modem unhappy on USB 3.0
port: "Transfer event TRB DMA ptr not part of current TD", "rejecting
I/O to offline device"),
https://bugzilla.kernel.org/show_bug.cgi?id=43333
and possibly other general protection fault bugs as well.
This patch should be backported to kernels as old as 2.6.31. A separate
patch will be created for kernels older than 3.4, since inc_deq was
modified in 3.4 and this patch will not apply.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: James Ettle <theholyettlz@googlemail.com>
Tested-by: Matthew Hall <mhall@mhcomputing.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2963657819 upstream.
For non PCI-based stacks, this function call
usb_disable_xhci_ports(to_pci_dev(hcd->self.controller));
made from xhci_shutdown is not applicable.
Ideally, we wouldn't have any PCI-specific code on
a generic driver such as the xHCI stack, but it looks
like we should just stub usb_disable_xhci_ports() out
for non-PCI devices.
[ balbi@ti.com: slight improvement to commit log ]
This patch should be backported to kernels as old as 3.0, since the
commit it fixes (e95829f474 "xhci: Switch
PPT ports to EHCI on shutdown.") was marked for stable.
Signed-off-by: Moiz Sonasath<m-sonasath@ti.com>
Signed-off-by: Ruchika Kharwar <ruchika@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29d214576f upstream.
On Intel Panther Point chipset USB 3.0 devices show up as
high-speed devices on powerup, but after an s3 cycle they are
correctly recognized as SuperSpeed. At powerup switch the port
to xHCI so that USB 3.0 devices are correctly recognized.
BugLink: http://bugs.launchpad.net/bugs/1000424
This patch should be backported to kernels as old as 3.0, that contain
commit ID 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e955a1cd08 upstream.
My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.
I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.
This should be backported to kernels as old as 2.6.31, that contain the
commit 66d4eadd8d "USB: xhci: BIOS handoff
and HW initialization."
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 052c7f9ffb upstream.
The intent was to test whether the flag was set.
This patch should be backported to stable kernels as old as 3.0, since
it fixes a bug in commit e95829f474 "xhci:
Switch PPT ports to EHCI on shutdown.", which was marked for stable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a96874a2a9 upstream.
With a previous patch to enable the EHCI/XHCI port switching, it switches
all the available ports.
The assumption is not correct because the BIOS may expect some ports
not switchable by the OS.
There are two more registers that contains the information of the switchable
and non-switchable ports.
This patch adds the checking code for the two register so that only the
switchable ports are altered.
This patch should be backported to kernels as old as 3.0, that contain
commit ID 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92fc7a8b0f upstream.
This patch (as1604) adds a CONFIG_INTF_STRINGS quirk for the Joss
infrared touchboard device. The device doesn't like to be asked for
its interface strings.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: adam ? <adam3337@wp.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dafc4f7be1 upstream.
Commit b69cc67205 added support for the E-861. After acquiring a C-867, I
realised that every Physik Instrumente's device has a different PID. They are
listed in the Windows device driver's .inf file. So here are all PIDs for the
current (and probably future) USB devices from Physik Instrumente.
Compiled, but only actually tested on the E-861 and C-867.
Signed-off-by: Éric Piel <piel@delmic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f08dea7348 upstream.
The Microchip vid:pid 04d8:000a is used for their CDC ACM
demo firmware application. This is a device with a single
function conforming to the CDC ACM specification and with
the intention of demonstrating CDC ACM class firmware and
driver interaction. The demo is used on a number of
development boards, and may also be used unmodified by
vendors using Microchip hardware.
Some vendors have re-used this vid:pid for other types of
firmware, emulating FTDI chips. Attempting to continue to
support such devices without breaking class based
applications that by matching on interface
class/subclass/proto being ff/ff/00. I have no information
about the actual device or interface descriptors, but this
will at least make the proper CDC ACM devices work again.
Anyone having details of the offending device's descriptors
should update this entry with the details.
Reported-by: Florian Wöhrl <fw@woehrl.biz>
Reported-by: Xiaofan Chen <xiaofanc@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d037774b4 upstream.
There is a possibility of QH overlay region having reference to a stale
qTD pointer during unlink.
Consider an endpoint having two pending qTD before unlink process begins.
The endpoint's QH queue looks like this.
qTD1 --> qTD2 --> Dummy
To unlink qTD2, QH is removed from asynchronous list and Asynchronous
Advance Doorbell is programmed. The qTD1's next qTD pointer is set to
qTD2'2 next qTD pointer and qTD2 is retired upon controller's doorbell
interrupt. If QH's current qTD pointer points to qTD1, transfer overlay
region still have reference to qTD2. But qtD2 is just unlinked and freed.
This may cause EHCI system error. Fix this by updating qTD next pointer
in QH overlay region with the qTD next pointer of the current qTD.
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01913b49cf upstream.
If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last
decode_* call must have succeeded.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 872ece86ea upstream.
Apparently, am-utils is still using the legacy binary mountdata interface,
and is having trouble parsing /proc/mounts due to the 'port=' field being
incorrectly set.
The following patch should fix up the regression.
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3f52af3e0 upstream.
When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd764 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).
At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.
Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881
Reported-by: Andi Kleen <andi@firstfloor.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a396e10019 upstream.
We need to program the rfkill switch GPIO pin direction to input at
device initialization time, not only when the interface is brought up.
Doing this only when the interface is brought up could lead to rfkill
detecting the switch is turned on erroneously and inability to create
the interface and bringing it up.
Reported-and-tested-by: Andreas Messer <andi@bastelmap.de>
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Ivo Van Doorn <ivdoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c456797681 upstream.
Avoid the construction of a malformed DMA request sent to
the DMA controller.
Log message is for debug only because this condition is unlikely to
append and may only trigger at driver development time.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a85d0d7f34 upstream.
When call_crda() is called we kick off a witch hunt search
for the same regulatory domain on our internal regulatory
database and that work gets kicked off on a workqueue, this
is done while the cfg80211_mutex is held. If that workqueue
kicks off it will first lock reg_regdb_search_mutex and
later cfg80211_mutex but to ensure two CPUs will not contend
against cfg80211_mutex the right thing to do is to have the
reg_regdb_search() wait until the cfg80211_mutex is let go.
The lockdep report is pasted below.
cfg80211: Calling CRDA to update world regulatory domain
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.8 #3 Tainted: G O
-------------------------------------------------------
kworker/0:1/235 is trying to acquire lock:
(cfg80211_mutex){+.+...}, at: [<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
but task is already holding lock:
(reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (reg_regdb_search_mutex){+.+...}:
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<81645778>] is_world_regdom+0x9f8/0xc74 [cfg80211]
-> #1 (reg_mutex#2){+.+...}:
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<8164539c>] is_world_regdom+0x61c/0xc74 [cfg80211]
-> #0 (cfg80211_mutex){+.+...}:
[<800a77b8>] __lock_acquire+0x10d4/0x17bc
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
other info that might help us debug this:
Chain exists of:
cfg80211_mutex --> reg_mutex#2 --> reg_regdb_search_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(reg_regdb_search_mutex);
lock(reg_mutex#2);
lock(reg_regdb_search_mutex);
lock(cfg80211_mutex);
*** DEADLOCK ***
3 locks held by kworker/0:1/235:
#0: (events){.+.+..}, at: [<80089a00>] process_one_work+0x230/0x460
#1: (reg_regdb_work){+.+...}, at: [<80089a00>] process_one_work+0x230/0x460
#2: (reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]
stack backtrace:
Call Trace:
[<80290fd4>] dump_stack+0x8/0x34
[<80291bc4>] print_circular_bug+0x2ac/0x2d8
[<800a77b8>] __lock_acquire+0x10d4/0x17bc
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
Reported-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e21093ef6f upstream.
The Revision 1.0 Janz CMOD-IO Carrier Board does not have support for
the reset registers. To support older hardware, the code is changed to
use the hardware reset register on the Janz VMOD-ICAN3 hardware itself.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8669cf6793 upstream.
On Toshiba Satellite C850D, the touchpad and the keyboard might randomly
not work at boot. Preventing MUX mode activation solves this issue.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1e5b7e425 upstream.
This patch zeroes the SCTLR.TRE bit prior to setting the mapping as
cacheable for ARMv7 cores in the decompressor, ensuring that the
memory region attributes are obtained from the C and B bits, not from
the page tables.
Cc: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No upstream commit as this is a merge error in the 3.0 tree.
ARM_ERRATA_764369 and PL310_ERRATA_769419 do not appear in config menu in
stable 3.0.y tree.
This is because backported patch for arm/arm/Kconfig applied wrong place.
This patch solves it.
Signed-off-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 308b135e4f upstream.
kdump can be interrupted by watchdog timer when the timer is left
activated on the crash kernel. Changed the hpwdt driver to disable
watchdog timer at boot-time. This assures that watchdog timer is
disabled until /dev/watchdog is opened, and prevents watchdog timer
to be left running on the crash kernel.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Lisa Mitchell <lisa.mitchell@hp.com>
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 256d0eaac8 upstream.
If a command status of CMD_PROTOCOL_ERR is received, this
information should be conveyed to the SCSI mid layer, not
dropped on the floor. CMD_PROTOCOL_ERR may be received
from the Smart Array for any commands destined for an external
RAID controller such as a P2000, or commands destined for tape
drives or CD/DVD-ROM drives, if for instance a cable is
disconnected. This mostly affects multipath configurations, as
disconnecting a cable on a non-multipath configuration is not
going to do anything good regardless of whether CMD_PROTOCOL_ERR
is handled correctly or not. Not handling CMD_PROTOCOL_ERR
correctly in a multipath configaration involving external RAID
controllers may cause data corruption, so this is quite a serious
bug. This bug should not normally cause a problem for direct
attached disk storage.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d653220711 upstream.
This patch fixes the following kernel panic invoked by uninitialized fields
in the chip initialization for the 1G bnx2 iSCSI offload.
One of the bits in the chip initialization is being used by the latest
firmware to control overflow packets. When this control bit gets enabled
erroneously, it would ultimately result in a bad packet placement which would
cause the bnx2 driver to dereference a NULL ptr in the placement handler.
This can happen under certain stress I/O environment under the Linux
iSCSI offload operation.
This change only affects Broadcom's 5709 chipset.
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2
RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216
RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000
RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220
RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000
R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010
R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558
FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820)
Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520
ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20
0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8
Call Trace:
<IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b
[<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231
[<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1
[<ffffffff800125a0>] __do_softirq+0x89/0x133
[<ffffffff8005e30c>] call_softirq+0x1c/0x28
[<ffffffff8006d5de>] do_softirq+0x2c/0x7d
[<ffffffff8006d46e>] do_IRQ+0xee/0xf7
[<ffffffff8005d625>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341
[<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341
[<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341
[<ffffffff80049560>] cpu_idle+0x95/0xb8
[<ffffffff80078b1c>] start_secondary+0x479/0x488
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10cce6d8b5 upstream.
This patch checks whether HBA is SAS2008 B0 controller.
if it is a SAS2008 B0 controller then it use IO-APIC interrupt instead of MSIX,
as SAS2008 B0 controller doesn't support MSIX interrupts.
[jejb: fix whitespace problems]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f0ecb907d upstream.
The quirk introduced with commit
00250ec909 (hwmon: fam15h_power: fix
bogus values with current BIOSes) is not only required during driver
load but also when system resumes from suspend. The BIOS might set the
previously recommended (but unsuitable) initilization value for the
running average range register during resume.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d54db795d upstream.
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.
In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:
"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.
This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
.. snip..
[<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[<ffffffff81d23348>] sparse_init+0xe4/0x25a
[<ffffffff81d16840>] paging_init+0x13/0x22
[<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[<ffffffff81683954>] ? printk+0x3c/0x3e
[<ffffffff81d01a38>] start_kernel+0xe5/0x468
[<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"
so we just disable NUMA scanning by setting numa_off=1.
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f14851af0e upstream.
There may be a bug when registering section info. For example, on my
Itanium platform, the pfn range of node0 includes the other nodes, so
other nodes' section info will be double registered, and memmap's page
count will equal to 3.
node0: start_pfn=0x100, spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
node1: start_pfn=0x80000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x080000-0x100000
node2: start_pfn=0x100000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x100000-0x180000
node3: start_pfn=0x180000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x180000-0x200000
free_all_bootmem_node()
register_page_bootmem_info_node()
register_page_bootmem_info_section()
When hot remove memory, we can't free the memmap's page because
page_count() is 2 after put_page_bootmem().
sparse_remove_one_section()
free_section_usemap()
free_map_bootmem()
put_page_bootmem()
[akpm@linux-foundation.org: add code comment]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05cf96398e upstream.
I found following definition in include/linux/memory.h, in my IA64
platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
will be 0.
#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
This will cause wrong system memory infomation in sysfs.
I think it should be:
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
And "echo offline > memory0/state" will cause following call trace:
kernel BUG at mm/memory_hotplug.c:885!
sh[6455]: bugcheck! 0 [1]
Pid: 6455, CPU 0, comm: sh
psr : 0000101008526030 ifs : 8000000000000fa4 ip : [<a0000001008c40f0>] Not tainted (3.6.0-rc1)
ip is at offline_pages+0x210/0xee0
Call Trace:
show_stack+0x80/0xa0
show_regs+0x640/0x920
die+0x190/0x2c0
die_if_kernel+0x50/0x80
ia64_bad_break+0x3d0/0x6e0
ia64_native_leave_kernel+0x0/0x270
offline_pages+0x210/0xee0
alloc_pages_current+0x180/0x2a0
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cab32f39dc upstream.
The MCP2515 has a silicon bug causing repeated frame transmission, see section
5 of MCP2515 Rev. B Silicon Errata Revision G (March 2007).
Basically, setting TXBnCTRL.TXREQ in either SPI mode (00 or 11) will eventually
cause the bug. The workaround proposed by Microchip is to use mode 00 and send
a RTS command on the SPI bus to initiate the transmission.
Signed-off-by: Benoît Locher <Benoit.Locher@skf.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dcebaa9a0 upstream.
On some platforms, bootloaders are known to do some interesting RTC
programming. Without going into the obscurities as to why this may be
the case, suffice it to say the the driver should not make any
assumptions about the state of the RTC when the driver loads. In
particular, the driver probe should be sure that all interrupts are
disabled until otherwise programmed.
This was discovered when finding bursty I2C traffic every second on
Overo platforms. This I2C overhead was keeping the SoC from hitting
deep power states. The cause was found to be the RTC firing every
second on the I2C-connected TWL PMIC.
Special thanks to Felipe Balbi for suggesting to look for a rogue driver
as the source of the I2C traffic rather than the I2C driver itself.
Special thanks to Steve Sakoman for helping track down the source of the
continuous RTC interrups on the Overo boards.
Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Felipe Balbi <balbi@ti.com>
Tested-by: Steve Sakoman <steve@sakoman.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ba8f2d593 upstream.
The heuristic method for buddy has been introduced since commit
43506fad21 ("mm/page_alloc.c: simplify calculation of combined index
of adjacent buddy lists"). But the page address of higher page's buddy
was wrongly calculated, which will lead page_is_buddy to fail for ever.
IOW, the heuristic method would be disabled with the wrong page address
of higher page's buddy.
Calculating the page address of higher page's buddy should be based
higher_page with the offset between index of higher page and index of
higher page's buddy.
Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 667a5313ec upstream.
commit 27a7b260f7
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
changed 0.90 metadata handling to truncated size to 4TB as that is
all that 0.90 can record.
However for RAID0 and Linear, 0.90 doesn't need to record the size, so
this truncation is not needed and causes working arrays to become too small.
So avoid the truncation for RAID0 and Linear
This bug was introduced in 3.1 and is suitable for any stable kernels
from then onwards.
As the offending commit was tagged for 'stable', any stable kernel
that it was applied to should also get this patch. That includes
at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for
providing that list).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67a806d949 upstream.
The following build error occurred during an alpha build:
net/core/sock.c:274:36: error: initializer element is not constant
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60e233a566 upstream.
Fengguang Wu <fengguang.wu@intel.com> writes:
> After the __devinit* removal series, I can still get kernel panic in
> show_uevent(). So there are more sources of bug..
>
> Debug patch:
>
> @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device
> goto out;
>
> /* copy keys to file */
> - for (i = 0; i < env->envp_idx; i++)
> + dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name);
> + for (i = 0; i < env->envp_idx; i++) {
> + printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]);
> count += sprintf(&buf[count], "%s\n", env->envp[i]);
> + }
>
> Oops message, the env[] is again not properly initilized:
>
> [ 44.068623] input input0: uevent 61 env[805306368]: input0/.../input0
> [ 44.069552] uevent 0 env[0]: (null)
This is a completely different CONFIG_HOTPLUG problem, only
demonstrating another reason why CONFIG_HOTPLUG should go away. I had a
hard time trying to disable it anyway ;-)
The problem this time is lots of code assuming that a call to
add_uevent_var() will guarantee that env->buflen > 0. This is not true
if CONFIG_HOTPLUG is unset. So things like this end up overwriting
env->envp_idx because the array index is -1:
if (add_uevent_var(env, "MODALIAS="))
return -ENOMEM;
len = input_print_modalias(&env->buf[env->buflen - 1],
sizeof(env->buf) - env->buflen,
dev, 0);
Don't know what the best action is, given that there seem to be a *lot*
of this around the kernel. This patch "fixes" the problem for me, but I
don't know if it can be considered an appropriate fix.
[ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to
always be on is the longterm fix, but it's too late for 3.6 and older
kernels to resolve this that way - gregkh ]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74f330bcea upstream.
Since commit 30832ab56 ("mmc: sdhci: Always pass clock request value
zero to set_clock host op") was merged, esdhc_set_clock starts hitting
"if (clock == 0)" where ESDHC_SYSTEM_CONTROL has been operated. This
causes SDHCI card-detection function being broken. Fix the regression
by moving "if (clock == 0)" above ESDHC_SYSTEM_CONTROL operation.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6fa941d94 upstream.
Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event. Use explicit refcount of its own
instead. Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61ed59ed09 upstream.
Don't zero out bits 15..12 of the data value in `das08jr_ao_winsn()` as
that knobbles the upper three-quarters of the output range for the
'das08jr-16-ao' board.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa209eef3c upstream.
Hi,
This patch fixes a bug with driver failing to negotiate a connection.
The bug was traced to commit
203e4615ee
staging: vt6656: removed custom definitions of Ethernet packet types
In that patch, definitions in include/linux/if_ether.h replaced ones
in tether.h which had both big and little endian definitions.
include/linux/if_ether.h only refers to big endian values, cpu_to_be16
should be used for the correct endian architectures.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ea418b8b2 upstream.
A local static variable was declared as a pointer to a string
constant. We're assigning to the underlying memory, so it
needs to be an array instead.
Signed-off-by: Christopher Brannon <chris@the-brannons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3737e2be50 upstream.
The AK4396 DAC has a linear-scale attentuator, but
sound/pci/ice1712/prodigy_hifi.c used a log scale instead, which is
not quite right. This patch restores the correct scale, borrowing
from the ak4396 code in sound/pci/oxygen/oxygen.c.
Signed-off-by: Matteo Frigo <athena@fftw.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c054ba63a upstream.
This patch fixes a long-standing bug with SCSI overflow handling
where se_cmd->data_length was incorrectly being re-assigned to
the larger CDB extracted allocation length, resulting in a number
of fabric level errors that would end up causing a session reset
in most cases. So instead now:
- Only re-assign se_cmd->data_length durining UNDERFLOW (to use the
smaller value)
- Use existing se_cmd->data_length for OVERFLOW (to use the smaller
value)
This fix has been tested with the following CDB to generate an
SCSI overflow:
sg_raw -r512 /dev/sdc 28 0 0 0 0 0 0 0 9 0
Tested using iscsi-target, tcm_qla2xxx, loopback and tcm_vhost fabric
ports. Here is a bit more detail on each case:
- iscsi-target: Bug with open-iscsi with overflow, sg_raw returns
-3584 bytes of data.
- tcm_qla2xxx: Working as expected, returnins 512 bytes of data
- loopback: sg_raw returns CHECK_CONDITION, from overflow rejection
in transport_generic_map_mem_to_cmd()
- tcm_vhost: Same as loopback
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8335eafc28 upstream.
After calling into the lower filesystem to do a rename, the lower target
inode's attributes were not copied up to the eCryptfs target inode. This
resulted in the eCryptfs target inode staying around, rather than being
evicted, because i_nlink was not updated for the eCryptfs inode. This
also meant that eCryptfs didn't do the final iput() on the lower target
inode so it stayed around, as well. This would result in a failure to
free up space occupied by the target file in the rename() operation.
Both target inodes would eventually be evicted when the eCryptfs
filesystem was unmounted.
This patch calls fsstack_copy_attr_all() after the lower filesystem
does its ->rename() so that important inode attributes, such as i_nlink,
are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
and eCryptfs can drop its final reference on the lower inode.
http://launchpad.net/bugs/561129
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72d3eb13b5 upstream.
This netconsole_target_put() is obviously redundant, and it
causes a kernel segfault when removing a bridge device which has
netconsole running on it.
This is caused by:
commit 8d8fc29d02
Author: Amerigo Wang <amwang@redhat.com>
Date: Thu May 19 21:39:10 2011 +0000
netpoll: disable netpoll when enslave a device
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b161dfa693 upstream.
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock. Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.
The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed. This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.
This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.
IBM reported successful test results with this patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55815f7014 upstream.
We already use them for openat() and friends, but fstat() also wants to
be able to use O_PATH file descriptors. This should make it more
directly comparable to the O_SEARCH of Solaris.
Note that you could already do the same thing with "fstatat()" and an
empty path, but just doing "fstat()" directly is simpler and faster, so
there is no reason not to just allow it directly.
See also commit 332a2e1244, which did the same thing for fchdir, for
the same reasons.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2453f5f992 upstream.
If a command completes with a status of CMD_PROTOCOL_ERR, this
information should be conveyed to the SCSI mid layer, not dropped
on the floor. Unlike a similar bug in the hpsa driver, this bug
only affects tape drives and CD and DVD ROM drives in the cciss
driver, and to induce it, you have to disconnect (or damage) a
cable, so it is not a very likely scenario (which would explain
why the bug has gone undetected for the last 10 years.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6889125b8b upstream.
powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed. The
function migrates the kworker back to the original CPU but this is
still broken. Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggerring
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.
It is unclear why this bug is being reported now. Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is an non-functional
change. Given that the reproduce case sometimes took upto days to
trigger, it's easy to be misled while bisecting. Maybe something made
contention on fidvid_mutex more likely? I don't know.
This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one. The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.
stable: ed48ece27c ("workqueue: reimplement work_on_cpu() using
system_wq") should be applied before this; otherwise, the
behavior could be horrible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Tested-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed48ece27c upstream.
The existing work_on_cpu() implementation is hugely inefficient. It
creates a new kthread, execute that single function and then let the
kthread die on each invocation.
Now that system_wq can handle concurrent executions, there's no
advantage of doing this. Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.
stable: While this isn't a fix in itself, it's needed to fix a
workqueue related bug in cpufreq/powernow-k8. AFAICS, this
shouldn't break other existing users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acbb219d5f ]
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.
Tested in Linux 3.4.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99469c32f7 ]
Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
atomic.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 20e1db19db ]
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.
The userspace process usually verifies the legitimate origin in
two ways:
a) Socket credentials. If UID != 0, then the message comes from
some ilegitimate process and the message needs to be dropped.
b) Netlink portID. In general, portID == 0 means that the origin
of the messages comes from the kernel. Thus, discarding any
message not coming from the kernel.
However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.
Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.
This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.
Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 43da5f2e0d ]
The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d8a041b7b ]
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b07f8eb75 ]
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3592aaeb80 ]
The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.
Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 792039c73c ]
The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9344a97296 ]
The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f9432c5ec8 ]
The RFCOMM code fails to initialize the two padding bytes of struct
rfcomm_dev_list_req inserted for alignment before copying it to
userland. Additionally there are two padding bytes in each instance of
struct rfcomm_dev_info. The ioctl() that for disclosures two bytes plus
dev_num times two bytes uninitialized kernel heap memory.
Allocate the memory using kzalloc() to fix this issue.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e15ca9a0ef ]
The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c0c5cfdcd ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e862f1a9b7 ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7f5c3e3a80 ]
Here's a quote of the comment about the BUG macro from asm-generic/bug.h:
Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of. If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.
If you're tempted to BUG(), think again: is completely giving up
really the *only* solution? There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.
In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7364e445f6 ]
Do not leak memory by updating pointer with potentially NULL realloc return value.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 696ecdc106 ]
gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.
So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1485348d24 ]
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e6d06f0de ]
Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 30b678d844 ]
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43ca6cb28c upstream.
The old interface is bugged and reads the wrong sensor when retrieving
the reading for the chassis fan (it reads the CPU sensor); the new
interface works fine.
Reported-by: Göran Uddeborg <goeran@uddeborg.se>
Tested-by: Göran Uddeborg <goeran@uddeborg.se>
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 276bdb82de upstream.
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bba3d8c3b3 upstream.
The following build error occured during a parisc build with
swap-over-NFS patches applied.
net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9e67d4837 upstream.
In some cases fuse_retrieve() would return a short byte count if offset was
non-zero. The data returned was correct, though.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 156bddd8e5 upstream.
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c2fc0de1a upstream.
When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).
Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.
Reported-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14216561e1 upstream.
This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
SAT-2 says (section 8.12.2)
if the device is in the stopped state as the result of
processing a START STOP UNIT command (see 9.11), then the SATL
shall terminate the TEST UNIT READY command with CHECK CONDITION
status with the sense key set to NOT READY and the additional
sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
REQUIRED;
mpt2sas internal SATL seems to implement this. The result is very confusing
standby behaviour (using hdparm -y). If you suspend a drive and then send
another command, usually it wakes up. However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
the SATL rules for this, returning NOT READY to all subsequent commands. This
means that the ordering of TEST UNIT READY is crucial: if you send TUR and
then a command, you get a NOT READY to both back. If you send a command and
then a TUR, you get GOOD status because the preceeding command woke the drive.
This bit us badly because
commit 85ef06d1d2
Author: Tejun Heo <tj@kernel.org>
Date: Fri Jul 1 16:17:47 2011 +0200
block: flush MEDIA_CHANGE from drivers on close(2)
Changed our ordering on TEST UNIT READY commands meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command)
resulting in lots of failed commands.
The standard is completely nuts forcing this inconsistent behaviour, but we
have to work around it.
The fix for this is twofold:
1. Set the allow_restart flag so we wake the drive when we see it has been
suspended
2. Return all TEST UNIT READY status directly to the mid layer without any
further error handling which prevents us causing error handling which
may offline the device just because of a media check TUR.
Reported-by: Matthias Prager <linux@matthiasprager.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 338b131a32 upstream.
If the specified max_queue_depth setting is less than the
expected number of internal commands, then driver will calculate
the queue depth size to a negitive number. This negitive number
is actually a very large number because variable is unsigned
16bit integer. So, the driver will ask for a very large amount of
memory for message frames and resulting into oops as memory
allocation routines will not able to handle such a large request.
So, in order to limit this kind of oops, The driver need to set
the max_queue_depth to a scsi mid layer's can_queue value. Then
the overall message frames required for IO is minimum of either
(max_queue_depth plus internal commands) or the IOC global
credits.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd8d6dd43a upstream.
The following patch moves the poll_aen_lock initializer from
megasas_probe_one() to megasas_init(). This prevents a crash when a user
loads the driver and tries to issue a poll() system call on the ioctl
interface with no adapters present.
Signed-off-by: Kashyap Desai <Kashyap.Desai@lsi.com>
Signed-off-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1021cb268b upstream.
If the default DSCR is non zero we set thread.dscr_inherit in
copy_thread() meaning the new thread and all its children will ignore
future updates to the default DSCR. This is not intended and is
a change in behaviour that a number of our users have hit.
We just need to inherit thread.dscr and thread.dscr_inherit from
the parent which ends up being much simpler.
This was found with the following test case:
http://ozlabs.org/~anton/junkcode/dscr_default_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99f347caa4 upstream.
If a device specifies zero endpoints in its interface descriptor,
the kernel oopses in acm_probe(). Even though that's clearly an
invalid descriptor, we should test wether we have all endpoints.
This is especially bad as this oops can be triggered by just
plugging a USB device in.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c4167cbb upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Christian Lamparter <chunkeey@googlemail.com>
CC: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e694d51888 upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Hans de Goede <hdegoede@redhat.com>
CC: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676ce6d5ca upstream.
Commit 91f68c89d8 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.
Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.
I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).
(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)
Revert 91f68c89d8, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix). Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.
And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b68c8e2c3 upstream.
Commit dbf0e4c (PCI: EHCI: fix crash during suspend on ASUS
computers) added a workaround for an ASUS suspend issue related to
USB EHCI and a bug in a number of ASUS BIOSes that attempt to shut
down the EHCI controller during system suspend if its PCI command
register doesn't contain 0 at that time.
It turns out that the same workaround is necessary in the analogous
hibernation code path, so add it.
References: https://bugzilla.kernel.org/show_bug.cgi?id=45811
Reported-and-tested-by: Oleksij Rempel <bug-track@fisher-privat.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1352fde56 upstream.
ath_rx_tasklet() calls ath9k_rx_skb_preprocess() and ath9k_rx_skb_postprocess()
in a loop over the received frames. The decrypt_error flag is
initialized to false
just outside ath_rx_tasklet() loop. ath9k_rx_accept(), called by
ath9k_rx_skb_preprocess(),
only sets decrypt_error to true and never to false.
Then ath_rx_tasklet() calls ath9k_rx_skb_postprocess() and passes
decrypt_error to it.
So, after a decryption error, in ath9k_rx_skb_postprocess(), we can
have a leftover value
from another processed frame. In that case, the frame will not be marked with
RX_FLAG_DECRYPTED even if it is decrypted correctly.
When using CCMP encryption this issue can lead to connection stuck
because of CCMP
PN corruption and a waste of CPU time since mac80211 tries to decrypt an already
deciphered frame with ieee80211_aes_ccm_decrypt.
Fix the issue initializing decrypt_error flag at the begging of the
ath_rx_tasklet() loop.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f06f00a24d upstream.
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.
Symptoms were data corruption preceded by svc_tcp_sendto logging
something like
kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d10f27a750 upstream.
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.
It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.
Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available. If it finds that there is not
space, it then subtracts the estimate back out.
This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.
The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be1e44441a upstream.
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).
svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY. However,
that means the inconsistency must be repaired before dropping XPT_BUSY.
Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.
Symptoms were a BUG() in svc_tcp_clear_pages.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2140fc0cb upstream.
Refcounting of fsnotify_mark in audit tree is broken. E.g:
refcount
create_chunk
alloc_chunk 1
fsnotify_add_mark 2
untag_chunk
fsnotify_get_mark 3
fsnotify_destroy_mark
audit_tree_freeing_mark 2
fsnotify_put_mark 1
fsnotify_put_mark 0
via destroy_list
fsnotify_mark_destroy -1
This was reported by various people as triggering Oops when stopping auditd.
We could just remove the put_mark from audit_tree_freeing_mark() but that would
break freeing via inode destruction. So this patch simply omits a put_mark
after calling destroy_mark or adds a get_mark before.
The additional get_mark is necessary where there's no other put_mark after
fsnotify_destroy_mark() since it assumes that the caller is holding a reference
(or the inode is keeping the mark pinned, not the case here AFAICS).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Valentin Avram <aval13@gmail.com>
Reported-by: Peter Moody <pmoody@google.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fe33aae0e upstream.
Don't do free_chunk() after fsnotify_add_mark(). That one does a delayed unref
via the destroy list and this results in use-after-free.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47fbf7976e upstream.
Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
disconnected data server) we've been sending layoutreturn calls
while there is potentially still outstanding I/O to the data
servers. The reason we do this is to avoid races between replayed
writes to the MDS and the original writes to the DS.
When this happens, the BUG_ON() in nfs4_layoutreturn_done can
be triggered because it assumes that we would never call
layoutreturn without knowing that all I/O to the DS is
finished. The fix is to remove the BUG_ON() now that the
assumptions behind the test are obsolete.
Reported-by: Boaz Harrosh <bharrosh@panasas.com>
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0866004304 upstream.
If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.
Reported-by: Yuanming Chen <hikvision_linux@163.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb48c07146 upstream.
Each page mapped in a process's address space must be correctly
accounted for in _mapcount. Normally the rules for this are
straightforward but hugetlbfs page table sharing is different. The page
table pages at the PMD level are reference counted while the mapcount
remains the same.
If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:
kernel BUG at mm/filemap.c:135!
invalid opcode: 0000 [#1] SMP
CPU 22
Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
RIP: 0010:[<ffffffff8112cfed>] [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
Call Trace:
delete_from_page_cache+0x40/0x80
truncate_hugepages+0x115/0x1f0
hugetlbfs_evict_inode+0x18/0x30
evict+0x9f/0x1b0
iput_final+0xe3/0x1e0
iput+0x3e/0x50
d_kill+0xf8/0x110
dput+0xe2/0x1b0
__fput+0x162/0x240
During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte. The logic is if
the PMD page is the same, they must be shared. This assumes that the
sharing is between the parent and child. However, if the sharing is
with a different process entirely then this check fails as in this
diagram:
parent
|
------------>pmd
src_pte----------> data page
^
other--------->pmd--------------------|
^
child-----------|
dst_pte
For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other. This is
possible due to the following style of race.
PROC A PROC B
copy_hugetlb_page_range copy_hugetlb_page_range
src_pte == huge_pte_offset src_pte == huge_pte_offset
!src_pte so no sharing !src_pte so no sharing
(time passes)
hugetlb_fault hugetlb_fault
huge_pte_alloc huge_pte_alloc
huge_pmd_share huge_pmd_share
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
pmd_alloc pmd_alloc
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed. When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.
This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd. This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.
Race identified and changelog written mostly by Mel Gorman.
{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2fa3ccd7b upstream.
Currently we export SOCK_NONBLOCK to user space but that conflicts with
the definition from glibc leading to compilation errors in user programs
(e.g. see Debian bug #658460).
The generic socket.h restricts the definition of SOCK_NONBLOCK to the
kernel, as does the MIPS specific socket.h, so let's do the same on
Alpha.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e68726ff72 upstream.
Userspace can pass weird create mode in open(2) that we canonicalize to
"(mode & S_IALLUGO) | S_IFREG" in vfs_create().
The problem is that we use the uncanonicalized mode before calling vfs_create()
with unforseen consequences.
So do the canonicalization early in build_open_flags().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccf795847a upstream.
Currently the microphone input source is not selectable as while there is
a DAPM widget it's not connected to anything so it won't be properly
instantiated. Add something more correct for the input structure to get
things going, even though it's not hooked into the rest of the routing
map and so won't actually achieve anything except allowing the relevant
register bits to be written.
Reported-by: Christop Fritz <chf.fritz@googlemail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f637c4c940 upstream.
The i.MX cpufreq implementation uses the CPU_FREQ_TABLE helpers,
so it needs to select that code to be built. This problem has
apparently existed since the i.MX cpufreq code was first merged
in v2.6.37.
Building IMX without CPU_FREQ_TABLE results in:
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_exit':
arch/arm/plat-mxc/cpufreq.c:173: undefined reference to `cpufreq_frequency_table_put_attr'
arch/arm/plat-mxc/built-in.o: In function `mxc_set_target':
arch/arm/plat-mxc/cpufreq.c:84: undefined reference to `cpufreq_frequency_table_target'
arch/arm/plat-mxc/built-in.o: In function `mxc_verify_speed':
arch/arm/plat-mxc/cpufreq.c:65: undefined reference to `cpufreq_frequency_table_verify'
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_init':
arch/arm/plat-mxc/cpufreq.c:154: undefined reference to `cpufreq_frequency_table_cpuinfo'
arch/arm/plat-mxc/cpufreq.c:162: undefined reference to `cpufreq_frequency_table_get_attr'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Yong Shen <yong.shen@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b01858c780 upstream.
Commit d670ac019f (ARM: SAMSUNG: DMA Cleanup as per sparse) changed the
prototype of the s3c2410_dma_* functions to use the enum dma_ch instead
of an generic unsigned int.
In the s3c24xx dma.c s3c2410_dma_enqueue seems to have been forgotten,
the other functions there were changed correctly.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 730a8128cd upstream.
Commit 5a783cbc48 ("ARM: 7478/1: errata: extend workaround for erratum
#720789") added workarounds for erratum #720789 to the range TLB
invalidation functions with the observation that the erratum only
affects SMP platforms. However, when running an SMP_ON_UP kernel on a
uniprocessor platform we must take care to preserve the ASID as the
workaround is not required.
This patch ensures that we don't set the ASID to 0 when flushing the TLB
on such a system, preserving the original behaviour with the workaround
disabled.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5f2025ef3 upstream.
Page migration encodes the pfn in the offset field of a swp_entry_t.
For LPAE, we support physical addresses of up to 36 bits (due to
sparsemem limitations with the size of page flags), requiring 24 bits
to represent a pfn. A further 3 bits are used to encode a swp_entry into
a pte, leaving 5 bits for the type field. Furthermore, the core code
defines MAX_SWAPFILES_SHIFT as 5, so the additional type bit does not
get used.
This patch reduces the width of the type field to 5 bits, allowing us
to create up to 31 swapfiles of 64GB each.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47f1204329 upstream.
Swap entries are encoding in ptes such that !pte_present(pte) and
pte_file(pte). The remaining bits of the descriptor are used to identify
the swapfile and offset within it to the swap entry.
When writing such a pte for a user virtual address, set_pte_at
unconditionally sets the nG bit, which (in the case of LPAE) will
corrupt the swapfile offset and lead to a BUG:
[ 140.494067] swap_free: Unused swap offset entry 000763b4
[ 140.509989] BUG: Bad page map in process rs:main Q:Reg pte:0ec76800 pmd:8f92e003
This patch fixes the problem by only setting the nG bit for user
mappings that are actually present.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83957df21d upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c263b92f8 upstream.
* Use the buffer content length as opposed to the total buffer size. This can
be a real problem when using the mos7840 as a usb serial-console as all
kernel output is truncated during boot.
Signed-off-by: Mark Ferrell <mferrell@uplogix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1b5c997e6 upstream.
The ZTE (Vodafone) K5006-Z use the following
interface layout:
00 DIAG
01 secondary
02 modem
03 networkcard
04 storage
Ignoring interface #3 which is handled by the qmi_wwan
driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee6f827df9 upstream.
In this patch, we add new declarations into option.c to support the new
interfaces of Huawei Data Card devices. And at the same time, remove the
redundant declarations from option.c.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d81a5d1956 upstream.
A lot of Broadcom Bluetooth devices provides vendor specific interface
class and we are getting flooded by patches adding new device support.
This change will help us enable support for any other Broadcom with vendor
specific device that arrives in the future.
Only the product id changes for those devices, so this macro would be
perfect for us:
{ USB_VENDOR_AND_INTERFACE_INFO(0x0a5c, 0xff, 0x01, 0x01) }
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Henrik Rydberg <rydberg@bitmath.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e95829f474 upstream.
The Intel desktop boards DH77EB and DH77DF have a hardware issue that
can be worked around by BIOS. If the USB ports are switched to xHCI on
shutdown, the xHCI host will send a spurious interrupt, which will wake
the system. Some BIOS will work around this, but not all.
The bug can be avoided if the USB ports are switched back to EHCI on
shutdown. The Intel Windows driver switches the ports back to EHCI, so
change the Linux xHCI driver to do the same.
Unfortunately, we can't tell the two effected boards apart from other
working motherboards, because the vendors will change the DMI strings
for the DH77EB and DH77DF boards to their own custom names. One example
is Compulab's mini-desktop, the Intense-PC. Instead, key off the
Panther Point xHCI host PCI vendor and device ID, and switch the ports
over for all PPT xHCI hosts.
The only impact this will have on non-effected boards is to add a couple
hundred milliseconds delay on boot when the BIOS has to switch the ports
over from EHCI to xHCI.
This patch should be backported to kernels as old as 3.0, that contain
the commit 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Denis Turischev <denis@compulab.co.il>
Tested-by: Denis Turischev <denis@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22ceac1912 upstream.
The NEC/Renesas 720201 xHCI host controller does not complete its reset
within 250 milliseconds. In fact, it takes about 9 seconds to reset the
host controller, and 1 second for the host to be ready for doorbell
rings. Extend the reset and CNR polling timeout to 10 seconds each.
This patch should be backported to kernels as old as 2.6.31, that
contain the commit 66d4eadd8d "USB: xhci:
BIOS handoff and HW initialization."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Edwin Klein Mentink <e.kleinmentink@zonnet.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cb7df2b2d upstream.
Gary reports that with recent kernels, he notices more xHCI driver
warnings:
xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
We think his Etron xHCI host controller may have the same buggy behavior
as the Fresco Logic xHCI host. When a short transfer is received, the
host will mark the transfer as successfully completed when it should be
marking it with a short completion.
Fix this by turning on the XHCI_TRUST_TX_LENGTH quirk when the Etron
host is discovered. Note that Gary has revision 1, but if Etron fixes
this bug in future revisions, the quirk will have no effect.
This patch should be backported to kernels as old as 2.6.36, that
contain a backported version of commit
1530bbc627 "xhci: Add new short TX quirk
for Fresco Logic host."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Gary E. Miller <gem@rellim.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e731bc9a1 upstream.
Commit 03179fe923 introduced a kmemcheck complaint in
ext4_da_get_block_prep() because we save and restore
ei->i_da_metadata_calc_last_lblock even though it is left
uninitialized in the case where i_da_metadata_calc_len is zero.
This doesn't hurt anything, but silencing the kmemcheck complaint
makes it easier for people to find real bugs.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
(which is marked as a regression).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81ee8fb6b5 upstream.
It seems we can not update the crtc scanout address. After disabling
crtc, update to base address do not take effect after crtc being
reenable leading to at least frame being scanout from the old crtc
base address. Disabling crtc display request lead to same behavior.
So after changing the vram address if we don't keep crtc disabled
we will have the GPU trying to read some random system memory address
with some iommu this will broke the crtc engine and will lead to
broken display and iommu error message.
So to avoid this, disable crtc. For flicker less boot we will need
to avoid moving the vram start address.
This patch should also fix :
https://bugs.freedesktop.org/show_bug.cgi?id=42373
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e0d95c04 upstream.
When the frontend and the backend reside on the same domain, even if we
add pages to the m2p_override, these pages will never be returned by
mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
always fail, so the pfn of the frontend will be returned instead
(resulting in a deadlock because the frontend pages are already locked).
INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-system-i38 D ffff8800cfc137c0 0 1085 1 0x00000000
ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
Call Trace:
[<ffffffff81101ee0>] ? __lock_page+0x70/0x70
[<ffffffff81a0fdd9>] schedule+0x29/0x70
[<ffffffff81a0fe80>] io_schedule+0x60/0x80
[<ffffffff81101eee>] sleep_on_page+0xe/0x20
[<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81101ed7>] __lock_page+0x67/0x70
[<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
[<ffffffff811867e6>] ? bio_add_page+0x36/0x40
[<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
[<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
[<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
[<ffffffff81104027>] generic_file_aio_read+0x737/0x780
[<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
[<ffffffff811038f0>] ? find_get_pages+0x150/0x150
[<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
[<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
[<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
[<ffffffff811998b8>] do_io_submit+0x708/0xb90
[<ffffffff81199d50>] sys_io_submit+0x10/0x20
[<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
The explanation is in the comment within the code:
We need to do this because the pages shared by the frontend
(xen-blkfront) can be already locked (lock_page, called by
do_read_cache_page); when the userspace backend tries to use them
with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
do_blockdev_direct_IO is going to try to lock the same pages
again resulting in a deadlock.
A simplified call graph looks like this:
pygrub QEMU
-----------------------------------------------
do_read_cache_page io_submit
| |
lock_page ext3_direct_IO
|
bio_add_page
|
lock_page
Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
a 'struct page' to have a different MFN (so that it can point to another
guest). It also can easily find out whether another pfn corresponding
to the mfn exists in the m2p, and can set the FOREIGN bit
in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
This allows the backend to perform direct_IO on these pages, but as a
side effect prevents the frontend from using get_user_pages_fast on
them while they are being shared with the backend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb6ccff667 upstream.
Commit 7572777eef attempted to verify that
the total iovec from the client doesn't overflow iov_length() but it
only checked the first element. The iovec could still overflow by
starting with a small element. The obvious fix is to check all the
elements.
The overflow case doesn't look dangerous to the kernel as the copy is
limited by the length after the overflow. This fix restores the
intention of returning an error instead of successfully copying less
than the iovec represented.
I found this by code inspection. I built it but don't have a test case.
I'm cc:ing stable because the initial commit did as well.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e858712185 upstream.
The native 31 bit and the compat behaviour for the mmap system calls differ:
In native 31 bit mode the passed in address for the mmap system call will be
unmodified passed to sys_mmap_pgoff().
In compat mode however the passed in address will be modified with
compat_ptr() which masks out the most significant bit.
The result is that in native 31 bit mode each mmap request (with MAP_FIXED)
will fail where the most significat bit is set, while in compat mode it
may succeed.
This odd behaviour was introduced with d3815898 "[S390] mmap: add missing
compat_ptr conversion to both mmap compat syscalls".
To restore a consistent behaviour accross native and compat mode this
patch functionally reverts the above mentioned commit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6dc463511d upstream.
Bamboo One's with ID of 0x6a and 0x6b were added with correct
indication of 1024 pressure levels but the Graphire packet routine
was only looking at 9 bits. Increased to 10 bits.
This bug caused these devices to roll over to zero pressure at half
way mark.
The other devices using this routine only support 256 or 512 range
and look to fix unused bits at zero.
Signed-off-by: Chris Bagwell <chris@cnpbagwell.com>
Reported-by: Tushant Mirchandani <tushantin@gmail.com>
Reviewed-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
partial of commit 8e8b41f9d8 upstream.
As part of commit 463454b5db ("cfg80211: fix interface
combinations check"), this extra check was introduced:
if ((all_iftypes & used_iftypes) != used_iftypes)
goto cont;
However, most wireless NIC drivers did not advertise ADHOC in
wiphy.iface_combinations[i].limits[] and hence we'll get -EBUSY
when we bring up a ADHOC wlan with commands similar to:
# iwconfig wlan0 mode ad-hoc && ifconfig wlan0 up
In commit 8e8b41f9d8 ("cfg80211: enforce lack of interface
combinations"), the change below fixes the issue:
if (total == 1)
return 0;
But it also introduces other dependencies for stable. For example,
a full cherry pick of 8e8b41f9d8 would introduce additional
regressions unless we also start cherry picking driver specific
fixes like the following:
9b4760e ath5k: add possible wiphy interface combinations
1ae2fc2 mac80211_hwsim: advertise interface combinations
20c8e8d ath9k: add possible wiphy interface combinations
And the purpose of the 'if (total == 1)' is to cover the specific
use case (IBSS, adhoc) that was mentioned above. So we just pick
the specific part out from 8e8b41f9d8 here.
Doing so gives stable kernels a way to fix the change introduced
by 463454b5db, without having to make cherry picks specific to
various NIC drivers.
Signed-off-by: Liang Li <liang.li@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f6fc43e62 upstream.
libertas currently calls cfg80211_disconnected() when it is being
brought down. This causes an event to be allocated, but since the
wdev is already removed from the rdev by the time that the event
processing work executes, the event is never processed or freed.
http://article.gmane.org/gmane.linux.kernel.wireless.general/95666
Fix this leak, and other possible situations, by processing the event
queue when a device is being unregistered. Thanks to Johannes Berg for
the suggestion.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59ee93a528 upstream.
The irq_to_gpio function was removed from the pxa platform
in linux-3.2, and this driver has been broken since.
There is actually no in-tree user of this driver that adds
this platform device, but the driver can and does get enabled
on some platforms.
Without this patch, building ezx_defconfig results in:
drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work':
drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bed491c8d upstream.
The CONFIG_DEFAULT_MMAP_MIN_ADDR was set to 65536 in mxs_defconfig,
this caused severe breakage of userland applications since the upper
limit for ARM is 32768. By default CONFIG_DEFAULT_MMAP_MIN_ADDR is
set to 4096 and can also be changed via /proc/sys/vm/mmap_min_addr
if needed.
Quoting Russell King [1]:
"4096 is also fine for ARM too. There's not much point in having
defconfigs change it - that would just be pure noise in the config
files."
the CONFIG_DEFAULT_MMAP_MIN_ADDR can be removed from the defconfig
altogether.
This problem was introduced by commit cde7c41 (ARM: configs: add
defconfig for mach-mxs).
[1] http://marc.info/?l=linux-arm-kernel&m=134401593807820&w=2
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Wolfgang Denk <wd@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d833352a43 upstream.
If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted. Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log. The final teardown will trigger a
BUG_ON in mm/filemap.c.
This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37. It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages
look like this;
[ ..........] Lots of bad pmd messages followed by this
[ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[ 127.186778] ------------[ cut here ]------------
[ 127.186781] kernel BUG at mm/filemap.c:134!
[ 127.186782] invalid opcode: 0000 [#1] SMP
[ 127.186783] CPU 7
[ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[ 127.186801]
[ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002
[ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[ 127.186821] Stack:
[ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[ 127.186827] Call Trace:
[ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0
[ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50
[ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130
[ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0
[ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230
[ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30
[ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80
[ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360
[ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186870] RSP <ffff8804144b5c08>
[ 127.186871] ---[ end trace 7cbac5d1db69f426 ]---
The bug is a race and not always easy to reproduce. To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.
$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done
On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted. The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON. shutdown will also hang and needs a
hard reset or a sysrq-b.
The basic problem is a race between page table sharing and teardown. For
the most part page table sharing depends on i_mmap_mutex. In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.
Unfortunately it appears to be also insufficient. Consider the following
situation
Process A Process B
--------- ---------
hugetlb_fault shmdt
LockWrite(mmap_sem)
do_munmap
unmap_region
unmap_vmas
unmap_single_vma
unmap_hugepage_range
Lock(i_mmap_mutex)
Lock(mm->page_table_lock)
huge_pmd_unshare/unmap tables <--- (1)
Unlock(mm->page_table_lock)
Unlock(i_mmap_mutex)
huge_pte_alloc ...
Lock(i_mmap_mutex) ...
vma_prio_walk, find svma, spte ...
Lock(mm->page_table_lock) ...
share spte ...
Unlock(mm->page_table_lock) ...
Unlock(i_mmap_mutex) ...
hugetlb_no_page <--- (2)
free_pgtables
unlink_file_vma
hugetlb_free_pgd_range
remove_vma_list
In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down. The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables. At (1) above,
the page tables are not shared yet so it unmaps the PMDs. Process A sets
up page table sharing and at (2) faults a new entry. Process B then trips
up on it in free_pgtables.
This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed. This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA
ineligible for sharing and avoids the race. Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).
This should be treated as a -stable candidate if it is merged.
Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.
==== CUT HERE ====
static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;
unsigned int get_random(unsigned int max)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srandom(tv.tv_usec);
return random() % max;
}
static void play(void *addr, size_t size)
{
unsigned char *start = addr,
*end = start + size,
*a;
start += get_random(size/2);
/* we could itterate on huge pages but let's give it more time. */
for (a = start; a < end; a += 4096)
*a = 0;
}
int main(int argc, char **argv)
{
key_t key = IPC_PRIVATE;
size_t sizeA = nr_huge_page_A * huge_page_size;
size_t sizeB = nr_huge_page_B * huge_page_size;
int shmidA, shmidB;
void *addrA = NULL, *addrB = NULL;
int nr_children = 300, n = 0;
if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
fork_child:
switch(fork()) {
case 0:
switch (n%3) {
case 0:
play(addrA, sizeA);
break;
case 1:
play(addrB, sizeB);
break;
case 2:
break;
}
break;
case -1:
perror("fork:");
break;
default:
if (++n < nr_children)
goto fork_child;
play(addrA, sizeA);
break;
}
shmdt(addrA);
shmdt(addrB);
do {
wait(NULL);
} while (--n > 0);
shmctl(shmidA, IPC_RMID, NULL);
shmctl(shmidB, IPC_RMID, NULL);
return 0;
}
[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9fc3f778a upstream.
Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such interface with which
we can end up with different microcode versions applied on different
cores of an otherwise homogeneous wrt (family,model,stepping) system.
So turn off the possibility of doing that per core and allow it only
system-wide.
This is a minimal fix which we'd like to see in stable too thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:
$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...
and disable the interface on the other cores:
$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument
Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.
A more generic fix will follow.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2e7c96af1 upstream.
Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf(). This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.
[ Commit description modified by tytso to remove an extended
advertisement for the RDRAND instruction. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbc96b7594 upstream.
Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.
However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().
Add a big fat comment to rand_initialize() spelling out
this requirement.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5857ccf29 upstream.
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27130f0cc3 upstream.
wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dccf55f4c upstream.
The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.
Only do this on boot since the goal is to seed the pool rather than add
useful entropy.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 330e0a01d5 upstream.
Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2557a303a upstream.
Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present. Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.
The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door. (For example, an
increasing counter encrypted by an AES key known to the NSA.)
It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
--- especially since Bull Mountain is documented to use AES as a
whitener. Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel. Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.
Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.
This way we get almost of the benefits of the HW RNG without any
potential liabilities. The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.
For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6d4947b12 upstream.
If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.
Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2080a67ab upstream.
Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot). This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).
[ Modified by tytso to mix in a timestamp, since there may be some
variability caused by the time needed to detect/configure the hardware
in question. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 902c098a36 upstream.
The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 775f4b297b upstream.
We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.
This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool. Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool. This
assures that /dev/urandom is returning unpredictable data as soon as
possible.
(Based on an original patch by Linus, but significantly modified by
tytso.)
Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44e4360fa3 upstream.
/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes. If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads. Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.
Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e88bdff1c upstream.
If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores. Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf833d0b99 upstream.
We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.
Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".
And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.
So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.
Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd29e568a4 upstream.
If there is an architecture-specific random number generator we use it
to acquire randomness one "long" at a time. We should put these random
words into consecutive words in the result buffer - not just overwrite
the first word again and again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63d7717326 upstream.
Add support for architecture-specific hooks into the kernel-directed
random number generator interfaces. This patchset does not use the
architecture random number generator interfaces for the
userspace-directed interfaces (/dev/random and /dev/urandom), thus
eliminating the need to distinguish between them based on a pool
pointer.
Changes in version 3:
- Moved the hooks from extract_entropy() to get_random_bytes().
- Changes the hooks to inlines.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd4c9260e7 upstream.
The mesh path timer needs to be canceled when
leaving the mesh as otherwise it could fire
after the interface has been removed already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ad3d901bb upstream.
mmu_notifier_release() is called when the process is exiting. It will
delete all the mmu notifiers. But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:
CPU 0 CPU 1
mmu_notifier_release: try_to_unmap:
hlist_del_init_rcu(&mn->hlist);
ptep_clear_flush_notify:
mmu nofifler not found
free page !!!!!!
/*
* At the point, the page has been
* freed, but it is still mapped in
* the secondary MMU.
*/
mn->ops->release(mn, mm);
Then the box is not stable and sometimes we can get this bug:
[ 738.075923] BUG: Bad page state in process migrate-perf pfn:03bec
[ 738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping: (null) index:0x8076
[ 738.075936] page flags: 0x20000000000014(referenced|dirty)
The same issue is present in mmu_notifier_unregister().
We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b74253f784 upstream.
The vivt_flush_cache_{range,page} functions check that the mm_struct
of the VMA being flushed has been active on the current CPU before
performing the cache maintenance.
The gate_vma has a NULL mm_struct pointer and, as such, will cause a
kernel fault if we try to flush it with the above operations. This
happens during ELF core dumps, which include the gate_vma as it may be
useful for debugging purposes.
This patch adds checks to the VIVT cache flushing functions so that VMAs
with a NULL mm_struct are flushed unconditionally (the vectors page may
be dirty if we use it to store the current TLS pointer).
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Tested-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a783cbc48 upstream.
Commit cdf357f1 ("ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS
operations can broadcast a faulty ASID") replaced by-ASID TLB flushing
operations with all-ASID variants to workaround A9 erratum #720789.
This patch extends the workaround to include the tlb_range operations,
which were overlooked by the original patch.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc32f63453 upstream.
Commit a6bc32b899 ("mm: compaction: introduce sync-light migration for
use by compaction") changed the declaration of migrate_pages() and
migrate_huge_pages().
But it missed changing the argument of migrate_huge_pages() in
soft_offline_huge_page(). In this case, we should call
migrate_huge_pages() with MIGRATE_SYNC.
Additionally, there is a mismatch between type the of argument and the
function declaration for migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4088ac3a upstream.
efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output. The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.
The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:
panic+0x01ca
do_exit+0x043c
oops_end+0x00a7
no_context+0x0119
__bad_area_nosemaphore+0x0138
bad_area_nosemaphore+0x000e
do_page_fault+0x0321
page_fault+0x0020
reserve_memtype+0x02a1
__ioremap_caller+0x0123
ioremap_nocache+0x0012
efi_setup_pcdp_console+0x002b
setup_arch+0x03a9
start_kernel+0x00d4
x86_64_start_reservations+0x012c
x86_64_start_kernel+0x00fe
This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.
This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.
Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 572d8b3945 upstream.
An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:
chcp D ffff88013870f3d0 0 1325 1324 0x00000004
...
Call Trace:
nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
wake_up_bit+0x20/0x20
copy_from_user+0x18/0x30 [nilfs2]
nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
nilfs_ioctl+0x252/0x61a [nilfs2]
do_page_fault+0x311/0x34c
get_unmapped_area+0x132/0x14e
do_vfs_ioctl+0x44b/0x490
__set_task_blocked+0x5a/0x61
vm_mmap_pgoff+0x76/0x87
__set_current_blocked+0x30/0x4a
sys_ioctl+0x4b/0x6f
system_call_fastpath+0x16/0x1b
thaw D ffff88013870d890 0 1352 1351 0x00000004
...
Call Trace:
rwsem_down_failed_common+0xdb/0x10f
call_rwsem_down_write_failed+0x13/0x20
down_write+0x25/0x27
thaw_super+0x13/0x9e
do_vfs_ioctl+0x1f5/0x490
vm_mmap_pgoff+0x76/0x87
sys_ioctl+0x4b/0x6f
filp_close+0x64/0x6c
system_call_fastpath+0x16/0x1b
where the thaw ioctl deadlocked at thaw_super() when called while chcp was
waiting at nilfs_transaction_begin() called from
nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible.
This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
read mode and then waits for unfreezing in nilfs_transaction_begin(),
whereas thaw_super() locks sb->s_umount in write mode. The locking of
sb->s_umount here was intended to make snapshot mounts and the downgrade
of snapshots to checkpoints exclusive.
This fixes the deadlock issue by replacing the sb->s_umount usage in
nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
mounts.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caea33da89 upstream.
Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a119365586 upstream.
The following build error occured during a ia64 build with
swap-over-NFS patches applied.
net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant
This is identical to a parisc build error. Fengguang Wu, Mel Gorman
and James Bottomley did all the legwork to track the root cause of
the problem. This fix and entire commit log is shamelessly copied
from them with one extra detail to change a dubious runtime use of
ATOMIC_INIT() to atomic_set() in drivers/char/mspec.c
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 141168c36c and
commit 3f806e5098 upstream.
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space. However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files. The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:
text data bss dec hex filename
4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before
4737444 506459 972040 6215943 5ed907 vmlinux.o.after
for a difference of 276 bytes for an example UP config.
If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c663600584 upstream.
Booting a 3.2, 3.3, or 3.4-rc4 kernel on an Atari using the
`nfeth' ethernet device triggers a WARN_ONCE() in generic irq
handling code on the first irq for that device:
WARNING: at kernel/irq/handle.c:146 handle_irq_event_percpu+0x134/0x142()
irq 3 handler nfeth_interrupt+0x0/0x194 enabled interrupts
Modules linked in:
Call Trace: [<000299b2>] warn_slowpath_common+0x48/0x6a
[<000299c0>] warn_slowpath_common+0x56/0x6a
[<00029a4c>] warn_slowpath_fmt+0x2a/0x32
[<0005b34c>] handle_irq_event_percpu+0x134/0x142
[<0005b34c>] handle_irq_event_percpu+0x134/0x142
[<0000a584>] nfeth_interrupt+0x0/0x194
[<001ba0a8>] schedule_preempt_disabled+0x0/0xc
[<0005b37a>] handle_irq_event+0x20/0x2c
[<0005add4>] generic_handle_irq+0x2c/0x3a
[<00002ab6>] do_IRQ+0x20/0x32
[<0000289e>] auto_irqhandler_fixup+0x4/0x6
[<00003144>] cpu_idle+0x22/0x2e
[<001b8a78>] printk+0x0/0x18
[<0024d112>] start_kernel+0x37a/0x386
[<0003021d>] __do_proc_dointvec+0xb1/0x366
[<0003021d>] __do_proc_dointvec+0xb1/0x366
[<0024c31e>] _sinittext+0x31e/0x9c0
After invoking the irq's handler the kernel sees !irqs_disabled()
and concludes that the handler erroneously enabled interrupts.
However, debugging shows that !irqs_disabled() is true even before
the handler is invoked, which indicates a problem in the platform
code rather than the specific driver.
The warning does not occur in 3.1 or older kernels.
It turns out that the ALLOWINT definition for Atari is incorrect.
The Atari definition of ALLOWINT is ~0x400, the stated purpose of
that is to avoid taking HSYNC interrupts. irqs_disabled() returns
true if the 3-bit ipl & 4 is non-zero. The nfeth interrupt runs at
ipl 3 (it's autovector 3), but 3 & 4 is zero so irqs_disabled() is
false, and the warning above is generated.
When interrupts are explicitly disabled, ipl is set to 7. When they
are enabled, ipl is masked with ALLOWINT. On Atari this will result
in ipl = 3, which blocks interrupts at ipl 3 and below. So how come
nfeth interrupts at ipl 3 are received at all? That's because ipl
is reset to 2 by Atari-specific code in default_idle(), again with
the stated purpose of blocking HSYNC interrupts. This discrepancy
means that ipl 3 can remain blocked for longer than intended.
Both default_idle() and falcon_hblhandler() identify HSYNC with
ipl 2, and the "Atari ST/.../F030 Hardware Register Listing" agrees,
but ALLOWINT is defined as if HSYNC was ipl 3.
[As an experiment I modified default_idle() to reset ipl to 3, and
as expected that resulted in all nfeth interrupts being blocked.]
The fix is simple: define ALLOWINT as ~0x500 instead. This makes
arch_local_irq_enable() consistent with default_idle(), and prevents
the !irqs_disabled() problems for ipl 3 interrupts.
Tested on Atari running in an Aranym VM.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Michael Schmitz <schmitzmic@googlemail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e2760d18b upstream.
User space access must always go through uaccess accessors, since on
classic m68k user space and kernel space are completely separate.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8edf3e552 upstream.
Otherwise if someone tries to use all four channels on AIF1 with the
device in master mode we won't be able to clock out all the data.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc733d4952 upstream.
The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aff252a848 upstream.
uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.
In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f96a4216e8 upstream.
The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.
This is based on emperical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits a117dacde0
and 8bbb181308 ]
The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59ea33a68a ]
Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.
The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.
This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.
If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.
This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.
This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b1beb681cb ]
When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.
This can be easily trigerred by doing:
tcpdump -i eth0 &
ip l s up eth0
ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.
Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e4c7f259c5 ]
The problem is that we call this with a spin lock held. The call tree
is:
kaweth_start_xmit() holds kaweth->device_lock.
-> kaweth_async_set_rx_mode()
-> kaweth_control()
-> kaweth_internal_control_msg()
The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4249357010 ]
TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
values. If a user assign -1 to it, the socket will set successfully and wait
for 4294967295 miliseconds. This patch add a negative value check to avoid
this issue.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 89d7ae34cd ]
As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).
This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface. In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.
A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>
struct local_tag {
char type;
char length;
char info[4];
};
struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};
int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};
memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));
return 0;
}
CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2eebc1e188 ]
A few days ago Dave Jones reported this oops:
[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140] ffff880147c03a10
[22766.387140] ffffffffa169f2b6
[22766.387140] ffff88013ed95728
[22766.387143] 0000000000000002
[22766.387143] 0000000000000000
[22766.387143] ffff880003fad062
[22766.387144] ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <ffff880147c039b0>
[22766.387142] ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.
Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.
I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.
I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.
I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c1f5163de4 ]
In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring. We were not using
the proper macro to skip the last entry when advancing the tx index.
Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97795d2a5b upstream.
If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks. In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.
This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:
Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair. The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6fb99cadc upstream.
Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cf02d09b5 upstream.
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:
PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
#0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.
Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2930d381d2 upstream.
Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca2ccde5e2 upstream.
To have DP behave like VGA/DVI we need to retrain the link
on hotplug. For this to happen we need to force link
training to happen by setting connector dpms to off
before asking it turning it on again.
v2: agd5f
- drop the dp_get_link_status() change in atombios_dp.c
for now. We still need the dpms OFF change.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 266dcba541 upstream.
No need to retrain the link for passive adapters.
v2: agd5f
- no passive DP to VGA adapters, update comments
- assign radeon_connector_atom_dig after we are sure
we have a digital connector as analog connectors
have different private data.
- get new sink type before checking for retrain. No
need to check if it's no longer a DP connection.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d1c702aa0 upstream.
We want to print link status query failed only if it's
an unexepected fail. If we query to see if we need
link training it might be because there is nothing
connected and thus link status query have the right
to fail in that case.
To avoid printing failure when it's expected, move the
failure message to proper place.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f60ec4c7df upstream.
This could previously fail if either of the enabled displays was using a
horizontal resolution that is a multiple of 128, and only the leftmost column
of the cursor was (supposed to be) visible at the right edge of that display.
The solution is to move the cursor one pixel to the left in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33183
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9fbcb4220 upstream.
Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.
This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f197ac13f6 upstream.
In the ac.c, power_supply_register()'s return value is not checked.
As a result, the driver's add() ops may return success
even though the device failed to initialize.
For example, some BIOS may describe two ACADs in the same DSDT.
The second ACAD device will fail to register,
but ACPI driver's add() ops returns sucessfully.
The ACPI device will receive ACPI notification and cause OOPS.
https://bugzilla.redhat.com/show_bug.cgi?id=772730
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6575820221 upstream.
Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.
Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.
However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.
While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.
Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.
Workqueue cpu hotplug operations will soon go through further cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 443772d408 upstream.
If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to triple fault during resume from suspend, ultimately
ending up in a reboot instead of a resume (or a total refusal to come out
of suspended state, on some machines).
This issue was explained in more detail in commit f42ac38c59 (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e8 (tracing: allow tracing of
suspend/resume & hibernation code again). So, unfortunately since things are
not yet robust enough to allow tracing of low-level suspend/resume functions,
suspend/resume is still broken when ftrace is enabled.
So fix this by disabling function tracing during suspend/resume & hibernation.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ec4f431eb upstream.
The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.)
are done after converting the long to an int. Thus some illegal values
may be let through and cause problems in later code.
[ They actually *don't* cause problems in mainline, as of Dave Jones's
commit 8d657eb3b4 "Remove easily user-triggerable BUG from
generic_setlease", but we should fix this anyway. And this patch will
be necessary to fix real bugs on earlier kernels. ]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31bde1ceaa upstream.
A "usb0" interface that has never been connected to a host has an unknown
operstate, and therefore the IFF_RUNNING flag is (incorrectly) asserted
when queried by ifconfig, ifplugd, etc. This is a result of calling
netif_carrier_off() too early in the probe function; it should be called
after register_netdev().
Similar problems have been fixed in many other drivers, e.g.:
e826eafa6 (bonding: Call netif_carrier_off after register_netdevice)
0d672e9f8 (drivers/net: Call netif_carrier_off at the end of the probe)
6a3c869a6 (cxgb4: fix reported state of interfaces without link)
Fix is to move netif_carrier_off() to the end of the function.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2102e06a5f upstream.
iso data buffers may have holes in them if some packets were short, so for
iso urbs we should always copy the entire buffer, just like the regular
processcompl does.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b110547e58 upstream.
Commit 9fa2df6b90
(ARM: OMAP2+: OPP: allow OPP enumeration to continue if device is not present)
makes the logic:
for (i = 0; i < opp_def_size; i++) {
<snip>
if (!oh || !oh->od) {
<snip>
continue;
}
<snip>
opp_def++;
}
In short, the moment we hit a "Bad OPP", we end up looping the list
comparing against the bad opp definition pointer for the rest of the
iteration count. Instead, increment opp_def in the for loop itself
and allow continue to be used in code without much thought so that
we check the next set of OPP definition pointers :)
Cc: Steve Sakoman <steve@sakoman.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 940f5d47e2 upstream.
When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device. If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.
Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
[jejb: enhance commend and add commit log for stable]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b661a92e8 upstream.
The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
...
Call Trace:
[<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
[<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8125e641>] kobject_add_varg+0x41/0x50
[<ffffffff8125e70b>] kobject_add+0x64/0x66
[<ffffffff8131122b>] device_add+0x12d/0x63a
[<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
[<ffffffff8107de15>] ? module_refcount+0x89/0xa0
[<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
[<ffffffff8132dcbb>] do_scan_async+0x9c/0x145
...teach scsi_sysfs_add_devices() to check for deleted devices() before
trying to add them, and teach scsi_remove_target() how to remove targets
that have not been added via device_add().
Reported-by: Dariusz Majchrzak <dariusz.majchrzak@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57fc2e335f upstream.
Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.
A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.
Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh. Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b17caa174a upstream.
commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues
The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports. The result is random discovery failures
depending on configuration.
Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'. Convert it to bool and stop
returning its result up the stack.
Tested-by: Dan Melnic <dan.melnic@amd.com>
Reported-by: Dan Melnic <dan.melnic@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26f2f199ff upstream.
Continue running revalidation until no more broadcast devices are
discovered. Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f5072d4f6 upstream.
Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
numbers on the way. This causes nonsensical cpu idle/iowait values to be
displayed in /proc/stat (the only user of usecs_to_cputime so far).
This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
its value and using it directly in cputime_to_usecs instead of doing two
multiplications.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55fc05b741 upstream.
At http://dev.laptop.org/ticket/11980 we have determined that the
Marvell CaFe SDHCI controller reports bad card presence during
resume. It reports that no card is present even when it is.
This is a regression -- resume worked back around 2.6.37.
Around 400ms after resuming, a "card inserted" interrupt is
generated, at which point it starts reporting presence.
Work around this hardware oddity by setting the
SDHCI_QUIRK_BROKEN_CARD_DETECTION flag.
Thanks to Chris Ball for helping with diagnosis.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 635697c663 upstream.
Stable note: The commit [acf92b48: vmscan: shrinker->nr updates race and
go wrong] aimed to reduce excessive reclaim of slab objects but
had bug in how it treated shrinker functions that returned -1.
A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock. For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause
really big pressure. Let's skip such shrinkers instead.
Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c12cbcd0 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This
is part of a series of patches that reduce page allocator overhead.
Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce
large amounts of memory barrier related damage v3")
Local variable "page" can be uninitialized if the nodemask from vma policy
does not intersects with nodemask from cpuset. Even if it doesn't happens
it is better to initialize this variable explicitly than to introduce
a kernel oops in a weird corner case.
mm/hugetlb.c: In function `alloc_huge_page':
mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc9a6c8776 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This
is part of a series of patches that reduce page allocator overhead.
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.
[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths. This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32. The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.
For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.
This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side. This is much cheaper on some architectures, including x86. The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.
While updating the nodemask, a check is made to see if a false failure
is a risk. If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.
In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
actual results were
3.3.0-rc3 3.3.0-rc3
rc3-vanilla nobarrier-v2r1
Clients 1 UserTime 0.07 ( 0.00%) 0.08 (-14.19%)
Clients 2 UserTime 0.07 ( 0.00%) 0.07 ( 2.72%)
Clients 4 UserTime 0.08 ( 0.00%) 0.07 ( 3.29%)
Clients 1 SysTime 0.70 ( 0.00%) 0.65 ( 6.65%)
Clients 2 SysTime 0.85 ( 0.00%) 0.82 ( 3.65%)
Clients 4 SysTime 1.41 ( 0.00%) 1.41 ( 0.32%)
Clients 1 WallTime 0.77 ( 0.00%) 0.74 ( 4.19%)
Clients 2 WallTime 0.47 ( 0.00%) 0.45 ( 3.73%)
Clients 4 WallTime 0.38 ( 0.00%) 0.37 ( 1.58%)
Clients 1 Flt/sec/cpu 497620.28 ( 0.00%) 520294.53 ( 4.56%)
Clients 2 Flt/sec/cpu 414639.05 ( 0.00%) 429882.01 ( 3.68%)
Clients 4 Flt/sec/cpu 257959.16 ( 0.00%) 258761.48 ( 0.31%)
Clients 1 Flt/sec 495161.39 ( 0.00%) 517292.87 ( 4.47%)
Clients 2 Flt/sec 820325.95 ( 0.00%) 850289.77 ( 3.65%)
Clients 4 Flt/sec 1020068.93 ( 0.00%) 1022674.06 ( 0.26%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 135.68 132.17
User+Sys Time Running Test (seconds) 164.2 160.13
Total Elapsed Time (seconds) 123.46 120.87
The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected). The
actual number of page faults is noticeably improved.
For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.
To test the actual bug the commit fixed I opened two terminals. The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data. In a second window, the nodemask of the
cpuset was continually randomised in a loop.
Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b246272ecc upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This is
part of a series of patches that reduce page allocator overhead.
Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.
c0ff7453bb ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by
89e8a244b9 ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e8a244b9 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is
extremely expensive and severely impacted page allocator performance.
This is part of a series of patches that reduce page allocator
overhead.
{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.
This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself. It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example. In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!
The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes. This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.
If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes. Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b95a2f2d48 upstream - WARNING: this is a substitute patch.
Stable note: Not tracked in Bugzilla. This is a partial backport of an
upstream commit addressing a completely different issue
that accidentally contained an important fix. The workload
this patch helps was memcached when IO is started in the
background. memcached should stay resident but without this patch
it gets swapped. Sometimes this manifests as a drop in throughput
but mostly it was observed through /proc/vmstat.
Commit [246e87a9: memcg: fix get_scan_count() for small targets] was meant
to fix a problem whereby small scan targets on memcg were ignored causing
priority to raise too sharply. It forced scanning to take place if the
target was small, memcg or kswapd.
From the time it was introduced it caused excessive reclaim by kswapd
with workloads being pushed to swap that previously would have stayed
resident. This was accidentally fixed in commit [b95a2f2d: mm: vmscan:
convert global reclaim to per-memcg LRU lists] by making it harder for
kswapd to force scan small targets but that patchset is not suitable for
backporting. This was later changed again by commit [90126375: mm/vmscan:
push lruvec pointer into get_scan_count()] into a format that looks
like it would be a straight-forward backport but there is a subtle
difference due to the use of lruvecs.
The impact of the accidental fix is to make it harder for kswapd to force
scan small targets by taking zone->all_unreclaimable into account. This
patch is the closest equivalent available based on what is backported.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 043bcbe5ec upstream.
Stable note: Not tracked in Bugzilla. There were reports of shared
mapped pages being unfairly reclaimed in comparison to older kernels.
This is being addressed over time. Even though the subject
refers to lumpy reclaim, it impacts compaction as well.
Lumpy reclaim does well to stop at a PageAnon when there's no swap, but
better is to stop at any PageSwapBacked, which includes shmem/tmpfs too.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 86cfd3a450 upstream.
Stable note: Not tracked in Bugzilla. This patch reduces kswapd CPU
usage on swapless systems with high anonymous memory usage.
It's pointless to continue reclaiming when we have no swap space and lots
of anon pages in the inactive list.
Without this patch, it is possible when swap is disabled to continue
trying to reclaim when there are only anonymous pages in the system even
though that will not make any progress.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34dbc67a64 upstream.
Stable note: Not tracked in Bugzilla. There were reports of shared
mapped pages being unfairly reclaimed in comparison to older kernels.
This is being addressed over time. The specific workload being
addressed here in described in paragraph four and while paragraph
five says it did not help performance as such, it made a difference
to major page faults. I'm aware of at least one bug for a large
vendor that was due to increased major faults.
Commit 6457474624 ("vmscan: detect mapped file pages used only once")
greatly decreases lifetime of single-used mapped file pages.
Unfortunately it also decreases life time of all shared mapped file
pages. Because after commit bf3f3bc5e7 ("mm: don't mark_page_accessed
in fault path") page-fault handler does not mark page active or even
referenced.
Thus page_check_references() activates file page only if it was used twice
while it stays in inactive list, meanwhile it activates anon pages after
first access. Inactive list can be small enough, this way reclaimer can
accidentally throw away any widely used page if it wasn't used twice in
short period.
After this patch page_check_references() also activate file mapped page at
first inactive list scan if this page is already used multiple times via
several ptes.
I found this while trying to fix degragation in rhel6 (~2.6.32) from rhel5
(~2.6.18). There a complete mess with >100 web/mail/spam/ftp containers,
they share all their files but there a lot of anonymous pages: ~500mb
shared file mapped memory and 15-20Gb non-shared anonymous memory. In
this situation major-pagefaults are very costly, because all containers
share the same page. In my load kernel created a disproportionate
pressure on the file memory, compared with the anonymous, they equaled
only if I raise swappiness up to 150 =)
These patches actually wasn't helped a lot in my problem, but I saw
noticable (10-20 times) reduce in count and average time of
major-pagefault in file-mapped areas.
Actually both patches are fixes for commit v2.6.33-5448-g6457474, because
it was aimed at one scenario (singly used pages), but it breaks the logic
in other scenarios (shared and/or executable pages)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 0cee34fd72 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time.
If compaction can proceed for a given zone, shrink_zones() does not
reclaim any more pages from it. After commit [e0c2327: vmscan: abort
reclaim/compaction if compaction can proceed], do_try_to_free_pages()
tries to finish as soon as possible once one zone can compact.
This was intended to prevent slabs being shrunk unnecessarily but there
are side-effects. One is that a small zone that is ready for compaction
will abort reclaim even if the chances of successfully allocating a THP
from that zone is small. It also means that reclaim can return too early
even though sc->nr_to_reclaim pages were not reclaimed.
This partially reverts the commit until it is proven that slabs are really
being shrunk unnecessarily but preserves the check to return 1 to avoid
OOM if reclaim was aborted prematurely.
[aarcange@redhat.com: This patch replaces a revert from Andrea]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7335084d44 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but otherwise has little to justify it. The
problem it fixes was never observed but the source of the
theoretical problem did not exist for very long.
During direct reclaim it is possible that reclaim will be aborted so that
compaction can be attempted to satisfy a high-order allocation. If this
decision is made before any pages are reclaimed, it is possible that 0 is
returned to the page allocator potentially triggering an OOM. This has
not been observed but it is a possibility so this patch addresses it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe4b1b244b upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time. This patch
addresses a problem where the fix regressed THP allocation
success rates.
In commit e0887c19 ("vmscan: limit direct reclaim for higher order
allocations"), Rik noted that reclaim was too aggressive when THP was
enabled. In his initial patch he used the number of free pages to decide
if reclaim should abort for compaction. My feedback was that reclaim and
compaction should be using the same logic when deciding if reclaim should
be aborted.
Unfortunately, this had the effect of reducing THP success rates when the
workload included something like streaming reads that continually
allocated pages. The window during which compaction could run and return
a THP was too small.
This patch combines Rik's two patches together. compaction_suitable() is
still used to decide if reclaim should be aborted to allow compaction is
used. However, it will also ensure that there is a reasonable buffer of
free pages available. This improves upon the THP allocation success rates
but bounds the number of pages that are freed for compaction.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6bc32b899 upstream.
Stable note: Not tracked in Buzilla. This was part of a series that
reduced interactivity stalls experienced when THP was enabled.
These stalls were particularly noticable when copying data
to a USB stick but the experiences for users varied a lot.
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage. Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.
This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.
[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0dfcde099 upstream.
Stable note: Fixes https://bugzilla.redhat.com/show_bug.cgi?id=712019. This
patch reduces kswapd CPU usage.
There 2 places to read pgdat in kswapd. One is return from a successful
balance, another is waked up from kswapd sleeping. The new_order and
new_classzone_idx represent the balance input order and classzone_idx.
But current new_order and new_classzone_idx are not assigned after
kswapd_try_to_sleep(), that will cause a bug in the following scenario.
1: after a successful balance, kswapd goes to sleep, and new_order = 0;
new_classzone_idx = __MAX_NR_ZONES - 1;
2: kswapd waked up with order = 3 and classzone_idx = ZONE_NORMAL
3: in the balance_pgdat() running, a new balance wakeup happened with
order = 5, and classzone_idx = ZONE_NORMAL
4: the first wakeup(order = 3) finished successufly, return order = 3
but, the new_order is still 0, so, this balancing will be treated as a
failed balance. And then the second tighter balancing will be missed.
So, to avoid the above problem, the new_order and new_classzone_idx need
to be assigned for later successful comparison.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2ebd0f6b8 upstream.
Stable note: Fixes https://bugzilla.redhat.com/show_bug.cgi?id=712019. This
patch reduces kswapd CPU usage.
In commit 215ddd66 ("mm: vmscan: only read new_classzone_idx from pgdat
when reclaiming successfully") , Mel Gorman said kswapd is better to sleep
after a unsuccessful balancing if there is tighter reclaim request pending
in the balancing. But in the following scenario, kswapd do something that
is not matched our expectation. The patch fixes this issue.
1, Read pgdat request A (classzone_idx, order = 3)
2, balance_pgdat()
3, During pgdat, a new pgdat request B (classzone_idx, order = 5) is placed
4, balance_pgdat() returns but failed since returned order = 0
5, pgdat of request A assigned to balance_pgdat(), and do balancing again.
While the expectation behavior of kswapd should try to sleep.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Pádraig Brady <P@draigBrady.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c824493528 upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
information by reducing LRU list churning had the side-effect of
reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
is was meaningless to pick the page and re-add it to the LRU list. This
had to be partially reverted because some dirty pages can be migrated by
compaction without blocking.
This patch updates "mm: compaction: make isolate_lru_page" by skipping
over pages that migration has no possibility of migrating to minimise LRU
disruption.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66199712e9 upstream.
Stable note: Not tracked in Buzilla. This was part of a series that
reduced interactivity stalls experienced when THP was enabled.
If compaction is deferred, direct reclaim is used to try to free enough
pages for the allocation to succeed. For small high-orders, this has a
reasonable chance of success. However, if the caller has specified
__GFP_NO_KSWAPD to limit the disruption to the system, it makes more sense
to fail the allocation rather than stall the caller in direct reclaim.
This patch skips direct reclaim if compaction is deferred and the caller
specifies __GFP_NO_KSWAPD.
Async compaction only considers a subset of pages so it is possible for
compaction to be deferred prematurely and not enter direct reclaim even in
cases where it should. To compensate for this, this patch also defers
compaction only if sync compaction failed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b969c4ab9f upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page
aging information by reducing LRU list churning had the side-effect
of reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time. Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates. Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check;
if (PageDirty(page) && !sync &&
mapping->a_ops->migratepage != migrate_page)
rc = -EBUSY;
This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It is
the responsibility of the callback to fail gracefully if migration would
block.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a77ebd333c upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
information by reducing LRU list churning had the side-effect of
reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Short summary: There are severe stalls when a USB stick using VFAT is
used with THP enabled that are reduced by this series. If you are
experiencing this problem, please test and report back and considering I
have seen complaints from openSUSE and Fedora users on this as well as a
few private mails, I'm guessing it's a widespread issue. This is a new
type of USB-related stall because it is due to synchronous compaction
writing where as in the past the big problem was dirty pages reaching
the end of the LRU and being written by reclaim.
Am cc'ing Andrew this time and this series would replace
mm-do-not-stall-in-synchronous-compaction-for-thp-allocations.patch.
I'm also cc'ing Dave Jones as he might have merged that patch to Fedora
for wider testing and ideally it would be reverted and replaced by this
series.
That said, the later patches could really do with some review. If this
series is not the answer then a new direction needs to be discussed
because as it is, the stalls are unacceptable as the results in this
leader show.
For testers that try backporting this to 3.1, it won't work because
there is a non-obvious dependency on not writing back pages in direct
reclaim so you need those patches too.
Changelog since V5
o Rebase to 3.2-rc5
o Tidy up the changelogs a bit
Changelog since V4
o Added reviewed-bys, credited Andrea properly for sync-light
o Allow dirty pages without mappings to be considered for migration
o Bound the number of pages freed for compaction
o Isolate PageReclaim pages on their own LRU list
This is against 3.2-rc5 and follows on from discussions on "mm: Do
not stall in synchronous compaction for THP allocations" and "[RFC
PATCH 0/5] Reduce compaction-related stalls". Initially, the proposed
patch eliminated stalls due to compaction which sometimes resulted in
user-visible interactivity problems on browsers by simply never using
sync compaction. The downside was that THP success allocation rates
were lower because dirty pages were not being migrated as reported by
Andrea. His approach at fixing this was nacked on the grounds that
it reverted fixes from Rik merged that reduced the amount of pages
reclaimed as it severely impacted his workloads performance.
This series attempts to reconcile the requirements of maximising THP
usage, without stalling in a user-visible fashion due to compaction
or cheating by reclaiming an excessive number of pages.
Patch 1 partially reverts commit 39deaf85 to allow migration to isolate
dirty pages. This is because migration can move some dirty
pages without blocking.
Patch 2 notes that the /proc/sys/vm/compact_memory handler is not using
synchronous compaction when it should be. This is unrelated
to the reported stalls but is worth fixing.
Patch 3 checks if we isolated a compound page during lumpy scan and
account for it properly. For the most part, this affects
tracing so it's unrelated to the stalls but worth fixing.
Patch 4 notes that it is possible to abort reclaim early for compaction
and return 0 to the page allocator potentially entering the
"may oom" path. This has not been observed in practice but
the rest of the series potentially makes it easier to happen.
Patch 5 adds a sync parameter to the migratepage callback and gives
the callback responsibility for migrating the page without
blocking if sync==false. For example, fallback_migrate_page
will not call writepage if sync==false. This increases the
number of pages that can be handled by asynchronous compaction
thereby reducing stalls.
Patch 6 restores filter-awareness to isolate_lru_page for migration.
In practice, it means that pages under writeback and pages
without a ->migratepage callback will not be isolated
for migration.
Patch 7 avoids calling direct reclaim if compaction is deferred but
makes sure that compaction is only deferred if sync
compaction was used.
Patch 8 introduces a sync-light migration mechanism that sync compaction
uses. The objective is to allow some stalls but to not call
->writepage which can lead to significant user-visible stalls.
Patch 9 notes that while we want to abort reclaim ASAP to allow
compation to go ahead that we leave a very small window of
opportunity for compaction to run. This patch allows more pages
to be freed by reclaim but bounds the number to a reasonable
level based on the high watermark on each zone.
Patch 10 allows slabs to be shrunk even after compaction_ready() is
true for one zone. This is to avoid a problem whereby a single
small zone can abort reclaim even though no pages have been
reclaimed and no suitably large zone is in a usable state.
Patch 11 fixes a problem with the rate of page scanning. As reclaim is
rarely stalling on pages under writeback it means that scan
rates are very high. This is particularly true for direct
reclaim which is not calling writepage. The vmstat figures
implied that much of this was busy work with PageReclaim pages
marked for immediate reclaim. This patch is a prototype that
moves these pages to their own LRU list.
This has been tested and other than 2 USB keys getting trashed,
nothing horrible fell out. That said, I am a bit unhappy with the
rescue logic in patch 11 but did not find a better way around it. It
does significantly reduce scan rates and System CPU time indicating
it is the right direction to take.
What is of critical importance is that stalls due to compaction
are massively reduced even though sync compaction was still
allowed. Testing from people complaining about stalls copying to USBs
with THP enabled are particularly welcome.
The following tests all involve THP usage and USB keys in some
way. Each test follows this type of pattern
1. Read from some fast fast storage, be it raw device or file. Each time
the copy finishes, start again until the test ends
2. Write a large file to a filesystem on a USB stick. Each time the copy
finishes, start again until the test ends
3. When memory is low, start an alloc process that creates a mapping
the size of physical memory to stress THP allocation. This is the
"real" part of the test and the part that is meant to trigger
stalls when THP is enabled. Copying continues in the background.
4. Record the CPU usage and time to execute of the alloc process
5. Record the number of THP allocs and fallbacks as well as the number of THP
pages in use a the end of the test just before alloc exited
6. Run the test 5 times to get an idea of variability
7. Between each run, sync is run and caches dropped and the test
waits until nr_dirty is a small number to avoid interference
or caching between iterations that would skew the figures.
The individual tests were then
writebackCPDeviceBasevfat
Disable THP, read from a raw device (sda), vfat on USB stick
writebackCPDeviceBaseext4
Disable THP, read from a raw device (sda), ext4 on USB stick
writebackCPDevicevfat
THP enabled, read from a raw device (sda), vfat on USB stick
writebackCPDeviceext4
THP enabled, read from a raw device (sda), ext4 on USB stick
writebackCPFilevfat
THP enabled, read from a file on fast storage and USB, both vfat
writebackCPFileext4
THP enabled, read from a file on fast storage and USB, both ext4
The kernels tested were
3.1 3.1
vanilla 3.2-rc5
freemore Patches 1-10
immediate Patches 1-11
andrea The 8 patches Andrea posted as a basis of comparison
The results are very long unfortunately. I'll start with the case
where we are not using THP at all
writebackCPDeviceBasevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.28 ( 0.00%) 54.49 (-4143.46%) 48.63 (-3687.69%) 4.69 ( -265.11%) 51.88 (-3940.81%)
+/- 0.06 ( 0.00%) 2.45 (-4305.55%) 4.75 (-8430.57%) 7.46 (-13282.76%) 4.76 (-8440.70%)
User Time 0.09 ( 0.00%) 0.05 ( 40.91%) 0.06 ( 29.55%) 0.07 ( 15.91%) 0.06 ( 27.27%)
+/- 0.02 ( 0.00%) 0.01 ( 45.39%) 0.02 ( 25.07%) 0.00 ( 77.06%) 0.01 ( 52.24%)
Elapsed Time 110.27 ( 0.00%) 56.38 ( 48.87%) 49.95 ( 54.70%) 11.77 ( 89.33%) 53.43 ( 51.54%)
+/- 7.33 ( 0.00%) 3.77 ( 48.61%) 4.94 ( 32.63%) 6.71 ( 8.50%) 4.76 ( 35.03%)
THP Active 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault Alloc 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault Fallback 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
The THP figures are obviously all 0 because THP was enabled. The
main thing to watch is the elapsed times and how they compare to
times when THP is enabled later. It's also important to note that
elapsed time is improved by this series as System CPu time is much
reduced.
writebackCPDevicevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.22 ( 0.00%) 13.89 (-1040.72%) 46.40 (-3709.20%) 4.44 ( -264.37%) 47.37 (-3789.33%)
+/- 0.06 ( 0.00%) 22.82 (-37635.56%) 3.84 (-6249.44%) 6.48 (-10618.92%) 6.60
(-10818.53%)
User Time 0.06 ( 0.00%) 0.06 ( -6.90%) 0.05 ( 17.24%) 0.05 ( 13.79%) 0.04 ( 31.03%)
+/- 0.01 ( 0.00%) 0.01 ( 33.33%) 0.01 ( 33.33%) 0.01 ( 39.14%) 0.01 ( 25.46%)
Elapsed Time 10445.54 ( 0.00%) 2249.92 ( 78.46%) 70.06 ( 99.33%) 16.59 ( 99.84%) 472.43 (
95.48%)
+/- 643.98 ( 0.00%) 811.62 ( -26.03%) 10.02 ( 98.44%) 7.03 ( 98.91%) 59.99 ( 90.68%)
THP Active 15.60 ( 0.00%) 35.20 ( 225.64%) 65.00 ( 416.67%) 70.80 ( 453.85%) 62.20 ( 398.72%)
+/- 18.48 ( 0.00%) 51.29 ( 277.59%) 15.99 ( 86.52%) 37.91 ( 205.18%) 22.02 ( 119.18%)
Fault Alloc 121.80 ( 0.00%) 76.60 ( 62.89%) 155.40 ( 127.59%) 181.20 ( 148.77%) 286.60 ( 235.30%)
+/- 73.51 ( 0.00%) 61.11 ( 83.12%) 34.89 ( 47.46%) 31.88 ( 43.36%) 68.13 ( 92.68%)
Fault Fallback 881.20 ( 0.00%) 926.60 ( -5.15%) 847.60 ( 3.81%) 822.00 ( 6.72%) 716.60 ( 18.68%)
+/- 73.51 ( 0.00%) 61.26 ( 16.67%) 34.89 ( 52.54%) 31.65 ( 56.94%) 67.75 ( 7.84%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 3540.88 1945.37 716.04 64.97 1937.03
Total Elapsed Time (seconds) 52417.33 11425.90 501.02 230.95 2520.28
The first thing to note is the "Elapsed Time" for the vanilla kernels
of 2249 seconds versus 56 with THP disabled which might explain the
reports of USB stalls with THP enabled. Applying the patches brings
performance in line with THP-disabled performance while isolating
pages for immediate reclaim from the LRU cuts down System CPU time.
The "Fault Alloc" success rate figures are also improved. The vanilla
kernel only managed to allocate 76.6 pages on average over the course
of 5 iterations where as applying the series allocated 181.20 on
average albeit it is well within variance. It's worth noting that
applies the series at least descreases the amount of variance which
implies an improvement.
Andrea's series had a higher success rate for THP allocations but
at a severe cost to elapsed time which is still better than vanilla
but still much worse than disabling THP altogether. One can bring my
series close to Andrea's by removing this check
/*
* If compaction is deferred for high-order allocations, it is because
* sync compaction recently failed. In this is the case and the caller
* has requested the system not be heavily disrupted, fail the
* allocation now instead of entering direct reclaim
*/
if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
goto nopage;
I didn't include a patch that removed the above check because hurting
overall performance to improve the THP figure is not what the average
user wants. It's something to consider though if someone really wants
to maximise THP usage no matter what it does to the workload initially.
This is summary of vmstat figures from the same test.
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
Page Ins 3257266139 1111844061 17263623 10901575 161423219
Page Outs 81054922 30364312 3626530 3657687 8753730
Swap Ins 3294 2851 6560 4964 4592
Swap Outs 390073 528094 620197 790912 698285
Direct pages scanned 1077581700 3024951463 1764930052 115140570 5901188831
Kswapd pages scanned 34826043 7112868 2131265 1686942 1893966
Kswapd pages reclaimed 28950067 4911036 1246044 966475 1497726
Direct pages reclaimed 805148398 280167837 3623473 2215044 40809360
Kswapd efficiency 83% 69% 58% 57% 79%
Kswapd velocity 664.399 622.521 4253.852 7304.360 751.490
Direct efficiency 74% 9% 0% 1% 0%
Direct velocity 20557.737 264745.137 3522673.849 498551.938 2341481.435
Percentage direct scans 96% 99% 99% 98% 99%
Page writes by reclaim 722646 529174 620319 791018 699198
Page writes file 332573 1080 122 106 913
Page writes anon 390073 528094 620197 790912 698285
Page reclaim immediate 0 2552514720 1635858848 111281140 5478375032
Page rescued immediate 0 0 0 87848 0
Slabs scanned 23552 23552 9216 8192 9216
Direct inode steals 231 0 0 0 0
Kswapd inode steals 0 0 0 0 0
Kswapd skipped wait 28076 786 0 61 6
THP fault alloc 609 383 753 906 1433
THP collapse alloc 12 6 0 0 6
THP splits 536 211 456 593 1136
THP fault fallback 4406 4633 4263 4110 3583
THP collapse fail 120 127 0 0 4
Compaction stalls 1810 728 623 779 3200
Compaction success 196 53 60 80 123
Compaction failures 1614 675 563 699 3077
Compaction pages moved 193158 53545 243185 333457 226688
Compaction move failure 9952 9396 16424 23676 45070
The main things to look at are
1. Page In/out figures are much reduced by the series.
2. Direct page scanning is incredibly high (264745.137 pages scanned
per second on the vanilla kernel) but isolating PageReclaim pages
on their own list reduces the number of pages scanned significantly.
3. The fact that "Page rescued immediate" is a positive number implies
that we sometimes race removing pages from the LRU_IMMEDIATE list
that need to be put back on a normal LRU but it happens only for
0.07% of the pages marked for immediate reclaim.
writebackCPDeviceext4
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%)
+/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%)
User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%)
+/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%)
Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%)
+/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%)
THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%)
+/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%)
Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%)
+/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%)
Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%)
+/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45
Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26
Similar test but the USB stick is using ext4 instead of vfat. As
ext4 does not use writepage for migration, the large stalls due to
compaction when THP is enabled are not observed. Still, isolating
PageReclaim pages on their own list helped completion time largely
by reducing the number of pages scanned by direct reclaim although
time spend in congestion_wait could also be a factor.
Again, Andrea's series had far higher success rates for THP allocation
at the cost of elapsed time. I didn't look too closely but a quick
look at the vmstat figures tells me kswapd reclaimed 8 times more pages
than the patch series and direct reclaim reclaimed roughly three times
as many pages. It follows that if memory is aggressively reclaimed,
there will be more available for THP.
writebackCPFilevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.76 ( 0.00%) 29.10 (-1555.52%) 46.01 (-2517.18%) 4.79 ( -172.35%) 54.89 (-3022.53%)
+/- 0.14 ( 0.00%) 25.61 (-18185.17%) 2.15 (-1434.83%) 6.60 (-4610.03%) 9.75
(-6863.76%)
User Time 0.05 ( 0.00%) 0.07 ( -45.83%) 0.05 ( -4.17%) 0.06 ( -29.17%) 0.06 ( -16.67%)
+/- 0.02 ( 0.00%) 0.02 ( 20.11%) 0.02 ( -3.14%) 0.01 ( 31.58%) 0.01 ( 47.41%)
Elapsed Time 22520.79 ( 0.00%) 1082.85 ( 95.19%) 73.30 ( 99.67%) 32.43 ( 99.86%) 291.84 ( 98.70%)
+/- 7277.23 ( 0.00%) 706.29 ( 90.29%) 19.05 ( 99.74%) 17.05 ( 99.77%) 125.55 ( 98.27%)
THP Active 83.80 ( 0.00%) 12.80 ( 15.27%) 15.60 ( 18.62%) 13.00 ( 15.51%) 0.80 ( 0.95%)
+/- 66.81 ( 0.00%) 20.19 ( 30.22%) 5.92 ( 8.86%) 15.06 ( 22.54%) 1.17 ( 1.75%)
Fault Alloc 171.00 ( 0.00%) 67.80 ( 39.65%) 97.40 ( 56.96%) 125.60 ( 73.45%) 133.00 ( 77.78%)
+/- 82.91 ( 0.00%) 30.69 ( 37.02%) 53.91 ( 65.02%) 55.05 ( 66.40%) 21.19 ( 25.56%)
Fault Fallback 832.00 ( 0.00%) 935.20 ( -12.40%) 906.00 ( -8.89%) 877.40 ( -5.46%) 870.20 ( -4.59%)
+/- 82.91 ( 0.00%) 30.69 ( 62.98%) 54.01 ( 34.86%) 55.05 ( 33.60%) 20.91 ( 74.78%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 7229.81 928.42 704.52 80.68 1330.76
Total Elapsed Time (seconds) 112849.04 5618.69 571.11 360.54 1664.28
In this case, the test is reading/writing only from filesystems but as
it's vfat, it's slow due to calling writepage during compaction. Little
to observe really - the time to complete the test goes way down
with the series applied and THP allocation success rates go up in
comparison to 3.2-rc5. The success rates are lower than 3.1.0 but
the elapsed time for that kernel is abysmal so it is not really a
sensible comparison.
As before, Andrea's series allocates more THPs at the cost of overall
performance.
writebackCPFileext4
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%)
+/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%)
User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%)
+/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%)
Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%)
+/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%)
THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%)
+/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%)
Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%)
+/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%)
Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%)
+/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45
Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26
Same type of story - elapsed times go down. In this case, allocation
success rates are roughtly the same. As before, Andrea's has higher
success rates but takes a lot longer.
Overall the series does reduce latencies and while the tests are
inherency racy as alloc competes with the cp processes, the variability
was included. The THP allocation rates are not as high as they could
be but that is because we would have to be more aggressive about
reclaim and compaction impacting overall performance.
This patch:
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
is was meaningless to pick the page and re-add it to the LRU list.
What was missed during review is that asynchronous migration moves dirty
pages if their ->migratepage callback is migrate_page() because these can
be moved without blocking. This potentially impacted hugepage allocation
success rates by a factor depending on how many dirty pages are in the
system.
This patch partially reverts 39deaf85 to allow migration to isolate dirty
pages again. This increases how much compaction disrupts the LRU but that
is addressed later in the series.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f80c067361 upstream.
Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU list
leading to poor reclaim decisions which has a variable
performance impact.
In __zone_reclaim case, we don't want to shrink mapped page. Nonetheless,
we have isolated mapped page and re-add it into LRU's head. It's
unnecessary CPU overhead and makes LRU churning.
Of course, when we isolate the page, the page might be mapped but when we
try to migrate the page, the page would be not mapped. So it could be
migrated. But race is rare and although it happens, it's no big deal.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39deaf8585 upstream.
Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU
list leading to poor reclaim decisions which has a variable
performance impact.
In async mode, compaction doesn't migrate dirty or writeback pages. So,
it's meaningless to pick the page and re-add it to lru list.
Of course, when we isolate the page in compaction, the page might be dirty
or writeback but when we try to migrate the page, the page would be not
dirty, writeback. So it could be migrated. But it's very unlikely as
isolate and migration cycle is much faster than writeout.
So, this patch helps cpu overhead and prevent unnecessary LRU churning.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4356f21d09 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but has no other impact.
Change ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.
Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags. INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."
This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e84ac153 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but has no other impact.
acct_isolated of compaction uses page_lru_base_type which returns only
base type of LRU list so it never returns LRU_ACTIVE_ANON or
LRU_ACTIVE_FILE. In addtion, cc->nr_[anon|file] is used in only
acct_isolated so it doesn't have fields in conpact_control.
This patch removes fields from compact_control and makes clear function of
acct_issolated which counts the number of anon|file pages isolated.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0c23279c9 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time.
If compaction can proceed, shrink_zones() stops doing any work but its
callers still call shrink_slab() which raises the priority and potentially
sleeps. This is unnecessary and wasteful so this patch aborts direct
reclaim/compaction entirely if compaction can proceed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0887c19b2 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time. Paragraph
3 of this changelog is the motivation for this patch.
When suffering from memory fragmentation due to unfreeable pages, THP page
faults will repeatedly try to compact memory. Due to the unfreeable
pages, compaction fails.
Needless to say, at that point page reclaim also fails to create free
contiguous 2MB areas. However, that doesn't stop the current code from
trying, over and over again, and freeing a minimum of 4MB (2UL <<
sc->order pages) at every single invocation.
This resulted in my 12GB system having 2-3GB free memory, a corresponding
amount of used swap and very sluggish response times.
This can be avoided by having the direct reclaim code not reclaim from
zones that already have plenty of free memory available for compaction.
If compaction still fails due to unmovable memory, doing additional
reclaim will only hurt the system, not help.
[jweiner@redhat.com: change comment to explain the order check]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3567b59aa8 upstream.
Stable note: Not tracked in Bugzilla. This patch reduces excessive
reclaim of slab objects reducing the amount of information that
has to be brought back in from disk. The third and fourth paragram
in the series describes the impact.
When a shrinker returns -1 to shrink_slab() to indicate it cannot do
any work given the current memory reclaim requirements, it adds the
entire total_scan count to shrinker->nr. The idea ehind this is that
whenteh shrinker is next called and can do work, it will do the work
of the previously aborted shrinker call as well.
However, if a filesystem is doing lots of allocation with GFP_NOFS
set, then we get many, many more aborts from the shrinkers than we
do successful calls. The result is that shrinker->nr winds up to
it's maximum permissible value (twice the current cache size) and
then when the next shrinker call that can do work is issued, it
has enough scan count built up to free the entire cache twice over.
This manifests itself in the cache going from full to empty in a
matter of seconds, even when only a small part of the cache is
needed to be emptied to free sufficient memory.
Under metadata intensive workloads on ext4 and XFS, I'm seeing the
VFS caches increase memory consumption up to 75% of memory (no page
cache pressure) over a period of 30-60s, and then the shrinker
empties them down to zero in the space of 2-3s. This cycle repeats
over and over again, with the shrinker completely trashing the inode
and dentry caches every minute or so the workload continues.
This behaviour was made obvious by the shrink_slab tracepoints added
earlier in the series, and made worse by the patch that corrected
the concurrent accounting of shrinker->nr.
To avoid this problem, stop repeated small increments of the total
scan value from winding shrinker->nr up to a value that can cause
the entire cache to be freed. We still need to allow it to wind up,
so use the delta as the "large scan" threshold check - if the delta
is more than a quarter of the entire cache size, then it is a large
scan and allowed to cause lots of windup because we are clearly
needing to free lots of memory.
If it isn't a large scan then limit the total scan to half the size
of the cache so that windup never increases to consume the whole
cache. Reducing the total scan limit further does not allow enough
wind-up to maintain the current levels of performance, whilst a
higher threshold does not prevent the windup from freeing the entire
cache under sustained workloads.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acf92b485c upstream.
Stable note: Not tracked in Bugzilla. This patch reduces excessive
reclaim of slab objects reducing the amount of information
that has to be brought back in from disk.
shrink_slab() allows shrinkers to be called in parallel so the
struct shrinker can be updated concurrently. It does not provide any
exclusio for such updates, so we can get the shrinker->nr value
increasing or decreasing incorrectly.
As a result, when a shrinker repeatedly returns a value of -1 (e.g.
a VFS shrinker called w/ GFP_NOFS), the shrinker->nr goes haywire,
sometimes updating with the scan count that wasn't used, sometimes
losing it altogether. Worse is when a shrinker does work and that
update is lost due to racy updates, which means the shrinker will do
the work again!
Fix this by making the total_scan calculations independent of
shrinker->nr, and making the shrinker->nr updates atomic w.r.t. to
other updates via cmpxchg loops.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 095760730c upstream.
Stable note: This patch makes later patches easier to apply but otherwise
has little to justify it. It is a diagnostic patch that was part
of a series addressing excessive slab shrinking after GFP_NOFS
failures. There is detailed information on the series' motivation
at https://lkml.org/lkml/2011/6/2/42 .
It is impossible to understand what the shrinkers are actually doing
without instrumenting the code, so add a some tracepoints to allow
insight to be gained.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 439423f689 upstream.
Stable note: Not tracked in Bugzilla. kswapd is responsible for clearing
ZONE_CONGESTED after it balances a zone and this patch fixes a bug
where that was failing to happen. Without this patch, processes
can stall in wait_iff_congested unnecessarily. For users, this can
look like an interactivity stall but some workloads would see it
as sudden drop in throughput.
ZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any
task. It's possible ZONE_CONGESTED isn't cleared in some cases:
1. the zone is already balanced just entering balance_pgdat() for
order-0 because concurrent tasks free memory. In this case, later
check will skip the zone as it's balanced so the flag isn't cleared.
2. high order balance fallbacks to order-0. quote from Mel: At the
end of balance_pgdat(), kswapd uses the following logic;
If reclaiming at high order {
for each zone {
if all_unreclaimable
skip
if watermark is not met
order = 0
loop again
/* watermark is met */
clear congested
}
}
i.e. it clears ZONE_CONGESTED if it the zone is balanced. if not,
it restarts balancing at order-0. However, if the higher zones are
balanced for order-0, kswapd will miss clearing ZONE_CONGESTED as
that only happens after a zone is shrunk. This can mean that
wait_iff_congested() stalls unnecessarily.
This patch makes kswapd clear ZONE_CONGESTED during its initial
highmem->dma scan for zones that are already balanced.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4d3e9e763 upstream.
Stable note: Not tracked in Bugzilla. This patch augments an earlier commit
that avoids scanning priority being artificially raised. The older
fix was particularly important for small memcgs to avoid calling
wait_iff_congested() unnecessarily.
Without swap, anonymous pages are not scanned. As such, they should not
count when considering force-scanning a small target if there is no swap.
Otherwise, targets are not force-scanned even when their effective scan
number is zero and the other conditions--kswapd/memcg--apply.
This fixes 246e87a939 ("memcg: fix get_scan_count() for small
targets").
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 938929f14c upstream.
Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 .
Large machines with 1TB or more of RAM take a long time to boot
without this patch and may spew out soft lockup warnings.
When min_free_kbytes is updated, some pageblocks are marked
MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early
in boot but on large machines with 1TB of memory, this has been reported
to delay boot times, probably due to the NUMA distances involved.
The bulk of the work is due to calling calling pageblock_is_reserved() an
unnecessary amount of times and accessing far more struct page metadata
than is necessary. This patch significantly reduces the amount of work
done by setup_zone_migrate_reserve() improving boot times on 1TB machines.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbcb87883 upstream.
Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=721039 .
Without the patch, memory hot-add can fail for kernel configurations
that do not set CONFIG_SPARSEMEM_VMEMMAP.
(Resending as I am not seeing it in -next so maybe it got lost)
mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
It is expected that memory being brought online is PageReserved
similar to what happens when the page allocator is being brought up.
Memory is onlined in "memory blocks" which consist of one or more
sections. Unfortunately, the code that verifies PageReserved is
currently assuming that the memmap backing all these pages is virtually
contiguous which is only the case when CONFIG_SPARSEMEM_VMEMMAP is set.
As a result, memory hot-add is failing on those configurations with
the message;
kernel: section number XXX page number 256 not reserved, was it already online?
This patch updates the PageReserved check to lookup struct page once
per section to guarantee the correct struct page is being checked.
[Check pages within sections properly: rientjes@google.com]
[original patch by: nfont@linux.vnet.ibm.com]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a1cb2c60dd upstream.
Stable note: Not tracked on Bugzilla. This patch is known to make a big
difference to tmpfs performance on larger machines.
This was found to adversely affect tmpfs I/O performance.
Tests run on a 640 cpu UV system.
With 120 threads doing parallel writes, each to different tmpfs mounts:
No patch: ~300 MB/sec
With vm_stat alignment: ~430 MB/sec
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 751f188dd5 upstream.
This patch fixes a crash when a discard request is sent during mirror
recovery.
Firstly, some background. Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
simultaneously recovered regions (by default the semaphore value is 1,
so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
__rh_recovery_prepare asks the log driver for the next region to
recover. Then, it sets the region state to DM_RH_RECOVERING. If there
are no pending I/Os on this region, the region is added to
quiesced_regions list. If there are pending I/Os, the region is not
added to any list. It is added to the quiesced_regions list later (by
dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
flight on this region. The region is popped from the list in
dm_rh_recovery_start function. Then, a kcopyd job is started in the
recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
dm_rh_recovery_end. dm_rh_recovery_end adds the region to
recovered_regions or failed_recovered_regions list (depending on
whether the copy operation was successful or not).
The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.
Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.
There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.
These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb (dm raid1: support discard).
In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING). However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.
This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
Reference: https://bugzilla.redhat.com/837607
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6727932cf upstream.
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).
The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.
There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:
UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.
The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c7e7f6c07 upstream.
Offlining memory may block forever, waiting for kswapd() to wake up
because kswapd() does not check the event kthread->should_stop before
sleeping.
The proper pattern, from Documentation/memory-barriers.txt, is:
--- waker ---
event_indicated = 1;
wake_up_process(event_daemon);
--- sleeper ---
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (event_indicated)
break;
schedule();
}
set_current_state() may be wrapped by:
prepare_to_wait();
In the kswapd() case, event_indicated is kthread->should_stop.
=== offlining memory (waker) ===
kswapd_stop()
kthread_stop()
kthread->should_stop = 1
wake_up_process()
wait_for_completion()
=== kswapd_try_to_sleep (sleeper) ===
kswapd_try_to_sleep()
prepare_to_wait()
.
.
schedule()
.
.
finish_wait()
The schedule() needs to be protected by a test of kthread->should_stop,
which is wrapped by kthread_should_stop().
Reproducer:
Do heavy file I/O in background.
Do a memory offline/online in a tight loop
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd60042cc1 upstream.
When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.
Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.
Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org>
Reported-by: Bill Robertson <bill_robertson@debortoli.com.au>
Reported-by: Dion Edwards <dion_edwards@debortoli.com.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 3e997130bd
The leap second rework unearthed another issue of inconsistent data.
On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.
This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.
Add the missing update call, so all the data is consistent everywhere.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 5baefd6d84
The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.
clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.
In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.
So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.
ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().
The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.
This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of f6c06abfb3
To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 4873fa070a
The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.
The reported immediate workaround - $ data -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix
Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of f55a6faa38
clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().
For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.
Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.
[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
rid of all this ifdeffery ASAP ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of dd48d708ff
When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 6b43ae8a61
This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.
Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
write_seqlock_irq(&xtime_lock);
process_adjtimex_modes()
process_adj_status()
ntp_start_leap_timer()
hrtimer_start()
hrtimer_reprogram()
tick_program_event()
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock); [DEADLOCK]
This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.
NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.
Original commit message below:
Since commit 7dffa3c673 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.
Thomas diagnosed this as the following pattern:
CPU 0 CPU 1
do_adjtimex()
spin_lock_irq(&ntp_lock);
process_adjtimex_modes(); timer_interrupt()
process_adj_status(); do_timer()
ntp_start_leap_timer(); write_lock(&xtime_lock);
hrtimer_start(); update_wall_time();
hrtimer_reprogram(); ntp_tick_length()
tick_program_event() spin_lock(&ntp_lock);
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock);
This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.
The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ) after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).
This patch applies on top of tip/timers/core.
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnoised-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8cdddb8d6 upstream.
Don't validate interface combinations on a stopped
interface. Otherwise we might end up being able to
create a new interface with a certain type, but
won't be able to change an existing interface
into that type.
This also skips some other functions when
interface is stopped and changing interface type.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[Fixes regression introduced by cherry pick of 463454b5db]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit e76b8ee25e upstream.
I couldn't find the vendor ID in any of the online databases, but this
mat has a Pump It Up logo on the top side of the controller compartment,
and a disclaimer stating that Andamiro will not be liable on the bottom.
Signed-off-by: Yuri Khan <yurivkhan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0efa8f23a upstream.
SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efd821182c upstream.
On rt2x00_dmastart() we increase index specified by Q_INDEX and on
rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries
between Q_INDEX_DONE and Q_INDEX are those we currently process in the
hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can
submit to the hardware.
According to that fix rt2x00usb_kick_queue(), as we need to submit RX
entries that are not processed by the hardware. It worked before only
for empty queue, otherwise was broken.
Note that for TX queues indexes ordering are ok. We need to kick entries
that have filled skb, but was not submitted to the hardware, i.e.
started from Q_INDEX_DONE and have ENTRY_DATA_PENDING bit set.
From practical standpoint this fixes RX queue stall, usually reproducible
in AP mode, like for example reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=828824
Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy>
Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05d290d66b upstream.
If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns. In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted. Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.
The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come. (On my systems, this happens
pretty reliably within about 5 to 500 iterations. Others report that
it manages to loop ~forever sometimes; YMMV.)
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)
void handler(int signum) {}
int main()
{
struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
CHECK(sigaction(SIGCHLD, &act, NULL));
CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
for (;;) {
int fd;
pid_t pid;
putc('.', stderr);
CHECK(pid = fork());
if (pid == 0) {
CHECK(fd = open("fifo", O_RDONLY));
_exit(0);
}
CHECK(fd = open("fifo", O_WRONLY));
CHECK(close(fd));
CHECK(waitpid(pid, NULL, 0));
}
}
This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:
http://bugs.debian.org/678852
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88ca518b0b upstream.
intel_ips driver spews the warning message
"ME failed to update for more than 1s, likely hung"
at each second endlessly on HP ProBook laptops with IronLake.
As this has never worked, better to blacklist the driver for now.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 596fd46268 upstream.
We don't need to open code the divide function, just use div_u64 that
already exists and do the same job. While this is a straightforward
clean up, there is more to that, the real motivation for this.
While building on a cross compiling environment in armel, using gcc
4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), I was getting the following build
error:
ERROR: "__aeabi_uldivmod" [drivers/mtd/nand/nandsim.ko] undefined!
After investigating with objdump and hand built assembly version
generated with the compiler, I narrowed __aeabi_uldivmod as being
generated from the divide function. When nandsim.c is built with
-fno-inline-functions-called-once, that happens when
CONFIG_DEBUG_SECTION_MISMATCH is enabled, the do_div optimization in
arch/arm/include/asm/div64.h doesn't work as expected with the open
coded divide function: even if the do_div we are using doesn't have a
constant divisor, the compiler still includes the else parts of the
optimized do_div macro, and translates the divisions there to use
__aeabi_uldivmod, instead of only calling __do_div_asm -> __do_div64 and
optimizing/removing everything else out.
So to reproduce, gcc 4.6 plus CONFIG_DEBUG_SECTION_MISMATCH=y and
CONFIG_MTD_NAND_NANDSIM=m should do it, building on armel.
After this change, the compiler does the intended thing even with
-fno-inline-functions-called-once, and optimizes out as expected the
constant handling in the optimized do_div on arm. As this also avoids a
build issue, I'm marking for Stable, as I think is applicable for this
case.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91f68c89d8 upstream.
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.
The problem was initially reported by Torsten Hilbrich here:
https://lkml.org/lkml/2012/6/18/54
and also reported independently here:
http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
and then Richard W.M. Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue. This patch has been
confirmed to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=835019
The main problem is here, in __getblk_slow:
for (;;) {
struct buffer_head * bh;
int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
ret = grow_buffers(bdev, block, size);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}
__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page. I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding. However, we also continue to loop for other cases, like the
block lying beond the end of the disk. So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
[ Jens is on vacation, taking this directly - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41002f8dd5 upstream.
We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.
Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f68b4c2e1 upstream.
Current WARN msg is only for the ati_ixp4x0 board, while this function
is used by mulitple platforms. So this one board specific warning
is not appropriate any more.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae10ccdc30 upstream.
Currently when acpi_skip_timer_override is set, it only cover the
(source_irq == 0 && global_irq == 2) cases. While there is also
platform which need use this option and its global_irq is not 2.
This patch will extend acpi_skip_timer_override to cover all
timer overriding cases as long as the source irq is 0.
This is the first part of a fix to kernel bug bugzilla 40002:
"IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002
Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ab4233dd0 upstream.
Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).
The bug was introduced by commit 90ed52ebe4 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")
[bwh: Backported to 3.2:
- Adjust context
- madvise_remove() calls vmtruncate_range(), not do_fallocate()]
[luto: Backported to 3.0: Adjust context]
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fea9f718b3 upstream.
There is a bug in the below scenario for !CONFIG_MMU:
1. create a new file
2. mmap the file and write to it
3. read the file can't get the correct value
Because
sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()
which causes the page to be zeroed.
Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bf2bba375 upstream.
If page migration cannot charge the temporary page to the memcg,
migrate_pages() will return -ENOMEM. This isn't considered in memory
compaction however, and the loop continues to iterate over all
pageblocks trying to isolate and migrate pages. If a small number of
very large memcgs happen to be oom, however, these attempts will mostly
be futile leading to an enormous amout of cpu consumption due to the
page migration failures.
This patch will short circuit and fail memory compaction if
migrate_pages() returns -ENOMEM. COMPACT_PARTIAL is returned in case
some migrations were successful so that the page allocator will retry.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc448a18ae upstream.
If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored. We don't store data there, but when recovering the last
device in an array we retry to recover that last chunk from a
non-existent location. This results in an error, and the recovery
aborts.
When we get to that last chunk we should just stop - there is nothing
more to do anyway.
This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c0544e255 upstream.
In chunk_aligned_read() we are adding data_offset before calling
is_badblock. But is_badblock also adds data_offset, so that is bad.
So move the addition of data_offset to after the call to
is_badblock.
This bug was introduced by commit 31c176ecdf
md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0. So that patch is suitable for any
-stable kernel from 3.0.y onwards. However it will need minor
revision for most of those (as the comment didn't appear until
recently).
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: ignored missing comment]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ad3341130 upstream.
It makes sense to label "Digital Thermal Sensor" as "DTS", but
unfortunately the string "dts" was already used for "Debug Store", and
/proc/cpuinfo is a user space ABI.
Therefore, rename this to "dtherm".
This conflict went into mainline via the hwmon tree without any x86
maintainer ack, and without any kind of hint in the subject.
a4659053 x86/hwmon: fix initialization of coretemp
Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: drop the coretemp device table change]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32587371ad upstream.
Fix a regression introduced by 7eaceaccab ("block: remove per-queue
plugging"). In that patch, Jens removed the whole mm_unplug_device()
function, which used to be the trigger to make umem start to work.
We need to implement unplugging to make umem start to work, or I/O will
never be triggered.
Signed-off-by: Tao Guo <Tao.Guo@emc.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Shaohua Li <shli@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fde0a8cfd upstream.
Fix:
BUG: sleeping function called from invalid context at kernel/workqueue.c:2547
in_atomic(): 1, irqs_disabled(): 0, pid: 629, name: wpa_supplicant
2 locks held by wpa_supplicant/629:
#0: (rtnl_mutex){+.+.+.}, at: [<c08b2b84>] rtnl_lock+0x14/0x20
#1: (&trigger->leddev_list_lock){.+.?..}, at: [<c0867f41>] led_trigger_event+0x21/0x80
Pid: 629, comm: wpa_supplicant Not tainted 3.3.0-0.rc3.git5.1.fc17.i686
Call Trace:
[<c046a9f6>] __might_sleep+0x126/0x1d0
[<c0457d6c>] wait_on_work+0x2c/0x1d0
[<c045a09a>] __cancel_work_timer+0x6a/0x120
[<c045a160>] cancel_delayed_work_sync+0x10/0x20
[<f7dd3c22>] rtl8187_led_brightness_set+0x82/0xf0 [rtl8187]
[<c0867f7c>] led_trigger_event+0x5c/0x80
[<f7ff5e6d>] ieee80211_led_radio+0x1d/0x40 [mac80211]
[<f7ff3583>] ieee80211_stop_device+0x13/0x230 [mac80211]
Removing _sync is ok, because if led_on work is currently running
it will be finished before led_off work start to perform, since
they are always queued on the same mac80211 local->workqueue.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=795176
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fab363b5ff upstream.
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but
the two bits have relationship. A delayed stripe can be moved to hold list only
when preread active stripe count is below IO_THRESHOLD. If a stripe has both
the bits set, such stripe will be in delayed list and preread count not 0,
which will make such stripe never leave delayed list.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d550dda192 upstream.
This is a tiny, but important, patch to vhost.
Vhost's worker thread only called schedule() when it had no work to do, and
it wanted to go to sleep. But if there's always work to do, e.g., the guest
is running a network-intensive program like netperf with small message sizes,
schedule() was *never* called. This had several negative implications (on
non-preemptive kernels):
1. Passing time was not properly accounted to the "vhost" process (ps and
top would wrongly show it using zero CPU time).
2. Sometimes error messages about RCU timeouts would be printed, if the
core running the vhost thread didn't schedule() for a very long time.
3. Worst of all, a vhost thread would "hog" the core. If several vhost
threads need to share the same core, typically one would get most of the
CPU time (and its associated guest most of the performance), while the
others hardly get any work done.
The trivial solution is to add
if (need_resched())
schedule();
After doing every piece of work. This will not do the heavy schedule() all
the time, just when the timer interrupt decided a reschedule is warranted
(so need_resched returns true).
Thanks to Abel Gordon for this patch.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 51c9e6c773 upstream, but modified
to get this to apply on 3.0.
If the user chooses to say "no" to CONFIG_USB_XHCI_HCD on a system
with an Intel Panther Point chipset, the PCI quirks code or the EHCI
driver will switch the ports over to the xHCI host, but the xHCI driver
will never load. The ports will be powered off and seem "dead" to the
user.
Fix this by only switching the ports over if CONFIG_USB_XHCI_HCD is
either compiled in, or compiled as a module.
This patch should be backported to the 3.0 stable kernel, since it
contains the commit 69e848c209 "Intel
xhci: Support EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Eric Anholt <eric.anholt@intel.com>
Reported-by: David Bein <d.bein@f5.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbf0e4c725 upstream.
Quite a few ASUS computers experience a nasty problem, related to the
EHCI controllers, when going into system suspend. It was observed
that the problem didn't occur if the controllers were not put into the
D3 power state before starting the suspend, and commit
151b612847 (USB: EHCI: fix crash during
suspend on ASUS computers) was created to do this.
It turned out this approach messed up other computers that didn't have
the problem -- it prevented USB wakeup from working. Consequently
commit c2fb8a3fa2 (USB: add
NO_D3_DURING_SLEEP flag and revert 151b612847) was merged; it
reverted the earlier commit and added a whitelist of known good board
names.
Now we know the actual cause of the problem. Thanks to AceLan Kao for
tracking it down.
According to him, an engineer at ASUS explained that some of their
BIOSes contain a bug that was added in an attempt to work around a
problem in early versions of Windows. When the computer goes into S3
suspend, the BIOS tries to verify that the EHCI controllers were first
quiesced by the OS. Nothing's wrong with this, but the BIOS does it
by checking that the PCI COMMAND registers contain 0 without checking
the controllers' power state. If the register isn't 0, the BIOS
assumes the controller needs to be quiesced and tries to do so. This
involves making various MMIO accesses to the controller, which don't
work very well if the controller is already in D3. The end result is
a system hang or memory corruption.
Since the value in the PCI COMMAND register doesn't matter once the
controller has been suspended, and since the value will be restored
anyway when the controller is resumed, we can work around the BIOS bug
simply by setting the register to 0 during system suspend. This patch
(as1590) does so and also reverts the second commit mentioned above,
which is now unnecessary.
In theory we could do this for every PCI device. However to avoid
introducing new problems, the patch restricts itself to EHCI host
controllers.
Finally the affected systems can suspend with USB wakeup working
properly.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728
Based-on-patch-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Javier Marcet <jmarcet@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Oleksij Rempel <bug-track@fisher-privat.net>
Tested-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fe79d7600 upstream.
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.
The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dc6780587 upstream.
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.
In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.
https://launchpad.net/bugs/994247
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 863555be0c upstream.
Use rcu_dereference_protected to tell rcu that the ft_lport_lock
is held during ft_lport_create. This resolved "suspicious RCU usage"
warnings when debugging options are turned on.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48f8b64129 upstream.
The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.
Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 332a2e1244 upstream.
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors. This should make it comparable
to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.
Noticed during development of multithread support for ksh93.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 925839243d upstream.
Currently we check the sequence number of last packet received
against start_win. If a sequence hole is detected, start_win is
updated to next sequence number.
Since the rx sequence number is initialized to 0, a corner case
exists when BA setup happens immediately after association. As
0 is a valid sequence number, start_win gets increased to 1
incorrectly. This causes the first packet with sequence number 0
being dropped.
Initialize rx sequence number as 0xffff and skip adjusting
start_win if the sequence number remains 0xffff. The sequence
number will be updated once the first packet is received.
Signed-off-by: Stone Piao <piaoyun@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b5ebccc40 upstream.
When receiving an "individually addressed" action frame, the
receiver is required to return it to the sender. mac80211
gets this wrong as it also returns group addressed (mcast)
frames to the sender. Fix this and update the reference to
the new 802.11 standards version since things were shuffled
around significantly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e734568b67 upstream.
The OProfile perf backend uses a static array to keep track of the
perf events on the system. When compiling with CONFIG_CPUMASK_OFFSTACK=y
&& SMP, nr_cpumask_bits is not a compile-time constant and the build
will fail with:
oprofile_perf.c:28: error: variably modified 'perf_events' at file scope
This patch uses NR_CPUs instead of nr_cpumask_bits for the array
initialisation. If this causes space problems in the future, we can
always move to dynamic allocation for the events array.
Cc: Matt Fleming <matt@console-pimps.org>
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fcc8f9682 upstream.
This patch adds 10 device IDs for CP210x based devices from the following manufacturers:
Timewave
Clipsal
Festo
Link Instruments
Signed-off-by: Craig Shelley <craig@microtron.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb3979f64d upstream.
Distribution kernel maintainers routinely backport fixes for users that
were deemed important but not "something critical" as defined by the
rules. To users of these kernels they are very serious and failing to fix
them reduces the value of -stable.
The problem is that the patches fixing these issues are often subtle and
prone to regressions in other ways and need greater care and attention.
To combat this, these "serious" backports should have a higher barrier
to entry.
This patch relaxes the rules to allow a distribution maintainer to merge
to -stable a backported patch or small series that fixes a "serious"
user-visible performance issue. They should include additional information on
the user-visible bug affected and a link to the bugzilla entry if available.
The same rules about the patch being already in mainline still apply.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6b54f083c upstream.
This is the 2nd part of fix for kernel bugzilla 40002:
"IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002
The root cause is the buggy FW, whose ACPI tables assign the GSI 16
to 2 irqs 0 and 16(VGA), and the VGA is the right owner of GSI 16.
So add a quirk to ignore the irq0 overriding GSI 16 for the
FUJITSU SIEMENS AMILO PRO V2030 platform will solve this issue.
Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f16012610 upstream.
The acpi_pad driver can get stuck in destroy_power_saving_task()
waiting for kthread_stop() to stop a power_saving thread. The problem
is that the isolated_cpus_lock mutex is owned when
destroy_power_saving_task() calls kthread_stop(), which waits for a
power_saving thread to end, and the power_saving thread tries to
acquire the isolated_cpus_lock when it calls round_robin_cpu(). This
patch fixes the issue by making round_robin_cpu() use its own mutex.
https://bugzilla.kernel.org/show_bug.cgi?id=42981
Signed-off-by: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6db65cbb94 upstream.
This patch fixes the problem on some HP desktop machines with eDP
which give blank screens after S3 resume.
It turned out that BLC_PWM_CPU_CTL must be written after
BLC_PWM_CPU_CTL2. Otherwise it doesn't take effect on these
SNB machines.
Tested with 3.5-rc3 kernel.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49233
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bd0c15fcf upstream.
nv_two_heads() was never meant to be used outside of pre-nv50 code. The
code checks for >= NV_10 for 2 CRTCs, then downgrades a few specific
chipsets to 1 CRTC based on (pci_device & 0x0ff0).
The breakage example seen is on GTX 560Ti, with a pciid of 0x1200, which
gets detected as an NV20 (0x020x) with 1 CRTC by nv_two_heads(), causing
memory corruption because there's actually 2 CRTCs..
This switches fbcon to use the CRTC count directly from the mode_config
structure, which will also fix the same issue on Kepler boards which have
4 CRTCs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b196a4980f upstream.
We need to initialize this to false, because the is_rb callback only
ever sets it to true.
Noticed while reading through the code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6305567e7 upstream.
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.
This commit forces the delayed updates to run so the
replay code can find any modifications done. It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9fe573a65 upstream.
In sound/soc/codecs/tlv320aic3x.c
data = snd_soc_read(codec, AIC3X_PLL_PROGA_REG);
snd_soc_write(codec, AIC3X_PLL_PROGA_REG,
data | (pll_p << PLLP_SHIFT));
In the above code, pll-p value is OR'ed with previous value without
clearing it. Bug is not seen if pll-p value doesn't change across
Sampling frequency.
However on some platforms (like AM335x EVM-SK), pll-p may have different
values across different sampling frequencies. In such case, above code
configures the pll with a wrong value.
Because of this bug, when a audio stream is played with pll value
different from previous stream, audio is heard as differently(like its
stretched).
Signed-off-by: Hebbar, Gururaja <gururaja.hebbar@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcb7ad7bcb upstream.
steps to recreate:
load latest ath9k driver with AR9485
stop the network-manager and wpa_supplicant
bring the interface up
Call Trace:
[<ffffffffa0517490>] ? ath_hw_check+0xe0/0xe0 [ath9k]
[<ffffffff812cd1e8>] __const_udelay+0x28/0x30
[<ffffffffa03bae7a>] ar9003_get_pll_sqsum_dvc+0x4a/0x80 [ath9k_hw]
[<ffffffffa05174eb>] ath_hw_pll_work+0x5b/0xe0 [ath9k]
[<ffffffff810744fe>] process_one_work+0x11e/0x470
[<ffffffff8107530f>] worker_thread+0x15f/0x360
[<ffffffff810751b0>] ? manage_workers+0x230/0x230
[<ffffffff81079af3>] kthread+0x93/0xa0
[<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
[<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815fd3a0>] ? gs_change+0x13/0x13
ensure that the PLL-WAR for AR9485/AR9340 is executed only if the STA is
associated (or) IBSS/AP mode had started beaconing. Ideally this WAR
is needed to recover from some rare beacon stuck during stress testing.
Before the STA is associated/IBSS had started beaconing, PLL4(0x1618c)
always seem to have zero even though we had configured PLL3(0x16188) to
query about PLL's locking status. When we keep on polling infinitely PLL4's
8th bit(ie check for PLL locking measurements is done), machine hangs
due to softlockup.
fixes https://bugzilla.redhat.com/show_bug.cgi?id=811142
Reported-by: Rolf Offermanns <rolf.offermanns@gmx.net>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1df2ae31c7 upstream.
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit adee11b208 upstream.
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbb24a3a91 upstream.
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.
Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes. Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.
For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number. They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.
However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset. Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.
This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.
nilfs_palloc_freev: entry number 307234 already freed.
...
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac852edb47 upstream.
Key lookups may call read_smc() with a fixed-length key string,
and if the lookup fails, trailing stack content may appear in the
kernel log. Fixed with this patch.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 954fba0274 ]
Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :
"skb->len += len;" instead of "skb_put(skb, len);"
Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
only packet headers would be copied, leaving garbage in the payload.
However the skb_realloc_headroom() must be avoided as much as possible
since it requires memory and netpoll tries hard to work even if memory
is exhausted (using a pool of preallocated skbs)
It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
which happens to work for typicall drivers but not all.
Right thing is to use LL_RESERVED_SPACE(dev)
(And also add dev->needed_tailroom of tailroom)
This patch combines both fixes.
Many thanks to Bogdan for raising this issue.
Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ff0feac88 ]
The newer flavors of Yukon II use a different method for receive
checksum offload. This is indicated in the driver by the SKY2_HW_NEW_LE
flag. On these newer chips, the BMU_ENA_RX_CHKSUM should not be set.
The driver would get incorrectly toggle the bit, enabling the old
checksum logic on these chips and cause a BUG_ON() assertion. If
receive checksum was toggled via ethtool.
Reported-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d189634eca ]
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.
This opens up a short time frame to access fib_table_hash with its pants
down.
Move the registration of the proc files to a later point in the init
order to avoid the race.
Tested :-)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ee31c6898 ]
In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping. This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.
When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit. In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.
Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.
If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)
This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :
netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.
Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.
Based on a previous patch from Tom Herbert.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 149ddd83a9 ]
This ensures that bridges created with brctl(8) or ioctl(2) directly
also carry IFLA_LINKINFO when dumped over netlink. This also allows
to create a bridge with ioctl(2) and delete it with RTM_DELLINK.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16b0dc29c1 ]
Trying to "modprobe dummy numdummies=30000" triggers :
INFO: rcu_sched self-detected stall on CPU { 8} (t=60000 jiffies)
After this splat, RTNL is locked and reboot is needed.
We must call cond_resched() to avoid this, even holding RTNL.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 20e2a86485 ]
When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system
receives a CIPSO tagged packet it is dropped (cipso_v4_validate()
returns non-zero). In most cases this is the correct and desired
behavior, however, in the case where we are simply forwarding the
traffic, e.g. acting as a network bridge, this becomes a problem.
This patch fixes the forwarding problem by providing the basic CIPSO
validation code directly in ip_options_compile() without the need for
the NetLabel or CIPSO code. The new validation code can not perform
any of the CIPSO option label/value verification that
cipso_v4_validate() does, but it can verify the basic CIPSO option
format.
The behavior when NetLabel is enabled is unchanged.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc9b17ad29 ]
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7deabca0ac upstream.
We can stall RCU processing on SMP platforms if a CPU sits in its idle
loop for a long time. This happens because we don't call irq_enter()
and irq_exit() around generic_smp_call_function_interrupt() and
friends. Add the necessary calls, and remove the one from within
ipi_timer(), so that they're all in a common place.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[add irq_enter()/irq_exit() in do_local_timer]
Signed-off-by: UCHINO Satoshi <satoshi.uchino@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc1d770291 upstream.
We have a bug report where the kernel hits a warning in the cpumask
code:
WARNING: at include/linux/cpumask.h:107
Which is:
WARN_ON_ONCE(cpu >= nr_cpumask_bits);
The backtrace is:
cpu_cmd
cmds
xmon_core
xmon
die
xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still
open coding this but iterating above nr_cpu_ids is definitely a bug.
This patch iterates through all possible cpus, in case we issue a
system reset and CPUs in an offline state call in.
Perhaps the old code was trying to handle CPUs that were in the
partition but were never started (eg kexec into a kernel with an
nr_cpus= boot option). They are going to die way before we get into
xmon since we haven't set any kernel state up for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3a3dd074f upstream.
TEAC's UD-H01 (and probably other devices) have a gap in the interface
number allocation of their descriptors:
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 220
bNumInterfaces 3
[...]
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
[...]
Interface Association:
bLength 8
bDescriptorType 11
bFirstInterface 2
bInterfaceCount 2
bFunctionClass 1 Audio
bFunctionSubClass 0
bFunctionProtocol 32
iFunction 4
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
[...]
Once a configuration is selected, usb_set_configuration() walks the
known interfaces of a given configuration and calls find_iad() on
each of them to set the interface association pointer the interface
is included in.
The problem here is that the loop variable is taken for the interface
number in the comparison logic that gathers the association. Which is
fine as long as the descriptors are sane.
In the case above, however, the logic gets out of sync and the
interface association fields of all interfaces beyond the interface
number gap are wrong.
Fix this by passing the interface's bInterfaceNumber to find_iad()
instead.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: bEN <ml_all@circa.be>
Reported-by: Ivan Perrone <ivanperrone@hotmail.com>
Tested-by: ivan perrone <ivanperrone@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 954c3f8a5f upstream.
We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.
An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:
May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected
sysfs view of the same problem:
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
So we end up with a mismatch between the USB driver and the
USB serial driver. The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.
This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver. This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:
[11811.646396] drivers/usb/serial/usb-serial.c: get_free_serial 1
[11811.646443] drivers/usb/serial/usb-serial.c: get_free_serial - minor base = 0
[11811.646460] drivers/usb/serial/usb-serial.c: usb_serial_probe - registering ttyUSB0
[11811.646766] usb 6-1: pl2303 converter now attached to ttyUSB0
[11812.264197] USB Serial deregistering driver FTDI USB Serial Device
[11812.264865] usbcore: deregistering interface driver ftdi_sio
[11812.282180] USB Serial deregistering driver pl2303
[11812.283141] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[11812.283272] usbcore: deregistering interface driver pl2303
[11812.301056] USB Serial deregistering driver generic
[11812.301186] usbcore: deregistering interface driver usbserial_generic
[11812.301259] drivers/usb/serial/usb-serial.c: usb_serial_disconnect
[11812.301823] BUG: unable to handle kernel paging request at f8e7438c
[11812.301845] IP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.301871] *pde = 357ef067 *pte = 00000000
[11812.301957] Oops: 0000 [#1] PREEMPT SMP
[11812.301983] Modules linked in: usbserial(-) [last unloaded: pl2303]
[11812.302008]
[11812.302019] Pid: 1323, comm: modprobe Tainted: G W 3.4.0-rc7+ #101 Dell Inc. Vostro 1520/0T816J
[11812.302115] EIP: 0060:[<f8e38445>] EFLAGS: 00010246 CPU: 1
[11812.302130] EIP is at usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.302141] EAX: f508a180 EBX: f508a180 ECX: 00000000 EDX: f8e74300
[11812.302151] ESI: f5050800 EDI: 00000001 EBP: f5141e78 ESP: f5141e58
[11812.302160] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11812.302170] CR0: 8005003b CR2: f8e7438c CR3: 34848000 CR4: 000007d0
[11812.302180] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11812.302189] DR6: ffff0ff0 DR7: 00000400
[11812.302199] Process modprobe (pid: 1323, ti=f5140000 task=f61e2bc0 task.ti=f5140000)
[11812.302209] Stack:
[11812.302216] f8e3be0f f8e3b29c f8e3ae00 00000000 f513641c f5136400 f513641c f507a540
[11812.302325] f5141e98 c133d2c1 00000000 00000000 f509c400 f513641c f507a590 f5136450
[11812.302372] f5141ea8 c12f0344 f513641c f507a590 f5141ebc c12f0c67 00000000 f507a590
[11812.302419] Call Trace:
[11812.302439] [<c133d2c1>] usb_unbind_interface+0x51/0x190
[11812.302456] [<c12f0344>] __device_release_driver+0x64/0xb0
[11812.302469] [<c12f0c67>] driver_detach+0x97/0xa0
[11812.302483] [<c12f001c>] bus_remove_driver+0x6c/0xe0
[11812.302500] [<c145938d>] ? __mutex_unlock_slowpath+0xcd/0x140
[11812.302514] [<c12f0ff9>] driver_unregister+0x49/0x80
[11812.302528] [<c1457df6>] ? printk+0x1d/0x1f
[11812.302540] [<c133c50d>] usb_deregister+0x5d/0xb0
[11812.302557] [<f8e37c55>] ? usb_serial_deregister+0x45/0x50 [usbserial]
[11812.302575] [<f8e37c8d>] usb_serial_deregister_drivers+0x2d/0x40 [usbserial]
[11812.302593] [<f8e3a6e2>] usb_serial_generic_deregister+0x12/0x20 [usbserial]
[11812.302611] [<f8e3acf0>] usb_serial_exit+0x8/0x32 [usbserial]
[11812.302716] [<c1080b48>] sys_delete_module+0x158/0x260
[11812.302730] [<c110594e>] ? mntput+0x1e/0x30
[11812.302746] [<c145c3c3>] ? sysenter_exit+0xf/0x18
[11812.302746] [<c107777c>] ? trace_hardirqs_on_caller+0xec/0x170
[11812.302746] [<c145c390>] sysenter_do_call+0x12/0x36
[11812.302746] Code: 24 02 00 00 e8 dd f3 20 c8 f6 86 74 02 00 00 02 74 b4 8d 86 4c 02 00 00 47 e8 78 55 4b c8 0f b6 43 0e 39 f8 7f a9 8b 53 04 89 d8 <ff> 92 8c 00 00 00 89 d8 e8 0e ff ff ff 8b 45 f0 c7 44 24 04 2f
[11812.302746] EIP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial] SS:ESP 0068:f5141e58
[11812.302746] CR2: 00000000f8e7438c
Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing. This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4707f3f8 upstream.
Currently CDC-ACM devices stay throttled when their TTY is closed while
throttled, stalling further communication attempts after the next open.
Unthrottling during open/activate got lost starting with kernel
3.0.0 and this patch reintroduces it.
Signed-off-by: Otto Meta <otto.patches@sister-shadow.de>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2fb8a3fa2 upstream.
This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers. Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.
After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state. Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.
The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep. Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.
A similar patch has already been applied as commit
151b612847 (USB: EHCI: fix crash during
suspend on ASUS computers). The patch supersedes that one and reverts
it. There are two differences:
The old patch added the flag at the USB level; this patch
adds it at the PCI level.
The old patch applied to all chipsets with the same vendor,
subsystem vendor, and product IDs; this patch makes an
exception for a known-good system (based on DMI information).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c41444ccfa upstream.
Some additional IDs found in the BSD/GPL licensed out-of-tree
GobiSerial driver from Sierra Wireless.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c87663ee upstream.
The __devinitconst section can't be referenced
from usb_serial_device structure. Thus removed it as
it done in other mos* device drivers.
Error itself:
WARNING: drivers/usb/serial/mos7840.o(.data+0x8): Section mismatch in reference
from the variable moschip7840_4port_device to the variable
.devinit.rodata:id_table
The variable moschip7840_4port_device references
the variable __devinitconst id_table
[v2] no attach now
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 622eb783fe upstream.
When system software decides to power down the xHC with the intent of
resuming operation at a later time, it will ask xHC to save the internal
state and restore it when resume to correctly recover from a power event.
Two bits are used to enable this operation: Save State and Restore State.
xHCI spec 4.23.2 says software should "Set the Controller Save/Restore
State flag in the USBCMD register and wait for the Save/Restore State
Status flag in the USBSTS register to transition to '0'". However, it does
not define how long software should wait for the SSS/RSS bit to transition
to 0.
Currently the timeout is set to 1ms. There is bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1002697)
indicates that the timeout is too short for ASMedia ASM1042 host controller
to save/restore the state successfully. Increase the timeout to 10ms helps to
resolve the issue.
This patch should be backported to stable kernels as old as 2.6.37, that
contain the commit 5535b1d5f8 "USB: xHCI:
PCI power management implementation"
Signed-off-by: Andiry Xu <andiry.xu@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6dc8c0421 upstream.
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.
Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4273f9878b upstream.
Commit 8b4c6a3ab5 ("USB: option: Use generic USB wwan code")
moved option port-data allocation to usb_wwan_startup but still cast the
port data to the old struct...
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c3aab315 upstream.
Fix memory leak introduced by commit 383cedc3bb ("USB: serial:
full autosuspend support for the option driver") which allocates
usb-serial data but never frees it.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42ca7da1c2 upstream.
Later firmwares for this device now have proper subclass and
protocol info so we can identify it nicely without needing to use
the blacklist. I'm not removing the old 0xff matching as there
may be devices in the field that still need that.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3b02ae586 upstream.
If the call to svc_process_common() fails, then the request
needs to be freed before we can exit bc_svc_process.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e62625420 upstream.
Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive read and
compares unreliable. In addition each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
use the feature to improve scheduling (by default disabled)
c) not mislead the cpufreq driver to use the MSRs
This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 350ab15bb2 upstream.
The statically defined I/O memory regions for the i.MX21 on chip
peripherals and the on board I/O peripherals of the i.MX21ADS board
overlap. This results in a kernel crash during startup. This is fixed
by reducing the memory range for the on board I/O peripherals to the
actually required range.
Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c50ac05081 and
4523e14585 upstream.
When called for anonymous (non-shared) mappings, hugetlb_reserve_pages()
does a resv_map_alloc(). It depends on code in hugetlbfs's
vm_ops->close() to release that allocation.
However, in the mmap() failure path, we do a plain unmap_region() without
the remove_vma() which actually calls vm_ops->close().
This is a decent fix. This leak could get reintroduced if new code (say,
after hugetlb_reserve_pages() in hugetlbfs_file_mmap()) decides to return
an error. But, I think it would have to unroll the reservation anyway.
Christoph's test case:
http://marc.info/?l=linux-mm&m=133728900729735
This patch applies to 3.4 and later. A version for earlier kernels is at
https://lkml.org/lkml/2012/5/22/418.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: Christoph Lameter <cl@linux.com>
Tested-by: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbda591d92 upstream.
The transfer of ->flags causes some of the static mapping virtual
addresses to be prematurely freed (before the mapping is removed) because
VM_LAZY_FREE gets "set" if tmp->flags has VM_IOREMAP set. This might
cause subsequent vmalloc/ioremap calls to fail because it might allocate
one of the freed virtual address ranges that aren't unmapped.
va->flags has different types of flags from tmp->flags. If a region with
VM_IOREMAP set is registered with vm_area_add_early(), it will be removed
by __purge_vmap_area_lazy().
Fix vmalloc_init() to correctly initialize vmap_area for the given
vm_struct.
Also initialise va->vm. If it is not set, find_vm_area() for the early
vm regions will always fail.
Signed-off-by: KyongHo Cho <pullip.cho@samsung.com>
Cc: "Olav Haugan" <ohaugan@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31c15a2f24 upstream.
Virtual Machines with emulated e1000 network adapter running on Parallels'
server were seeing kernel panics due to the e1000 driver dereferencing an
unexpected NULL pointer retrieved from buffer_info->skb.
The problem has been addressed for the e1000e driver, but not for the e1000.
Since the two drivers share similar code in the affected area, a port of the
following e1000e driver commit solves the issue for the e1000 driver:
commit 9ed318d546
Author: Tom Herbert <therbert@google.com>
Date: Wed May 5 14:02:27 2010 +0000
e1000e: save skb counts in TX to avoid cache misses
In e1000_tx_map, precompute number of segements and bytecounts which
are derived from fields in skb; these are stored in buffer_info. When
cleaning tx in e1000_clean_tx_irq use the values in the associated
buffer_info for statistics counting, this eliminates cache misses
on skb fields.
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roman Kagan <rkagan@parallels.com>
commit 45c72cd73c upstream.
Now we store attr->ino at inode->i_ino, return attr->ino at the
first time and then return inode->i_ino if the attribute timeout
isn't expired. That's wrong on 32 bit platforms because attr->ino
is 64 bit and inode->i_ino is 32 bit in this case.
Fix this by saving 64 bit ino in fuse_inode structure and returning
it every time we call getattr. Also squash attr->ino into inode->i_ino
explicitly.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f227d4306c upstream.
Currently, the APIC LVT interrupt for error thresholding is implicitly
enabled. However, there are models in the F15h range which do not enable
it. Make the code machinery which sets up the APIC interrupt support
an optional setting and add an ->interrupt_capable member to the bank
representation mirroring that capability and enable the interrupt offset
programming only if it is true.
Simplify code and fixup comment style while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
commit d6ee27eb13 upstream.
When we remove a key, we put a key index which was supposed
to tell the fw that we are actually removing the key. But
instead the fw took that index as a valid index and messed
up the SRAM of the device.
This memory corruption on the device mangled the data of
the SCD. The impact on the user is that SCD queue 2 got
stuck after having removed keys.
The message is the log that was printed is:
Queue 2 stuck for 10000ms
This doesn't seem to fix the higher queues that get stuck
from time to time.
Reviewed-by: Meenakshi Venkataraman <meenakshi.venkataraman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a841f8cef4 upstream.
It does not get processed because sched_domain_level_max is 0 at the
time that setup_relax_domain_level() is run.
Simply accept the value as it is, as we don't know the value of
sched_domain_level_max until sched domain construction is completed.
Fix sched_relax_domain_level in cpuset. The build_sched_domain() routine calls
the set_domain_attribute() routine prior to setting the sd->level, however,
the set_domain_attribute() routine relies on the sd->level to decide whether
idle load balancing will be off/on.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120605184436.GA15668@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 941a956b0e upstream.
On high CPU load the accumulating values in the running_avg_cap
register are very low (below 10), so averaging them too early leads
to unnecessary poor output resolution. Since we pretend to output
micro-Watt we better keep all the bits we have as long as possible.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f461f27a44 upstream.
Fix the issue of C_CAN interrupts getting disabled forever when canconfig
utility is used multiple times. According to NAPI usage we disable all
the hardware interrupts in ISR and re-enable them in poll(). Current
implementation calls napi_enable() after hardware interrupts are enabled.
If we get any interrupts between these two steps then we do not process
those interrupts because napi is not enabled. Mostly these interrupts
come because of STATUS is not 0x7 or ERROR interrupts. If napi_enable()
happens before HW interrupts enabled then c_can_poll() function will be
called eventual re-enabling.
This patch moves the napi_enable() call before interrupts enabled.
Signed-off-by: AnilKumar Ch <anilkumar@ti.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 148c87c89e upstream.
This patch fixes an interrupt thrash issue with c_can driver.
In c_can_isr() function interrupts are disabled and enabled only in
c_can_poll() function. c_can_isr() & c_can_poll() both read the
irqstatus flag. However, irqstatus is always read as 0 in c_can_poll()
because all C_CAN interrupts are disabled in c_can_isr(). This causes
all interrupts to be re-enabled in c_can_poll() which in turn causes
another interrupt since the event is not really handled. This keeps
happening causing a flood of interrupts.
To fix this, read the irqstatus register in isr and use the same cached
value in the poll function.
Signed-off-by: AnilKumar Ch <anilkumar@ti.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 617caccebe upstream.
This patch fixes an issue with transmit routine, which causes
"can_put_echo_skb: BUG! echo_skb is occupied!" message when
using "cansequence -p" on D_CAN controller.
In c_can driver, while transmitting packets tx_echo flag holds
the no of can frames put for transmission into the hardware.
As the comment above c_can_do_tx() indicates, if we find any packet
which is not transmitted then we should stop looking for more.
In the current implementation this is not taken care of causing the
said message.
Also, fix the condition used to find if the packet is transmitted
or not. Current code skips the first tx message object and ends up
checking one extra invalid object.
While at it, fix the comment on top of c_can_do_tx() to use the
terminology "packet" instead of "package" since it is more
standard.
Signed-off-by: AnilKumar Ch <anilkumar@ti.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 463454b5db upstream.
If a given interface combination doesn't contain
a required interface type then we missed checking
that and erroneously allowed it even though iface
type wasn't there at all. Add a check that makes
sure that all interface types are accounted for.
Reported-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71ecfa1893 upstream.
When any interface goes down, it could be the one that we
were doing a remain-on-channel with. We therefore need to
cancel the remain-on-channel and flush the related work
structs so they don't run after the interface has been
removed or even destroyed.
It's also possible in this case that an off-channel SKB
was never transmitted, so free it if this is the case.
Note that this can also happen if the driver finishes
the off-channel period without ever starting it.
Reported-by: Nirav Shah <nirav.j2.shah@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c75296562 upstream.
This fixes a problem which can causes kernel oopses while loading
a kernel module.
According to the PowerPC EABI specification, GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), GPR r11 is also used
to generate trampoline code.
This combination crashes the kernel, in the case where the compiler
chooses to use a helper function for saving GPRs on entry, and the
module loader has placed the .init.text section far away from the
.text section, meaning that it has to generate a trampoline for
functions in the .init.text section to call the GPR save helper.
Because the trampoline trashes r11, references to the stack frame
using r11 can cause an oops.
The fix just uses GPR r12 instead of GPR r11 for generating the
trampoline code. According to the statements from Freescale, this is
safe from an EABI perspective.
I've tested the fix for kernel 2.6.33 on MPC8541.
Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com>
[paulus@samba.org: reworded the description]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbf8ae32f6 upstream.
The memory the parameter __key points to is used as an iterator in
btree_get_prev(), so if we save off a bkey() pointer in retry_key and
then assign that to __key, we'll end up corrupting the btree internals
when we do eg
longcpy(__key, bkey(geo, node, i), geo->keylen);
to return the key value. What we should do instead is use longcpy() to
copy the key value that retry_key points to __key.
This can cause a btree to get corrupted by seemingly read-only
operations such as btree_for_each_safe.
[akpm@linux-foundation.org: avoid the double longcpy()]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Joern Engel <joern@logfs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22b1f178f upstream.
Commit 7990696 uses the ext4_{set,clear}_inode_flags() functions to
change the i_flags automatically but fails to remove the error setting
of i_flags. So we still have the problem of trashing state flags.
Fix this by removing the assignment.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f380f2c4a1 upstream.
This driver disables interrupt just after requesting it and enables it
later, after interface is up. However currently there is a time window
between request_irq() and disable_irq() where if interrupt arrives, the
driver oopses because it's not yet ready to process it. This can be
reproduced by inserting the module, associating and removing the module
multiple times.
Eliminate this race by setting IRQF_NOAUTOEN flag before request_irq().
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c597145696 upstream.
We only need to regenerate the sysfs files when the capacity units
change, avoid the update otherwise.
The origin of this issue is dates way back to 2.6.38:
da8aeb92d4
(ACPI / Battery: Update information on info notification and resume)
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Tested-by: Ralf Jung <post@ralfj.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95599968d1 upstream.
We can't have references held on pages in the s_buddy_cache while we are
trying to truncate its pages and put the inode. All the pages must be
gone before we reach clear_inode. This can only be gauranteed if we
can prevent new users from grabbing references to s_buddy_cache's pages.
The original bug can be reproduced and the bug fix can be verified by:
while true; do mount -t ext4 /dev/ram0 /export/hda3/ram0; \
umount /export/hda3/ram0; done &
while true; do cat /proc/fs/ext4/ram0/mb_groups; done
Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02b7831019 upstream.
ext4_free_blocks fails to pair an ext4_mb_load_buddy with a matching
ext4_mb_unload_buddy when it fails a memory allocation.
Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79906964a1 upstream.
In commit 353eb83c we removed i_state_flags with 64-bit longs, But
when handling the EXT4_IOC_SETFLAGS ioctl, we replace i_flags
directly, which trashes the state flags which are stored in the high
32-bits of i_flags on 64-bit platforms. So use the the
ext4_{set,clear}_inode_flags() functions which use atomic bit
manipulation functions instead.
Reported-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3fc0210c0 upstream.
The ext4_error() function is missing a call to save_error_info().
Since this is the function which marks the file system as containing
an error, this oversight (which was introduced in 2.6.36) is quite
significant, and should be backported to older stable kernels with
high urgency.
Reported-by: Ken Sumrall <ksumrall@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ksumrall@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e84b62164 upstream.
If ext4_setup_super() fails i.e. due to a too-high revision,
the error is logged in dmesg but the fs is not mounted RO as
indicated.
Tested by:
# mkfs.ext4 -r 4 /dev/sdb6
# mount /dev/sdb6 /mnt/test
# dmesg | grep "too high"
[164919.759248] EXT4-fs (sdb6): revision level too high, forcing read-only mode
# grep sdb6 /proc/mounts
/dev/sdb6 /mnt/test2 ext4 rw,seclabel,relatime,data=ordered 0 0
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 91657eafb6 ]
Corrects the function that determines the esp payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain mtu values and suboptimal frames for others.
According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
net_header_len must be taken into account before doing the alignment
calculation.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 617c8c1123 ]
At the beginning of __skb_cow, headroom gets set to a minimum of
NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not
cloned and the headroom is just below NET_SKB_PAD, but still more than the
amount requested by the caller.
This was showing up frequently in my tests on VLAN tx, where
vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN).
Locally generated packets should have enough headroom, and for forward
paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to
add any extra space here.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59b9997bab ]
This reverts commit 8a83a00b07.
It causes regressions for S390 devices, because it does an
unconditional DST drop on SKBs for vlans and the QETH device
needs the neighbour entry hung off the DST for certain things
on transmit.
Arnd can't remember exactly why he even needed this change.
Conflicts:
drivers/net/macvlan.c
net/8021q/vlan_dev.c
net/core/dev.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d4b1133558 ]
commit c57b546840 (pktgen: fix crash at module unload) did a very poor
job with list primitives.
1) list_splice() arguments were in the wrong order
2) list_splice(list, head) has undefined behavior if head is not
initialized.
3) We should use the list_splice_init() variant to clear pktgen_threads
list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c51ce49735 ]
An application may call connect() to disconnect a socket using an
address with family AF_UNSPEC. The L2TP IP sockets were not handling
this case when the socket is not bound and an attempt to connect()
using AF_UNSPEC in such cases would result in an oops. This patch
addresses the problem by protecting the sk_prot->disconnect() call
against trying to unhash the socket before it is bound.
The patch also adds more checks that the sockaddr supplied to bind()
and connect() calls is valid.
RIP: 0010:[<ffffffff82e133b0>] [<ffffffff82e133b0>] inet_unhash+0x50/0xd0
RSP: 0018:ffff88001989be28 EFLAGS: 00010293
Stack:
ffff8800407a8000 0000000000000000 ffff88001989be78 ffffffff82e3a249
ffffffff82e3a050 ffff88001989bec8 ffff88001989be88 ffff8800407a8000
0000000000000010 ffff88001989bec8 ffff88001989bea8 ffffffff82e42639
Call Trace:
[<ffffffff82e3a249>] udp_disconnect+0x1f9/0x290
[<ffffffff82e42639>] inet_dgram_connect+0x29/0x80
[<ffffffff82d012fc>] sys_connect+0x9c/0x100
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c1833797a ]
Since commit ad0081e43a
"ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"
the fragment of packets is incorrect.
because tunnel mode needs IPsec headers and trailer for all fragments,
while on transport mode it is sufficient to add the headers to the
first fragment and the trailer to the last.
so modify mtu and maxfraglen base on ipsec mode and if fragment is first
or last.
with my test,it work well(every fragment's size is the mtu)
and does not trigger slow fragment path.
Changes from v1:
though optimization, mtu_prev and maxfraglen_prev can be delete.
replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL.
add fuction ip6_append_data_mtu to make codes clearer.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e49cc0da72 ]
We hit a kernel OOPS.
<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416] [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357] [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764] [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024] [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955] [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450] [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312] [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730] [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261] [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960] [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834] [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224] [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817] [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538] [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111] [<c123925d>] ? sub_preempt_count+0x3d/0x50
Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.
With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.
Thank Eric for very detailed review/checking the issue.
Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Kun Jiang <kunx.jiang@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dccd9ecc37 ]
Due to RCU lookups and RCU based release, fib_info objects can
be found during lookup which have fi->fib_dead set.
We must ignore these entries, otherwise we risk dereferencing
the parts of the entry which are being torn down.
Reported-by: Yevgen Pronenko <yevgen.pronenko@sonymobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b8c30bc49 upstream.
Need to program an additional VM register. This doesn't not currently
cause any problems, but allows us to program the proper backend
map in a subsequent patch which should improve performance on these
asics.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34a5704d91 upstream.
It seems there is a bug in scan_read_raw_oob() in nand_bbt.c which
should cause wrong functioning of NAND_BBT_SCANALLPAGES option.
Artem: the patch did not apply and I had to amend it a bit.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63d37a84ab upstream.
__mnt_make_shortterm() in there undoes the effect of __mnt_make_longterm()
we'd done back when we set ->mnt_ns non-NULL; it should not be done to
vfsmounts that had never gone through commit_tree() and friends. Kudos to
lczerner for catching that one...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cd5d7c449 upstream.
The array of sample rates is reallocated every time when opening
the PCM device, but was freed only once when unplugging the device.
Reported-by: "Alexander E. Patrakov" <patrakov@gmail.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e8b506310 upstream.
I was trying to backport the following commit to RHEL-6
From 0cea73465cd22373c5cd43a3edd25fbd4bb532ef Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oliver@neukum.org>
Date: Wed, 21 Sep 2011 11:37:15 +0200
Subject: [PATCH] btusb: add device entry for Broadcom SoftSailing
and noticed it wasn't working on an HP Elitebook. Looking into the patch I
noticed a very subtle typo in the ids. The patch has '0x05ac' instead of
'0x0a5c'. A snippet of the lsusb -v output also shows this:
Bus 002 Device 003: ID 0a5c:21e1 Broadcom Corp.
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 255 Vendor Specific Class
bDeviceSubClass 1
bDeviceProtocol 1
bMaxPacketSize0 64
idVendor 0x0a5c Broadcom Corp.
idProduct 0x21e1
bcdDevice 1.12
iManufacturer 1 Broadcom Corp
iProduct 2 BCM20702A0
iSerial 3 60D819F0338C
bNumConfigurations 1
Looking at other Broadcom ids, the fix matches them whereas the original patch
matches Apple's ids.
Tested on an HP Elitebook 8760w. The btusb binds and the userspace stuff loads
correctly.
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf2125e2f7 upstream.
Otherwise the hw will get confused and result in a black screen.
This regression has been most likely introduce in
commit 974b93315b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Sep 5 00:44:20 2010 +0100
drm/i915/tv: Poll for DAC state change
That commit replace the first msleep(20) with a busy-loop, but failed
to keep the 2nd msleep around. Later on we've replaced all these
msleep(20) by proper vblanks.
For reference also see the commit in xf86-video-intel:
commit 1142be53eb8d2ee8a9b60ace5d49f0ba27332275
Author: Jesse Barnes <jbarnes@hobbes.lan>
Date: Mon Jun 9 08:52:59 2008 -0700
Fix TV programming: add vblank wait after TV_CTL writes
Fxies FDO bug #14000; we need to wait for vblank after
writing TV_CTL or following "DPMS on" calls may not actually enable the output.
v2: As suggested by Chris Wilson, add a small comment to ensure that
no one accidentally removes this vblank wait again - there really
seems to be no sane explanation for why we need it, but it is
required.
Launchpad: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/763688
Reported-and-Tested-by: Robert Lowery <rglowery@exemail.com.au>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59d92bfa5f upstream.
We've simply ignored this, which isn't too great. With this, interlaced
1080i works on my HDMI screen connected through sdvo. For no apparent
reason anything else still doesn't work as it should.
While at it, give these magic numbers in the dtd proper names and
add a comment that they match with EDID detailed timings.
v2: Actually use the right bit for interlaced.
Tested-by: Peter Ross <pross@xvid.org>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ebf169ad4 upstream.
Only override the ddc bus if the connector doesn't have
a valid one. The existing code overrode the ddc bus for
all connectors even if it had ddc bus.
Fixes ddc on another XFX card with the same pci ids that
was broken by the quirk overwriting the correct ddc bus.
Reported-by: Mehdi Aqadjani Memar <m.aqadjanimemar@student.ru.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b21aea04d upstream.
WLAN_STA_BLOCK_BA is set while suspending but doesn't get cleared
when resuming in case of wowlan. This causes further ADDBA requests
received to be rejected. Fix it by clearing it in the wowlan path
as well.
Signed-off-by: Eyal Shapira <eyal@wizery.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4bd8ad9bb upstream.
DMA support has finally made its way to the top of the TODO list, having
realised that a Geode using MMIO can't keep up with two ADSL2+ lines
each running at 21Mb/s.
This patch fixes a couple of bugs in the DMA support in the driver, so
once the corresponding FPGA update is complete and tested everything
should work properly.
We weren't storing the currently-transmitting skb, so we were never
unmapping it and never freeing/popping it when the TX was done.
And the addition of pci_set_master() is fairly self-explanatory.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f649c1f6f upstream.
commit 5e185581d7
Author: James Bottomley <JBottomley@Parallels.com>
[PARISC] fix PA1.1 oops on boot
Didn't quite fix the crash on boot. It moved it from PA1.1 processors to
PA2.0 narrow kernels. The final fix is to make sure the [id]tlb_miss_20 paths
also work. Even on narrow systems, these paths require using the wide
instructions becuase the tlb insertion format is wide. Fix this by
conditioning the dep[wd],z on whether we're being called from _11 or _20[w]
paths.
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed5fb2471b upstream.
In certain configurations, the resulting kernel becomes too large to boot
because the linker places the long branch stubs for the merged .text section
at the very start of the image. As a result, the initial transfer of control
jumps to an unexpected location. Fix this by placing the head text in a
separate section so the stubs for .text are not at the start of the image.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c0c2a08be upstream.
While traversing the linked list of open file handles, if the identfied
file handle is invalid, a reopen is attempted and if it fails, we
resume traversing where we stopped and cifs can oops while accessing
invalid next element, for list might have changed.
So mark the invalid file handle and attempt reopen if no
valid file handle is found in rest of the list.
If reopen fails, move the invalid file handle to the end of the list
and start traversing the list again from the begining.
Repeat this four times before giving up and returning an error if
file reopen keeps failing.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 882dde8eb0 upstream.
When BT traffic load changes from its
previous state, a new LQ command needs to be
sent down to the firmware. This needs to
be done only once per change. The state
variable that keeps track of this change is
last_bt_traffic_load. However, it was not
being updated when the change had been
handled. Not updating this variable was
causing a flood of advanced BT config
commands to be sent to the firmware. Fix
this.
Signed-off-by: Meenakshi Venkataraman <meenakshi.venkataraman@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26c191788f upstream.
When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.
PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic"
#0 [f06a9dd8] crash_kexec at c049b5ec
#1 [f06a9e2c] oops_end at c083d1c2
#2 [f06a9e40] no_context at c0433ded
#3 [f06a9e64] bad_area_nosemaphore at c043401a
#4 [f06a9e6c] __do_page_fault at c0434493
#5 [f06a9eec] do_page_fault at c083eb45
#6 [f06a9f04] error_code (via page_fault) at c083c5d5
EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
00000000
DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0
CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
#7 [f06a9f38] _spin_lock at c083bc14
#8 [f06a9f44] sys_mincore at c0507b7d
#9 [f06a9fb0] system_call at c083becd
start len
EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f
DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00
SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033
CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286
This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.
With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.
This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.
Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.
NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.
This bug was discovered and fully debugged by Ulrich, quote:
----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.
496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
*pmd)
497 {
498 /* depend on compiler for an atomic pmd read */
499 pmd_t pmdval = *pmd;
// edi = pmd pointer
0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi
...
// edx = PTE page table high address
0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx
...
// eax = PTE page table low address
0xc0507a8e <sys_mincore+574>: mov (%edi),%eax
[..]
Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.
- The PMD entry {high|low} is 0x0000000000000000.
The "mov" at 0xc0507a84 loads 0x00000000 into edx.
- A page fault (on another CPU) sneaks in between the two "mov"
instructions and instantiates the PMD.
- The PMD entry {high|low} is now 0x00000003fda38067.
The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e48982734e upstream.
Commit 6457474624 ("vmscan: detect mapped file pages used only once")
made mapped pages have another round in inactive list because they might
be just short lived and so we could consider them again next time. This
heuristic helps to reduce pressure on the active list with a streaming
IO worklods.
This patch fixes a regression introduced by this commit for heavy shmem
based workloads because unlike Anon pages, which are excluded from this
heuristic because they are usually long lived, shmem pages are handled
as a regular page cache.
This doesn't work quite well, unfortunately, if the workload is mostly
backed by shmem (in memory database sitting on 80% of memory) with a
streaming IO in the background (backup - up to 20% of memory). Anon
inactive list is full of (dirty) shmem pages when watermarks are hit.
Shmem pages are kept in the inactive list (they are referenced) in the
first round and it is hard to reclaim anything else so we reach lower
scanning priorities very quickly which leads to an excessive swap out.
Let's fix this by excluding all swap backed pages (they tend to be long
lived wrt. the regular page cache anyway) from used-once heuristic and
rather activate them if they are referenced.
The customer's workload is shmem backed database (80% of RAM) and they
are measuring transactions/s with an IO in the background (20%).
Transactions touch more or less random rows in the table. The
transaction rate fell by a factor of 3 (in the worst case) because of
commit 64574746. This patch restores the previous numbers.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7e94a1686 upstream.
block congestion control doesn't have any concept of fairness across
multiple queues. This means that if SCSI reports the host as busy in
the queue congestion control it can result in an unfair starvation
situation in dm-mp if there are multiple multipath devices on the same
host. For example:
http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html
The fix for this is to report only the sdev busy state (and ignore the
host busy state) in the block congestion control call back.
The host is still congested, but the SCSI subsystem will sort out the
congestion in a fair way because it knows the relation between the
queues and the host.
[jejb: fixed up trailing whitespace]
Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ff2f40305 upstream.
Commit c751085943
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date: Sun Apr 12 20:06:56 2009 +0200
PM/Hibernate: Wait for SCSI devices scan to complete during resume
Broke the scsi_wait_scan module in 2.6.30. Apparently debian still uses it so
fix it and backport to stable before removing it in 3.6.
The breakage is caused because the function template in
include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in.
That means that in the modular case (which is every distro), the
scsi_wait_scan module does a simple async_synchronize_full() instead of
waiting for scans.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9868a060cc upstream.
The freed IRQ is not necessary the one requested in probe.
Even if it was, with two or more i2c-controllers it will fails anyway.
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 435a7ef52d upstream.
We can't be holding the mmap_sem while calling flush_cache_user_range
because the flush can fault. If we fault on a user address, the
page fault handler will try to take mmap_sem again. Since both places
acquire the read lock, most of the time it succeeds. However, if another
thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
between the call to flush_cache_user_range and the fault, the down_read
in do_page_fault will deadlock.
[will: removed drop of vma parameter as already queued by rmk (7365/1)]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc25f79af3 upstream.
OEM parameters [1] are parsed from the platform option-rom / efi
driver. By default the driver was validating the parameters for the
dual-controller case, but in single-controller case only the first set
of parameters may be valid.
Limit the validation to the number of actual controllers detected
otherwise the driver may fail to parse the valid parameters leading to
driver-load or runtime failures.
[1] the platform specific set of phy address, configuration,and analog
tuning values
Reported-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f1d62bed7 upstream.
This is because __builtin_clz(0) returns 64 for the "undefined" case
of 0, since the builtin just does a right-shift 32 and "clz" instruction.
So, use the alpha approach of casting to u32 and using __builtin_clzll().
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bbbc4c4d8c upstream.
Commit 06e8935feb ("optimized SDIO IRQ handling for single irq")
introduced some spurious calls to SDIO function interrupt handlers,
such as when the SDIO IRQ thread is started, or the safety check
performed upon a system resume. Let's add a flag to perform the
optimization only when a real interrupt is signaled by the host
driver and we know there is no point confirming it.
Reported-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 875e26648c upstream.
Linus pointed out that there was no value is checking whether m->ip
was zero - because zero is a legimate value. If we have a reliable
(or faked in the VM86 case) "m->cs" we can use it to tell whether we
were in user mode or kernelwhen the machine check hit.
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31c5f0c5e2 upstream.
Properly validate the user-supplied index against the number of inputs.
The code used the pin local variable instead of the index by mistake.
Reported-by: Jozef Vesely <vesely@gjh.sk>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9dcf84b14 upstream.
... we need it later on in the function to clean up pipe <-> plane
associations. This regression has been introduced in
commit f47166d2b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Mar 22 15:00:50 2012 +0000
drm/i915: Sanitize BIOS debugging bits from PIPECONF
Spotted by staring at debug output of an (as it turns out) totally
unrelated bug.
v2: I've totally failed to do the s/pipe/i/ correctly, spotted by
Chris Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1e969e033 upstream.
This originally started as a patch from Bernard as a way of simply
setting the VS scheduler. After submitting the RFC patch, we decided to
also modify the DS scheduler. To be most explicit, I've made the patch
explicitly set all scheduler modes, and included the defines for other
modes (in case someone feels frisky later).
The rest of the story gets a bit weird. The first version of the patch
showed an almost unbelievable performance improvement. Since rebasing my
branch it appears the performance improvement has gone, unfortunately.
But setting these bits seem to be the right thing to do given that the
docs describe corruption that can occur with the default settings.
In summary, I am seeing no more perf improvements (or regressions) in my
limited testing, but we believe this should be set to prevent rendering
corruption, therefore cc stable.
v1: Clear bit 4 also (Ken + Eugeni)
Do a full clear + set of the bits we want (Me).
Cc: Bernard Kilarski <bernard.r.kilarski@intel.com>
Reviewed-by (RFC): Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9adab8b5a7 upstream.
Currently the code re-reads PCH_IIR during the hotplug interrupt
processing. Not only is this a wasted read, but introduces a potential
for handling a spurious interrupt as we then may not clear all the
interrupts processed (since the re-read IIR may contains more interrupts
asserted than we clear using the result of the original read).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1530bbc627 upstream.
Sergio reported that when he recorded audio from a USB headset mic
plugged into the USB 3.0 port on his ASUS N53SV-DH72, the audio sounded
"robotic". When plugged into the USB 2.0 port under EHCI on the same
laptop, the audio sounded fine. The device is:
Bus 002 Device 004: ID 046d:0a0c Logitech, Inc. Clear Chat Comfort USB Headset
The problem was tracked down to the Fresco Logic xHCI host controller
not correctly reporting short transfers on isochronous IN endpoints.
The driver would submit a 96 byte transfer, the device would only send
88 or 90 bytes, and the xHCI host would report the transfer had a
"successful" completion code, with an untransferred buffer length of 8
or 6 bytes.
The successful completion code and non-zero untransferred length is a
contradiction. The xHCI host is supposed to only mark a transfer as
successful if all the bytes are transferred. Otherwise, the transfer
should be marked with a short packet completion code. Without the EHCI
bus trace, we wouldn't know whether the xHCI driver should trust the
completion code or the untransferred length. With it, we know to trust
the untransferred length.
Add a new xHCI quirk for the Fresco Logic host controller. If a
transfer is reported as successful, but the untransferred length is
non-zero, print a warning. For the Fresco Logic host, change the
completion code to COMP_SHORT_TX and process the transfer like a short
transfer.
This should be backported to stable kernels that contain the commit
f5182b4155 "xhci: Disable MSI for some
Fresco Logic hosts." That commit was marked for stable kernels as old
as 2.6.36.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Sergio Correia <lists@uece.net>
Tested-by: Sergio Correia <lists@uece.net>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33b2831ac8 upstream.
When the xHCI driver needs to clean up memory (perhaps due to a failed
register restore on resume from S3 or resume from S4), it needs to reset
the number of reserved TRBs on the command ring to zero. Otherwise,
several resume cycles (about 30) with a UAS device attached will
continually increment the number of reserved TRBs, until all command
submissions fail because there isn't enough room on the command ring.
This patch should be backported to kernels as old as 2.6.32,
that contain the commit 913a8a344f
"USB: xhci: Change how xHCI commands are handled."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c745995ae upstream.
While testing unplugging an UVC HD webcam with usb-redirection (so through
usbdevfs), my userspace usb-redir code was getting a value of -1 in
iso_frame_desc[n].status, which according to Documentation/usb/error-codes.txt
is not a valid value.
The source of this -1 is the default case in xhci-ring.c:process_isoc_td()
adding a kprintf there showed the value of trb_comp_code to be COMP_TX_ERR
in this case, so this patch adds handling for that completion code to
process_isoc_td().
This was observed and tested with the following xhci controller:
1033:0194 NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
Note: I also wonder if setting frame->status to -1 (-EPERM) is the best we can
do, but since I cannot come up with anything better I've left that as is.
This patch should be backported to kernels as old as 2.6.36, which contain the
commit 04e51901dd "USB: xHCI: Isochronous
transfer implementation".
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c12443ab8 upstream.
The upcoming Intel Lynx Point chipset includes an xHCI host controller
that can have ports switched from the EHCI host controller, just like
the Intel Panther Point xHCI host. This time, ports from both EHCI
hosts can be switched to the xHCI host controller. The PCI config
registers to do the port switching are in the exact same place in the
xHCI PCI configuration registers, with the same semantics.
Hooray for shipping patches for next-gen hardware before the current gen
hardware is even available for purchase!
This patch should be backported to stable kernels as old as 3.0,
that contain commit 69e848c209
"Intel xhci: Support EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d0947dec4 upstream.
dTD's next dtd pointer need to be updated once CPU writes it, or this
request may not be handled by controller, then host will get NAK from
device forever.
This problem occurs when there is a request is handling, we need to add
a new request to dTD list, if this new request is added before the current
one is finished, the new request is intended to added as next dtd pointer
at current dTD, but without wmb(), the dTD's next dtd pointer may not be
updated when the controller reads it. In that case, the controller will
still get Terminate Bit is 1 at dTD's next dtd pointer, that means there is
no next request, then this new request is missed by controller.
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Li Yang <leoli@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e09dcf20f upstream.
There exist races in devio.c, below is one case,
and there are similar races in destroy_async()
and proc_unlinkurb(). Remove these races.
cancel_bulk_urbs() async_completed()
------------------- -----------------------
spin_unlock(&ps->lock);
list_move_tail(&as->asynclist,
&ps->async_completed);
wake_up(&ps->wait);
Lead to free_async() be triggered,
then urb and 'as' will be freed.
usb_unlink_urb(as->urb);
===> refer to the freed 'as'
Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oncaphillis <oncaphillis@snafu.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a23ccd216 upstream.
bMaxPacketSize0 field for super speed is a power of 2, not a count.
The size itself is always 512.
Max packet size for a super speed bulk endpoint is 1024, so
allocate the urb size in halt_simple() accordingly.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bc3711cbb upstream.
Upgraded firmware on Smart Array P7xx (and some others) made them show up as
SCSI revision 5 devices and this caused the driver to fail to map MSA2xxx
logical drives to the correct bus/target/lun. A symptom of this would be that
the target ID of the logical drives as presented by the external storage array
is ignored, and all such logical drives are assigned to target zero,
differentiated only by LUN. Some multipath software reportedly does not deal
well with this behavior, failing to recognize different paths to the same
device as such.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Scott Teel <scott.teel@hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb9c583638 upstream.
The out functions should only handle actual available data instead of the complete buffer.
Otherwise for example the ep0_consume function will report ghost events since it tries to decode
the complete buffer - which may contain partly invalid data.
Signed-off-by: Matthias Fend <matthias.fend@wolfvision.net>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5e1b8cee7 upstream.
A flush request is usually issued in transaction commit code path, so
using GFP_KERNEL to allocate memory for flush request bio falls into
the classic deadlock issue.
This is suitable for any -stable kernel to which it applies as it
avoids a possible deadlock.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05f144a0d5 upstream.
Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled
=============================================================================
BUG numa_policy (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
__slab_alloc+0x3d3/0x445
kmem_cache_alloc+0x29d/0x2b0
mpol_new+0xa3/0x140
sys_mbind+0x142/0x620
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
__slab_free+0x2e/0x1de
kmem_cache_free+0x25a/0x260
__mpol_put+0x27/0x30
remove_vma+0x68/0x90
exit_mmap+0x118/0x140
mmput+0x73/0x110
exit_mm+0x108/0x130
do_exit+0x162/0xb90
do_group_exit+0x4f/0xc0
sys_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
This implied a reference counting bug and the problem happened during
mbind().
mbind() applies a new memory policy to a range and uses mbind_range() to
merge existing VMAs or split them as necessary. In the event of splits,
mpol_dup() will allocate a new struct mempolicy and maintain existing
reference counts whose rules are documented in
Documentation/vm/numa_memory_policy.txt .
The problem occurs with shared memory policies. The vm_op->set_policy
increments the reference count if necessary and split_vma() and
vma_merge() have already handled the existing reference counts.
However, policy_vma() screws it up by replacing an existing
vma->vm_policy with one that potentially has the wrong reference count
leading to a premature free. This patch removes the damage caused by
policy_vma().
With this patch applied Dave's trinity tool runs an mbind test for 5
minutes without error. /proc/slabinfo reported that there are no
numa_policy or shared_policy_node objects allocated after the test
completed and the shared memory region was deleted.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 544ecf310f upstream.
worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
isn't zero when every worker is idle. This can trigger spuriously
while a cpu is going down due to the way trustee sets %WORKER_ROGUE
and zaps nr_running.
It first sets %WORKER_ROGUE on all workers without updating
nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
then zaps nr_running. If the last running worker enters idle
inbetween, it would see stale nr_running which hasn't been zapped yet
and trigger the WARN_ON_ONCE().
Fix it by performing the sanity check iff the trustee is idle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 616b6937e3 upstream.
Else the poll will be restarted indefinitely in a tight loop,
preventing final device cleanup.
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 591bfc6bf9 upstream.
The HOWTO document needed updating for the new kernel versioning. The
git URI for -next was updated as well.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f15b9000eb upstream.
UML uses the _PAGE_NEWPAGE flag to mark pages which are not jet
installed on the host side using mmap().
pte_same() has to ignore this flag, otherwise unuse_pte_range()
is unable to unuse the page because two identical
page tables entries with different _PAGE_NEWPAGE flags would not
match and swapoff() would never return.
Analyzed-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b76ebaa72 upstream.
The current __swp_type() function uses a too small bitshift.
Using more than one swap files causes bad pages because
the type bits clash with other page flags.
Analyzed-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 642d892522 upstream.
The Marvell 88SE9172 SATA controller (PCI ID 1b4b 917a) already worked
once it was detected, but was missing an ahci_pci_tbl entry.
Boot tested on a Gigabyte Z68X-UD3H-B3 motherboard.
Signed-off-by: Matt Johnson <johnso87@illinois.edu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abae41e643 upstream.
aux_free is freed on all other exits from the function. By removing the
return, we can benefit from the vfree already at the end of the function.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 154c50ca4e upstream.
We reset the bool names and values array to NULL, but do not reset the
number of entries in these arrays to 0. If we error out and then get back
into this function we will walk these NULL pointers based on the belief
that they are non-zero length.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45de6767dc upstream.
Use the 32-bit compat keyctl() syscall wrapper on Sparc64 for Sparc32 binary
compatibility.
Without this, keyctl(KEYCTL_INSTANTIATE_IOV) is liable to malfunction as it
uses an iovec array read from userspace - though the kernel should survive this
as it checks pointers and sizes anyway.
I think all the other keyctl() function should just work, provided (a) the top
32-bits of each 64-bit argument register are cleared prior to invoking the
syscall routine, and the 32-bit address space is right at the 0-end of the
64-bit address space. Most of the arguments are 32-bit anyway, and so for
those clearing is not required.
Signed-off-by: David Howells <dhowells@redhat.com
cc: "David S. Miller" <davem@davemloft.net>
cc: sparclinux@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e42fafc25f upstream.
The ioc->pfacts member in the IOC structure is getting set to zero
following a call to _base_get_ioc_facts due to the memset in that routine.
So if the ioc->pfacts was read after a host reset, there would be a NULL
pointer dereference. The routine _base_get_ioc_facts is called from context
of host reset. The problem in _base_get_ioc_facts is the size of
Mpi2IOCFactsReply is 64, whereas the sizeof "struct mpt2sas_facts" is 60,
so there is a four byte overflow resulting from the memset.
Also, there is memset in _base_get_port_facts using the incorrect structure,
it should be "struct mpt2sas_port_facts" instead of Mpi2PortFactsReply.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5e50a51cc upstream.
When setting the current task state to TASK_UNINTERRUPTIBLE this can
race with a different cpu. The other cpu could set the task state after
it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which
would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again.
This race was always present in the pfault interrupt code but didn't
cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug
vs missing completion interrupts" which relied on the fact that after
setting the task state to TASK_UNINTERRUPTIBLE the task would really
sleep.
Since this is not necessarily the case the result may be a list corruption
of the pfault_list or, as observed, a use-after-free bug while trying to
access the task_struct of a task which terminated itself already.
To fix this, we need to get a reference of the affected task when receiving
the initial pfault interrupt and add special handling if we receive yet
another initial pfault interrupt when the task is already enqueued in the
pfault list.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31a67102f4 upstream.
During early boot, when the scheduler hasn't really been fully set up,
we really can't do blocking allocations because with certain (dubious)
configurations the "might_resched()" calls can actually result in
scheduling events.
We could just make such users always use GFP_ATOMIC, but quite often the
code that does the allocation isn't really aware of the fact that the
scheduler isn't up yet, and forcing that kind of random knowledge on the
initialization code is just annoying and not good for anybody.
And we actually have a the 'gfp_allowed_mask' exactly for this reason:
it's just that the kernel init sequence happens to set it to allow
blocking allocations much too early.
So move the 'gfp_allowed_mask' initialization from 'start_kernel()'
(which is some of the earliest init code, and runs with preemption
disabled for good reasons) into 'kernel_init()'. kernel_init() is run
in the newly created thread that will become the 'init' process, as
opposed to the early startup code that runs within the context of what
will be the first idle thread.
So by the time we reach 'kernel_init()', we know that the scheduler must
be at least limping along, because we've already scheduled from the idle
thread into the init thread.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a70b52ec1a upstream.
We had for some reason overlooked the AIO interface, and it didn't use
the proper rw_verify_area() helper function that checks (for example)
mandatory locking on the file, and that the size of the access doesn't
cause us to overflow the provided offset limits etc.
Instead, AIO did just the security_file_permission() thing (that
rw_verify_area() also does) directly.
This fixes it to do all the proper helper functions, which not only
means that now mandatory file locking works with AIO too, we can
actually remove lines of code.
Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e618aad53 upstream.
Introduce a global ratelimit for CAPI message dumps to protect
against possible log flood.
Drop the ratelimit for ignored messages which is now covered by the
global one.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3cb867481 upstream.
Due to an errata, the PA7300LC generates a TLB miss interruption even on the
prefetch instruction. This means that prefetch(NULL), which is supposed to be
a nop on linux actually generates a NULL deref fault. Fix this by testing the
address of prefetch against NULL before doing the prefetch.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 207f583d71 upstream.
As pointed out by serveral people, PA1.1 only has a type 26 instruction
meaning that the space register must be explicitly encoded. Not giving an
explicit space means that the compiler uses the type 24 version which is PA2.0
only resulting in an illegal instruction crash.
This regression was caused by
commit f311847c2f
Author: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Wed Dec 22 10:22:11 2010 -0600
parisc: flush pages through tmpalias space
Reported-by: Helge Deller <deller@gmx.de>
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e185581d7 upstream.
All PA1.1 systems have been oopsing on boot since
commit f311847c2f
Author: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Wed Dec 22 10:22:11 2010 -0600
parisc: flush pages through tmpalias space
because a PA2.0 instruction was accidentally introduced into the PA1.1 TLB
insertion interruption path when it was consolidated with the do_alias macro.
Fix the do_alias macro only to use PA2.0 instructions if compiled for 64 bit.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 080399aaaf upstream.
Hi,
We have a bug report open where a squashfs image mounted on ppc64 would
exhibit errors due to trying to read beyond the end of the disk. It can
easily be reproduced by doing the following:
[root@ibm-p750e-02-lp3 ~]# ls -l install.img
-rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
[root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
[root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
dd: reading `/dev/loop0': Input/output error
277376+0 records in
277376+0 records out
142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s
In dmesg, you'll find the following:
squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 43.106012] attempt to access beyond end of device
[ 43.106029] loop0: rw=0, want=277410, limit=277408
[ 43.106039] Buffer I/O error on device loop0, logical block 138704
[ 43.106053] attempt to access beyond end of device
[ 43.106057] loop0: rw=0, want=277412, limit=277408
[ 43.106061] Buffer I/O error on device loop0, logical block 138705
[ 43.106066] attempt to access beyond end of device
[ 43.106070] loop0: rw=0, want=277414, limit=277408
[ 43.106073] Buffer I/O error on device loop0, logical block 138706
[ 43.106078] attempt to access beyond end of device
[ 43.106081] loop0: rw=0, want=277416, limit=277408
[ 43.106085] Buffer I/O error on device loop0, logical block 138707
[ 43.106089] attempt to access beyond end of device
[ 43.106093] loop0: rw=0, want=277418, limit=277408
[ 43.106096] Buffer I/O error on device loop0, logical block 138708
[ 43.106101] attempt to access beyond end of device
[ 43.106104] loop0: rw=0, want=277420, limit=277408
[ 43.106108] Buffer I/O error on device loop0, logical block 138709
[ 43.106112] attempt to access beyond end of device
[ 43.106116] loop0: rw=0, want=277422, limit=277408
[ 43.106120] Buffer I/O error on device loop0, logical block 138710
[ 43.106124] attempt to access beyond end of device
[ 43.106128] loop0: rw=0, want=277424, limit=277408
[ 43.106131] Buffer I/O error on device loop0, logical block 138711
[ 43.106135] attempt to access beyond end of device
[ 43.106139] loop0: rw=0, want=277426, limit=277408
[ 43.106143] Buffer I/O error on device loop0, logical block 138712
[ 43.106147] attempt to access beyond end of device
[ 43.106151] loop0: rw=0, want=277428, limit=277408
[ 43.106154] Buffer I/O error on device loop0, logical block 138713
[ 43.106158] attempt to access beyond end of device
[ 43.106162] loop0: rw=0, want=277430, limit=277408
[ 43.106166] attempt to access beyond end of device
[ 43.106169] loop0: rw=0, want=277432, limit=277408
...
[ 43.106307] attempt to access beyond end of device
[ 43.106311] loop0: rw=0, want=277470, limit=2774
Squashfs manages to read in the end block(s) of the disk during the
mount operation. Then, when dd reads the block device, it leads to
block_read_full_page being called with buffers that are beyond end of
disk, but are marked as mapped. Thus, it would end up submitting read
I/O against them, resulting in the errors mentioned above. I fixed the
problem by modifying init_page_buffers to only set the buffer mapped if
it fell inside of i_size.
Cheers,
Jeff
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Nick Piggin <npiggin@kernel.dk>
--
Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05c69d298c upstream.
6d1d8050b4 "block, partition: add partition_meta_info to hd_struct"
added part_unpack_uuid() which assumes that the passed in buffer has
enough space for sprintfing "%pU" - 37 characters including '\0'.
Unfortunately, b5af921ec0 "init: add support for root devices
specified by partition UUID" supplied 33 bytes buffer to the function
leading to the following panic with stackprotector enabled.
Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e
[<ffffffff815e226b>] panic+0xba/0x1c6
[<ffffffff81b14c7e>] ? printk_all_partitions+0x259/0x26xb
[<ffffffff810566bb>] __stack_chk_fail+0x1b/0x20
[<ffffffff81b15c7e>] printk_all_paritions+0x259/0x26xb
[<ffffffff81aedfe0>] mount_block_root+0x1bc/0x27f
[<ffffffff81aee0fa>] mount_root+0x57/0x5b
[<ffffffff81aee23b>] prepare_namespace+0x13d/0x176
[<ffffffff8107eec0>] ? release_tgcred.isra.4+0x330/0x30
[<ffffffff81aedd60>] kernel_init+0x155/0x15a
[<ffffffff81087b97>] ? schedule_tail+0x27/0xb0
[<ffffffff815f4d24>] kernel_thread_helper+0x5/0x10
[<ffffffff81aedc0b>] ? start_kernel+0x3c5/0x3c5
[<ffffffff815f4d20>] ? gs_change+0x13/0x13
Increase the buffer size, remove the dangerous part_unpack_uuid() and
use snprintf() directly from printk_all_partitions().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Szymon Gruszczynski <sz.gruszczynski@googlemail.com>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6d9668e11 upstream.
Some discussion with the glibc mailing lists revealed that this was
necessary for 64-bit platforms with MIPS-like sign-extension rules
for 32-bit values. The original symptom was that passing (uid_t)-1 to
setreuid() was failing in programs linked -pthread because of the "setxid"
mechanism for passing setxid-type function arguments to the syscall code.
SYSCALL_WRAPPERS handles ensuring that all syscall arguments end up with
proper sign-extension and is thus the appropriate fix for this problem.
On other platforms (s390, powerpc, sparc64, and mips) this was fixed
in 2.6.28.6. The general issue is tracked as CVE-2009-0029.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73f98eab9b upstream.
pch_gbe_validate_option() modifies 32 bits of memory but we pass
&hw->phy.autoneg_advertised which only has 16 bits and &hw->mac.fc
which only has 8 bits.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b53d07891 upstream.
If the MAC is invalid or not implemented, do not abort the probe. Issue
a warning and prevent bringing the interface up until a MAC is set manually
(via ifconfig $IFACE hw ether $MAC).
Tested on two platforms, one with a valid MAC, the other without a MAC. The real
MAC is used if present, the interface fails to come up until the MAC is set on
the other. They successfully get an IP over DHCP and pass a simple ping and
login over ssh test.
This is meant to allow the Inforce SYS940X development board:
http://www.inforcecomputing.com/SYS940X_ECX.html
(and others suffering from a missing MAC) to work with the mainline kernel.
Without this patch, the probe will fail and the interface will not be created,
preventing the user from configuring the MAC manually.
This does not make any attempt to address a missing or invalid MAC for the
pch_phub driver.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Alan Cox <alan@linux.intel.com>
CC: Tomoya MORINAGA <tomoya.rohm@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Jon Mason <jdmason@kudzu.us>
CC: netdev@vger.kernel.org
CC: Mark Brown <broonie@opensource.wolfsonmicro.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
commit 7756332f5b upstream.
Support new device OKI SEMICONDUCTOR ML7831 IOH(Input/Output Hub)
ML7831 is for general purpose use.
ML7831 is companion chip for Intel Atom E6xx series.
ML7831 is completely compatible for Intel EG20T PCH.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5229d87edc upstream.
This patch fixed the issue which receives an unnecessary packet before link
When using PHY of GMII, an unnecessary packet is received,
And it becomes impossible to receive a packet after link up.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1616300a2 upstream.
dd slept infinitely when fsfeeze failed because of EIO.
To fix this problem, if ->freeze_fs fails, freeze_super() wakes up
the tasks waiting for the filesystem to become unfrozen.
When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(),
the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen.
However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then
freeze_super() returns an error number. In this case, FITHAW ioctl returns
EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up
s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45bcf018d1 upstream.
IRQF_SHARED is required for older controllers that don't support MSI(X)
and which may end up sharing an interrupt. All the controllers hpsa
normally supports have MSI(X) capability, but older controllers may be
encountered via the hpsa_allow_any=1 module parameter.
Also remove deprecated IRQF_DISABLED.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acd6ad8351 upstream.
When insert_inode_locked() fails in ext4_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares filesystem error and does not call unlock_new_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1415dd8705 upstream.
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7dafa0ef3 upstream.
compat_sys_sigprocmask reads a smaller signal mask from userspace than
sigprogmask accepts for setting. So the high word of blocked.sig[0]
will be cleared, releasing any potentially blocked RT signal.
This was discovered via userspace code that relies on get/setcontext.
glibc's i386 versions of those functions use sigprogmask instead of
rt_sigprogmask to save/restore signal mask and caused RT signal
unblocking this way.
As suggested by Linus, this replaces the sys_sigprocmask based compat
version with one that open-codes the required logic, including the merge
of the existing blocked set with the new one provided on SIG_SETMASK.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a shorter (and more appropriate for stable kernels) analog to
the following upstream commit:
commit 6926afd192
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Sat Jan 7 13:22:46 2012 -0500
NFSv4: Save the owner/group name string when doing open
...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Without this patch, logging into two different machines with home
directories mounted over NFS4 and then running "vim" and typing ":q"
in each reliably produces the following error on the second machine:
E137: Viminfo file is not writable: /users/system/rtheys/.viminfo
This regression was introduced by 80e52aced1 ("NFSv4: Don't do
idmapper upcalls for asynchronous RPC calls", merged during the 2.6.32
cycle) --- after the OPEN call, .viminfo has the default values for
st_uid and st_gid (0xfffffffe) cached because we do not want to let
rpciod wait for an idmapper upcall to fill them in.
The fix used in mainline is to save the owner and group as strings and
perform the upcall in _nfs4_proc_open outside the rpciod context,
which takes about 600 lines. For stable, we can do something similar
with a one-liner: make open check for the stale fields and make a
(synchronous) GETATTR call to fill them when needed.
Trond dictated the patch, I typed it in, and Rik tested it.
Addresses http://bugs.debian.org/659111 and
https://bugzilla.redhat.com/789298
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Explained-by: David Flyn <davidf@rd.bbc.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1bb05a657 upstream.
Processes hang forever on a sync-mounted ext2 file system that
is mounted with the ext4 module (default in Fedora 16).
I can reproduce this reliably by mounting an ext2 partition with
"-o sync" and opening a new file an that partition with vim. vim
will hang in "D" state forever. The same happens on ext4 without
a journal.
I am attaching a small patch here that solves this issue for me.
In the sync mounted case without a journal,
ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
can't be called with buffer lock held.
Also move mb_cache_entry_release inside lock to avoid race
fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix
Note too that ext2 fixed this same problem in 2006 with
b2f49033 [PATCH] fix deadlock in ext2
Signed-off-by: Martin.Wilck@ts.fujitsu.com
[sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 377485f624 upstream.
Currently, we'll try mounting any device who's major device number is
UNNAMED_MAJOR as NFS root. This would happen for non-NFS devices as
well (such as 9p devices) but it wouldn't cause any issues since
mounting the device as NFS would fail quickly and the code proceeded to
doing the proper mount:
[ 101.522716] VFS: Unable to mount root fs via NFS, trying floppy.
[ 101.534499] VFS: Mounted root (9p filesystem) on device 0:18.
Commit 6829a048102a ("NFS: Retry mounting NFSROOT") introduced retries
when mounting NFS root, which means that now we don't immediately fail
and instead it takes an additional 90+ seconds until we stop retrying,
which has revealed the issue this patch fixes.
This meant that it would take an additional 90 seconds to boot when
we're not using a device type which gets detected in order before NFS.
This patch modifies the NFS type check to require device type to be
'Root_NFS' instead of requiring the device to have an UNNAMED_MAJOR
major. This makes boot process cleaner since we now won't go through
the NFS mounting code at all when the device isn't an NFS root
("/dev/nfs").
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bad115cfe5 upstream.
Since recent changes on TCP splicing (starting with commits 2f533844
"tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
tcp_sendpages() should call tcp_push() once"), I started seeing
massive stalls when forwarding traffic between two sockets using
splice() when pipe buffers were larger than socket buffers.
Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.
The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.
After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.
The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.
This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d9f4f135e upstream.
Use del_timer_sync to remove timer before mddev_suspend finishes.
We don't want a timer going off after an mddev_suspend is called. This is
especially true with device-mapper, since it can call the destructor function
immediately following a suspend. This results in the removal (kfree) of the
structures upon which the timer depends - resulting in a very ugly panic.
Therefore, we add a del_timer_sync to mddev_suspend to prevent this.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a134d22829 upstream.
This passes siginfo and mcontext to tilegx32 signal handlers that
don't have SA_SIGINFO set just as we have been doing for tilegx64.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 226bb7df3d upstream.
The locking policy is such that the erase_complete_block spinlock is
nested within the alloc_sem mutex. This fixes a case in which the
acquisition order was erroneously reversed. This issue was caught by
the following lockdep splat:
=======================================================
[ INFO: possible circular locking dependency detected ]
3.0.5 #1
-------------------------------------------------------
jffs2_gcd_mtd6/299 is trying to acquire lock:
(&c->alloc_sem){+.+.+.}, at: [<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
but task is already holding lock:
(&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&c->erase_completion_lock)->rlock){+.+...}:
[<c008bec4>] validate_chain+0xe6c/0x10bc
[<c008c660>] __lock_acquire+0x54c/0xba4
[<c008d240>] lock_acquire+0xa4/0x114
[<c046780c>] _raw_spin_lock+0x3c/0x4c
[<c01f744c>] jffs2_garbage_collect_pass+0x4c/0x890
[<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
[<c0071a68>] kthread+0x98/0xa0
[<c000f264>] kernel_thread_exit+0x0/0x8
-> #0 (&c->alloc_sem){+.+.+.}:
[<c008ad2c>] print_circular_bug+0x70/0x2c4
[<c008c08c>] validate_chain+0x1034/0x10bc
[<c008c660>] __lock_acquire+0x54c/0xba4
[<c008d240>] lock_acquire+0xa4/0x114
[<c0466628>] mutex_lock_nested+0x74/0x33c
[<c01f7714>] jffs2_garbage_collect_pass+0x314/0x890
[<c01f937c>] jffs2_garbage_collect_thread+0x1b4/0x1cc
[<c0071a68>] kthread+0x98/0xa0
[<c000f264>] kernel_thread_exit+0x0/0x8
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&c->erase_completion_lock)->rlock);
lock(&c->alloc_sem);
lock(&(&c->erase_completion_lock)->rlock);
lock(&c->alloc_sem);
*** DEADLOCK ***
1 lock held by jffs2_gcd_mtd6/299:
#0: (&(&c->erase_completion_lock)->rlock){+.+...}, at: [<c01f7708>] jffs2_garbage_collect_pass+0x308/0x890
stack backtrace:
[<c00155dc>] (unwind_backtrace+0x0/0x100) from [<c0463dc0>] (dump_stack+0x20/0x24)
[<c0463dc0>] (dump_stack+0x20/0x24) from [<c008ae84>] (print_circular_bug+0x1c8/0x2c4)
[<c008ae84>] (print_circular_bug+0x1c8/0x2c4) from [<c008c08c>] (validate_chain+0x1034/0x10bc)
[<c008c08c>] (validate_chain+0x1034/0x10bc) from [<c008c660>] (__lock_acquire+0x54c/0xba4)
[<c008c660>] (__lock_acquire+0x54c/0xba4) from [<c008d240>] (lock_acquire+0xa4/0x114)
[<c008d240>] (lock_acquire+0xa4/0x114) from [<c0466628>] (mutex_lock_nested+0x74/0x33c)
[<c0466628>] (mutex_lock_nested+0x74/0x33c) from [<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890)
[<c01f7714>] (jffs2_garbage_collect_pass+0x314/0x890) from [<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc)
[<c01f937c>] (jffs2_garbage_collect_thread+0x1b4/0x1cc) from [<c0071a68>] (kthread+0x98/0xa0)
[<c0071a68>] (kthread+0x98/0xa0) from [<c000f264>] (kernel_thread_exit+0x0/0x8)
This was introduce in '81cfc9f jffs2: Fix serious write stall due to erase'.
Signed-off-by: Josh Cartwright <joshc@linux.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bc2e853c6 upstream.
Systems with 8 TBytes of memory or greater can hit a problem where only
the the first 8 TB of memory shows up. This is due to "int i" being
smaller than "unsigned long start_aligned", causing the high bits to be
dropped.
The fix is to change `i' to unsigned long to match start_aligned
and end_aligned.
Thanks to Jack Steiner for assistance tracking this down.
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4998a6c0ed upstream.
Commit 66aebce747 ("hugetlb: fix race condition in hugetlb_fault()")
added code to avoid a race condition by elevating the page refcount in
hugetlb_fault() while calling hugetlb_cow().
However, one code path in hugetlb_cow() includes an assertion that the
page count is 1, whereas it may now also have the value 2 in this path.
The consensus is that this BUG_ON has served its purpose, so rather than
extending it to cover both cases, we just remove it.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42b6428145 upstream.
pcpu_embed_first_chunk() allocates memory for each node, copies percpu
data and frees unused portions of it before proceeding to the next
group. This assumes that allocations for different nodes doesn't
overlap; however, depending on memory topology, the bootmem allocator
may end up allocating memory from a different node than the requested
one which may overlap with the portion freed from one of the previous
percpu areas. This leads to percpu groups for different nodes
overlapping which is a serious bug.
This patch separates out copy & partial free from the allocation loop
such that all allocations are complete before partial frees happen.
This also fixes overlapping frees which could happen on allocation
failure path - out_free_areas path frees whole groups but the groups
could have portions freed at that point.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: "Pavel V. Panteleev" <pp_84@mail.ru>
Tested-by: "Pavel V. Panteleev" <pp_84@mail.ru>
LKML-Reference: <E1SNhwY-0007ui-V7.pp_84-mail-ru@f220.mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6eddcb4c82 upstream.
Some RNDIS devices include a bogus CDC Union descriptor pointing
to non-existing interfaces. The RNDIS code is already prepared
to handle devices without a CDC Union descriptor by hardwiring
the driver to use interfaces 0 and 1, which is correct for the
devices with the bogus descriptor as well. So we can reuse the
existing workaround.
Cc: Markus Kolb <linux-201011@tower-net.de>
Cc: Iker Salmón San Millán <shaola@esdebian.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: 655387@bugs.debian.org
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ef449c6b3 upstream.
An early registration of an ISR was causing a crash to several users (for
example, with the ite-cir driver: http://bugs.launchpad.net/bugs/972723).
The reason was that IRQs were being triggered before a driver
initialisation was completed.
This patch fixes this by moving the invocation to request_irq() and to
request_region() to a later stage on the driver probe function.
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a5a737e090 ]
%g2 is meant to hold the CPUID number throughout this routine, since
at the very beginning, and at the very end, we use %g2 to calculate
indexes into per-cpu arrays.
However we erroneously clobber it in order to hold the %cwp register
value mid-stream.
Fix this code to use %g3 for the %cwp read and related calulcations
instead.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b6e9bcdeb upstream.
Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
recursive locking in usbnet_stop()) fixes the recursive locking
problem by releasing the skb queue lock before unlink, but may
cause skb traversing races:
- after URB is unlinked and the queue lock is released,
the refered skb and skb->next may be moved to done queue,
even be released
- in skb_queue_walk_safe, the next skb is still obtained
by next pointer of the last skb
- so maybe trigger oops or other problems
This patch extends the usage of entry->state to describe 'start_unlink'
state, so always holding the queue(rx/tx) lock to change the state if
the referd skb is in rx or tx queue because we need to know if the
refered urb has been started unlinking in unlink_urbs.
The other part of this patch is based on Huajun's patch:
always traverse from head of the tx/rx queue to get skb which is
to be unlinked but not been started unlinking.
Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32cf4023e6 upstream.
When an IRQ for some reason gets lost, we wait up to a second using
udelay, which is CPU intensive. This patch improves the situation by
waiting about 30 ms in the CPU intensive mode, then stepping down to
using msleep(2) instead. In essence, we trade some granularity in
exchange for less CPU consumption when the waiting time is a bit longer.
As a result, PulseAudio should no longer be killed by the kernel
for taking up to much RT-prio CPU time. At least not for *this* reason.
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Tested-by: Arun Raghavan <arun.raghavan@collabora.co.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c914f55f7c upstream.
This assertion seems to imply that chip->dsp_code_to_load is a pointer.
It's actually an integer handle on the actual firmware, and 0 has no
special meaning.
The assertion prevents initialisation of a Darla20 card, but would also
affect other models. It seems it was introduced in commit dd7b254d.
ALSA sound/pci/echoaudio/echoaudio.c:2061 Echoaudio driver starting...
ALSA sound/pci/echoaudio/echoaudio.c:1969 chip=ebe4e000
ALSA sound/pci/echoaudio/echoaudio.c:2007 pci=ed568000 irq=19 subdev=0010 Init hardware...
ALSA sound/pci/echoaudio/darla20_dsp.c:36 init_hw() - Darla20
------------[ cut here ]------------
WARNING: at sound/pci/echoaudio/echoaudio_dsp.c:478 init_hw+0x1d1/0x86c [snd_darla20]()
Hardware name: Dell DM051
BUG? (!chip->dsp_code_to_load || !chip->comm_page)
Signed-off-by: Mark Hills <mark@pogo.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6fe6ae56a7 upstream.
When the keyboard backlight support was originally added, the commit said
to default it to on with a 10 second timeout. That actually wasn't the
case, as the default value is commented out for the kbd_backlight parameter.
Because it is a static variable, it gets set to 0 by default without some
other form of initialization.
However, it seems the function to set the value wasn't actually called
immediately, so whatever state the keyboard was in initially would remain.
Then commit df410d5224 was introduced during the 2.6.39 timeframe to
immediately set whatever value was present (as well as attempt to
restore/reset the state on module removal or resume). That seems to have
now forced the light off immediately when the module is loaded unless
the option kbd_backlight=1 is specified.
Let's enable it by default again (for the first time). This should solve
https://bugzilla.redhat.com/show_bug.cgi?id=728478
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b49960a05e ]
tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)
In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.
With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728
(GRO can escape this problem because it build skbs with a too low
truesize.)
This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.
We should adjust the len/truesize ratio to 50% instead of 75%
This patch :
1) changes tcp_adv_win_scale default to 1 instead of 2
2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a8887d39e ]
WakeOnLan was broken in this driver because gp->asleep_wol is a 1-bit
bitfield and it was being assigned WAKE_MAGIC, which is (1 << 5).
gp->asleep_wol remains 0 and the machine never wakes up. Fixed by casting
gp->wake_on_lan to bool. Tested on an iBook G4.
Signed-off-by: Gerard Lledo <gerard.lledo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f891ea1634 ]
When RSS is enabled, interrupt vector 0 does not receive any rx traffic.
The rx producer index fields for vector 0's status block should be
considered reserved in this case. This patch changes the code to
respect these reserved fields, which avoids a kernel panic when these
fields take on non-zero values.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e072b3fad5 ]
Bug: The VLAN bit of the MAC RX Status Word is unreliable in several older
supported chips. Sometimes the VLAN bit is not set for valid VLAN packets
and also sometimes the VLAN bit is set for non-VLAN packets that came after
a VLAN packet. This results in a receive length error when VLAN hardware
tagging is enabled.
Fix: Variation on original fix proposed by Mirko.
The VLAN information is decoded in the status loop, and can be
applied to the received SKB there. This eliminates the need for the
separate tag field in the interface data structure. The tag has to
be copied and cleared if packet is copied. This version checked out
with vlan and normal traffic.
Note: vlan_tx_tag_present should be renamed vlan_tag_present, but that
is outside scope of this.
Reported-by: Mirko Lindner <mlindner@marvell.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f42941b5d ]
When a small packet is received, the driver copies it to a new skb to allow
reusing the full size Rx buffer. The copy was propogating the checksum offload
but not the receive hash information. The bug is impact was mostly harmless
and therefore not observed until reviewing this area of code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 84768edbb2 ]
l2tp_ip_sendmsg could return without releasing socket lock, making it all the
way to userspace, and generating the following warning:
[ 130.891594] ================================================
[ 130.894569] [ BUG: lock held when returning to user space! ]
[ 130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G W
[ 130.900336] ------------------------------------------------
[ 130.902996] trinity/8384 is leaving the kernel with locks still held!
[ 130.906106] 1 lock held by trinity/8384:
[ 130.907924] #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550
Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c").
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7d3d43dab4 ]
We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.
This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 116a0fc31c ]
skb_checksum_help(skb) can return an error, we must free skb in this
case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
skb_unshare() failed), so lets use this generic helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2a5809499e ]
The asix.c USB Ethernet driver avoids ending a tx transfer with a zero-
length packet by appending a four-byte padding to transfers whose length
is a multiple of maxpacket. However, the hard-coded 512 byte maxpacket
length is valid for high-speed USB only; full-speed USB uses 64 byte
packets.
Signed-off-by: Ingo van Lil <inguin@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48d99f47a8 upstream.
Commit 554cdaefd1 ('ARM: orion5x: Refactor
mpp code to use common orion platform mpp.') seems to have accidentally
inverted the GPIO valid bits for MPP9 (only). For the mv2120 platform
which uses MPP9 as a GPIO LED device, this results in the error:
[ 12.711476] leds-gpio: probe of leds-gpio failed with error -22
Reported-by: Henry von Tresckow <hvontres@gmail.com>
References: http://bugs.debian.org/667446
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Hans Henry von Tresckow <hvontres@gmail.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fde165b2a2 upstream.
Commit 4e8ee7de22 (ARM: SMP: use
idmap_pgd for mapping MMU enable during secondary booting)
switched secondary boot to use idmap_pgd, which is initialized
during early_initcall, instead of a page table initialized during
__cpu_up. This causes idmap_pgd to contain the static mappings
but be missing all dynamic mappings.
If a console is registered that creates a dynamic mapping, the
printk in secondary_start_kernel will trigger a data abort on
the missing mapping before the exception handlers have been
initialized, leading to a hang. Initial boot is not affected
because no consoles have been registered, and resume is usually
not affected because the offending console is suspended.
Onlining a cpu with hotplug triggers the problem.
A workaround is to the printk in secondary_start_kernel until
after the page tables have been switched back to init_mm.
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e787ec1376 upstream.
The inline assembly in kernel_execve() uses r8 and r9. Since this
code sequence does not return, it usually doesn't matter if the
register clobber list is accurate. However, I saw a case where a
particular version of gcc used r8 as an intermediate for the value
eventually passed to r9. Because r8 is used in the inline
assembly, and not mentioned in the clobber list, r9 was set
to an incorrect value.
This resulted in a kernel panic on execution of the first user-space
program in the system. r9 is used in ret_to_user as the thread_info
pointer, and if it's wrong, bad things happen.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f62427862 upstream.
We really need to use a ACCESS_ONCE() on the sequence value read in
__read_seqcount_begin(), because otherwise the compiler might end up
reloading the value in between the test and the return of it. As a
result, it might end up returning an odd value (which means that a write
is in progress).
If the reader is then fast enough that that odd value is still the
current one when the read_seqcount_retry() is done, we might end up with
a "successful" read sequence, even despite the concurrent write being
active.
In practice this probably never really happens - there just isn't
anything else going on around the read of the sequence count, and the
common case is that we end up having a read barrier immediately
afterwards.
So the code sequence in which gcc might decide to reaload from memory is
small, and there's no reason to believe it would ever actually do the
reload. But if the compiler ever were to decide to do so, it would be
incredibly annoying to debug. Let's just make sure.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5e28005a1 upstream.
With the embed percpu first chunk allocator, x86 uses either PAGE_SIZE
or PMD_SIZE for atom_size. PMD_SIZE is used when CPU supports PSE so
that percpu areas are aligned to PMD mappings and possibly allow using
PMD mappings in vmalloc areas in the future. Using larger atom_size
doesn't waste actual memory; however, it does require larger vmalloc
space allocation later on for !first chunks.
With reasonably sized vmalloc area, PMD_SIZE shouldn't be a problem
but x86_32 at this point is anything but reasonable in terms of
address space and using larger atom_size reportedly leads to frequent
percpu allocation failures on certain setups.
As there is no reason to not use PMD_SIZE on x86_64 as vmalloc space
is aplenty and most x86_64 configurations support PSE, fix the issue
by always using PMD_SIZE on x86_64 and PAGE_SIZE on x86_32.
v2: drop cpu_has_pse test and make x86_64 always use PMD_SIZE and
x86_32 PAGE_SIZE as suggested by hpa.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
Reported-by: ShuoX Liu <shuox.liu@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4F97BA98.6010001@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76a8df7b49 upstream.
The accessing PCI configuration space with the PCI BIOS32 service does
not work in PV guests.
On systems without MMCONFIG or where the BIOS hasn't marked the
MMCONFIG region as reserved in the e820 map, the BIOS service is
probed (even though direct access is preferred) and this hangs.
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Fixed compile error when CONFIG_PCI is not set]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b7e5ffe5d8 upstream.
If I try to do "cat /sys/kernel/debug/kernel_page_tables"
I end up with:
BUG: unable to handle kernel paging request at ffffc7fffffff000
IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
PGD 0
Oops: 0000 [#1] SMP
CPU 0
.. snip..
RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000
which is due to the fact we are trying to access a PFN that is not
accessible to us. The reason (at least in this case) was that
PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
hypervisor) to point to a read-only linear map of the MFN->PFN array.
During our parsing we would get the MFN (a valid one), try to look
it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
Then pte_mfn_to_pfn would happilly feed that in, attach the flags
and return it back to the caller. 'ptdump_show' bitshifts it and
gets and invalid value that it tries to dereference.
Instead of doing all of that, we detect the ~0 case and just
return !_PAGE_PRESENT.
This bug has been in existence .. at least until 2.6.37 (yikes!)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07d69d4238 upstream.
Without this patch sysfs reports the cable as present
flag@flag-desktop:~$ cat /sys/class/net/eth0/carrier
1
while it's not:
flag@flag-desktop:~$ sudo mii-tool eth0
eth0: no link
Tested on my Beagle XM.
v2: added mantainer to the list of recipient
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c1bcdb5a3 upstream.
This driver currently leaves elp_work behind when stopping, which
occasionally results in data corruption because work function ends
up accessing freed memory, typical symptoms of this are various
worker_thread crashes. Fix it by cancelling elp_work.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 328c32f0f8 upstream.
Currently SDIO glue frees it's own structure before calling
wl1251_free_hw(), which in turn calls ieee80211_unregister_hw().
The later call may result in a need to communicate with the chip
to stop it (as it happens now if the interface is still up before
rmmod), which means calls are made back to the glue, resulting in
freed memory access.
Fix this by freeing glue data last.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66f2c99af3 upstream.
EAP frames for stations in an AP VLAN are sent on the main AP interface
to avoid race conditions wrt. moving stations.
For that to work properly, sta_info_get_bss must be used instead of
sta_info_get when sending EAP packets.
Previously this was only done for cooked monitor injected packets, so
this patch adds a check for tx->skb->protocol to the same place.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd44731989 upstream.
Driver incorrectly validates command completion: instead of waiting
for a command to be acknowledged it continues execution. Most of the
time driver gets acknowledge of the command completion in a tasklet
before it executes the next one. But sometimes it sends the next
command before it gets acknowledge for the previous one. In such a
case one of the following error messages appear in the log:
Failed to send SYSTEM_CONFIG: Already sending a command.
Failed to send ASSOCIATE: Already sending a command.
Failed to send TX_POWER: Already sending a command.
After that you need to reload the driver to get it working again.
This bug occurs during roaming (reported by Sam Varshavchik)
https://bugzilla.redhat.com/show_bug.cgi?id=738508
and machine booting (reported by Tom Gundersen and Mads Kiilerich)
https://bugs.archlinux.org/task/28097https://bugzilla.redhat.com/show_bug.cgi?id=802106
This patch doesn't fix the delay issue during firmware load.
But at least device now works as usual after boot.
Signed-off-by: Stanislav Yakovlev <stas.yakovlev@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c557cfee0 upstream.
In the driver's suspend function, clk_enable() was used instead of
clk_disable(). This is corrected with this patch.
Signed-off-by: Roland Stigge <stigge@antcom.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
[wsa: reworded commit header slightly]
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6868225e3e upstream.
Commit d902747("[libata] Add ATA transport class") introduced
ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared.
But ata_count_probe_trials_cb() didn't check this flag and it still
counts the old error history. So wrong probe trials count is returned
and it causes problem, for example, SATA link speed is slowed down from
3.0Gbps to 1.5Gbps.
Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb().
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bdc71c9a87 upstream.
CPU core ID is used to index the core_data[] array. The core ID is, however, not
sequential; 10-core CPUS can have a core ID as high as 25. Increase the limit to
32 to be able to deal with current CPUs.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 54b3a4d311 upstream.
Ben Hutchings pointed out that the validation in efivars was inadequate -
most obviously, an entry with size 0 would server as a DoS against the
kernel. Improve this based on his suggestions.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fec6c20b57 upstream.
A common flaw in UEFI systems is a refusal to POST triggered by a malformed
boot variable. Once in this state, machines may only be restored by
reflashing their firmware with an external hardware device. While this is
obviously a firmware bug, the serious nature of the outcome suggests that
operating systems should filter their variable writes in order to prevent
a malicious user from rendering the machine unusable.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b728a5c806 upstream.
drivers/firmware/efivars.c:161: warning: ‘utf16_strlen’ defined but not used
utf16_strlen() is only used inside CONFIG_PSTORE - make this "static inline"
to shut the compiler up [thanks to hpa for the suggestion].
drivers/firmware/efivars.c:602: warning: initialization from incompatible pointer type
Between v1 and v2 of this patch series we decided to make the "part" number
unsigned - but missed fixing the stub version of efi_pstore_write()
Acked-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Mike Waychison <mikew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[took the static part of the patch, not the pstore part, for 3.0-stable,
to fix the compiler warning we had - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a294090839 upstream.
Fix the string functions in the efivars driver to be called utf16_*
instead of utf8_* as the encoding is utf16, not utf8.
As well, rename utf16_strlen to utf16_strnlen as it takes a maxlength
argument and the name should be consistent with the standard C function
names. utf16_strlen is still provided for convenience in a subsequent
patch.
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d1d865181 upstream.
Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:
sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)
Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1699490db3 upstream.
If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery. Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.
Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a1c53124a upstream.
TPIDRURW is a user read/write register forming part of the group of
thread registers in more recent versions of the ARM architecture (~v6+).
Currently, the kernel does not touch this register, which allows tasks
to communicate covertly by reading and writing to the register without
context-switching affecting its contents.
This patch clears TPIDRURW when TPIDRURO is updated via the set_tls
macro, which is called directly from __switch_to. Since the current
behaviour makes the register useless to userspace as far as thread
pointers are concerned, simply clearing the register (rather than saving
and restoring it) will not cause any problems to userspace.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64f371bc31 upstream.
The autofs packet size has had a very unfortunate size problem on x86:
because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
because the packet data was not 8-byte aligned, the size of the autofsv5
packet structure differed between 32-bit and 64-bit modes despite
looking otherwise identical (300 vs 304 bytes respectively).
We first fixed that up by making the 64-bit compat mode know about this
problem in commit a32744d4ab ("autofs: work around unhappy compat
problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
64-bit kernel because everything then worked the same way as on a 32-bit
kernel.
But it turned out that 'automount' had actually known and worked around
this problem in user space, so fixing the kernel to do the proper 32-bit
compatibility handling actually *broke* 32-bit automount on a 64-bit
kernel, because it knew that the packet sizes were wrong and expected
those incorrect sizes.
As a result, we ended up reverting that compatibility mode fix, and
thus breaking systemd again, in commit fcbf94b9de.
With both automount and systemd doing a single read() system call, and
verifying that they get *exactly* the size they expect but using
different sizes, it seemed that fixing one of them inevitably seemed to
break the other. At one point, a patch I seriously considered applying
from Michael Tokarev did a "strcmp()" to see if it was automount that
was doing the operation. Ugly, ugly.
However, a prettier solution exists now thanks to the packetized pipe
mode. By marking the communication pipe as being packetized (by simply
setting the O_DIRECT flag), we can always just write the bigger packet
size, and if user-space does a smaller read, it will just get that
partial end result and the extra alignment padding will simply be thrown
away.
This makes both automount and systemd happy, since they now get the size
they asked for, and the kernel side of autofs simply no longer needs to
care - it could pad out the packet arbitrarily.
Of course, if there is some *other* user of autofs (please, please,
please tell me it ain't so - and we haven't heard of any) that tries to
read the packets with multiple writes, that other user will now be
broken - the whole point of the packetized mode is that one system call
gets exactly one packet, and you cannot read a packet in pieces.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9883035ae7 upstream.
The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.
When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own. The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).
End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time. You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.
NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops. Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.
The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes). But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f6543f53f upstream.
The field is used to pass the UVC request data length, but can also be
used to signal an error when setting it to a negative value. Switch from
unsigned int to __s32.
Reported-by: Fernandez Gonzalo <gfernandez@copreci.es>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c85dcdac58 upstream.
This patch (as1539) fixes a minor bug in the mass-storage gadget
drivers. When an unknown command is received, the error code sent
back is "Invalid Field in CDB" rather than "Invalid Command". This is
because the bitmask of CDB bytes allowed to be nonzero is incorrect.
When handling an unknown command, we don't care which command bytes
are nonzero. All the bits in the mask should be set, not just eight
of them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 151b612847 upstream.
This patch (as1545) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers. Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.
After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state. Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.
The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep. Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.
This fixes Bugzilla #42728.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Oleksij Rempel (fishor) <bug-track@fisher-privat.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c22837adc upstream.
This patch fixes a race whereby a pointer to a buffer
would be overwritten while the buffer was in use leading
to a double free and a memory leak. This causes crashes.
This bug was introduced in 2.6.34
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04da6e9d63 upstream.
nfsd_open() already returns an NFS error value; only vfs_test_lock()
result needs to be fed through nfserrno(). Broken by commit 55ef12
(nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT)
three years ago...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b89152824f upstream.
This was broken by me in 37865fe915
("mmc: sdhci-esdhc-imx: fix timeout on i.MX's sdhci") where more
extensive tests would have shown that read or write of data to the
card were failing (even if the partition table was correctly read).
Signed-off-by: Eric Bénard <eric@eukrea.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32f6daad46 upstream.
We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown. A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.
Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa. This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e88aa7bbbe upstream.
The symbol table on x86-64 starts to have entries that have names
like:
_GLOBAL__sub_I_65535_0___mod_x86cpu_device_table
They are of type STT_FUNCTION and this one had a length of 18. This
matched the device ID validation logic and it barfed because the
length did not meet the device type's criteria.
--------------------
FATAL: arch/x86/crypto/aesni-intel: sizeof(struct x86cpu_device_id)=16 is not a modulo of the size of section __mod_x86cpu_device_table=18.
Fix definition of struct x86cpu_device_id in mod_devicetable.h
--------------------
These are some kind of compiler tool internal stuff being emitted and
not something we want to inspect in modpost's device ID table
validation code.
So skip the symbol if it is not of type STT_OBJECT.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit badc4f0762 upstream.
There have been reports about not being able to use access-points
on channel 12 and 13 or having connectivity issues when these channels
were part of the selected regulatory domain. Upon switching to these
channels the brcmsmac driver suspends the transmit dma fifos. This
patch resumes them upon handing over the first received beacon to
mac80211.
This patch is to be applied to the stable tree for kernel versions
3.2 and 3.3.
Tested-by: Francesco Saverio Schiavarelli <fschiava@libero.it>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Reviewed-by: Brett Rudley <brudley@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc75ce9d92 upstream.
This patch (as1542) changes the criterion ehci-hcd uses to tell when
it needs to resume the controller's root hub. A resume is needed when
a port status change is detected, obviously, but only if the root hub
is currently suspended.
Right now the driver tests whether the root hub is running, and that
is not the correct test. In particular, if the controller has died
then the root hub should not be restarted. In addition, some buggy
hardware occasionally requires the root hub to be running and
sending out SOF packets even while it is nominally supposed to be
suspended.
In the end, the test needs to be changed. Rather than checking whether
the root hub is currently running, the driver will now check whether
the root hub is currently suspended. This will yield the correct
behavior in all cases.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Peter Chen <B29397@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 2b5f8b0b44 upstream.
[backported by Ben Greear]
The nl80211 handling code should ensure as much as
it can that the interface is in a valid state, it
can certainly ensure the interface is running.
Not doing so can cause calls through mac80211 into
the driver that result in warnings and unspecified
behaviour in the driver.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44afb3a043 upstream.
On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed8cd3b2cd upstream.
On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.
This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6651819b4b upstream.
We seem to have a decent confusion between the output timings and the
input timings of the sdvo encoder. If I understand the code correctly,
we use the original mode unchanged for the output timings, safe for
the lvds case. And we should use the adjusted mode for input timings.
Clarify the situation by adding an explicit output_dtd to the sdvo
mode_set function and streamline the code-flow by moving the input and
output mode setting in the sdvo encode together.
Furthermore testing showed that the sdvo input timing needs the
unadjusted dotclock, the sdvo chip will automatically compute the
required pixel multiplier to get a dotclock above 100 MHz.
Fix this up when converting a drm mode to an sdvo dtd.
This regression was introduced in
commit c74696b9c8
Author: Pavel Roskin <proski@gnu.org>
Date: Thu Sep 2 14:46:34 2010 -0400
i915: revert some checks added by commit 32aad86f
particularly the following hunk:
# diff --git a/drivers/gpu/drm/i915/intel_sdvo.c
# b/drivers/gpu/drm/i915/intel_sdvo.c
# index 093e914..62d22ae 100644
# --- a/drivers/gpu/drm/i915/intel_sdvo.c
# +++ b/drivers/gpu/drm/i915/intel_sdvo.c
# @@ -1122,11 +1123,9 @@ static void intel_sdvo_mode_set(struct drm_encoder *encoder,
#
# /* We have tried to get input timing in mode_fixup, and filled into
# adjusted_mode */
# - if (intel_sdvo->is_tv || intel_sdvo->is_lvds) {
# - intel_sdvo_get_dtd_from_mode(&input_dtd, adjusted_mode);
# + intel_sdvo_get_dtd_from_mode(&input_dtd, adjusted_mode);
# + if (intel_sdvo->is_tv || intel_sdvo->is_lvds)
# input_dtd.part2.sdvo_flags = intel_sdvo->sdvo_flags;
# - } else
# - intel_sdvo_get_dtd_from_mode(&input_dtd, mode);
#
# /* If it's a TV, we already set the output timing in mode_fixup.
# * Otherwise, the output timing is equal to the input timing.
Due to questions raised in review, below a more elaborate analysis of
the bug at hand:
Sdvo seems to have two timings, one is the output timing which will be
sent over whatever is connected on the other side of the sdvo chip (panel,
hdmi screen, tv), the other is the input timing which will be generated by
the gmch pipe. It looks like sdvo is expected to scale between the two.
To make things slightly more complicated, we have a bunch of special
cases:
- For lvds panel we always use a fixed output timing, namely
intel_sdvo->sdvo_lvds_fixed_mode, hence that special case.
- Sdvo has an interface to generate a preferred input timing for a given
output timing. This is the confusing thing that I've tried to clear up
with the follow-on patches.
- A special requirement is that the input pixel clock needs to be between
100MHz and 200MHz (likely to keep it within the electromechanical design
range of PCIe), 270MHz on later gen4+. Lower pixel clocks are
doubled/quadrupled.
The thing this patch tries to fix is that the pipe needs to be
explicitly instructed to double/quadruple the pixels and needs the
correspondingly higher pixel clock, whereas the sdvo adaptor seems to
do that itself and needs the unadjusted pixel clock. For the sdvo
encode side we already set the pixel mutliplier with a different
command (0x21).
This patch tries to fix this mess by:
- Keeping the output mode timing in the unadjusted plain mode, safe
for the lvds case.
- Storing the input timing in the adjusted_mode with the adjusted
pixel clock. This way we don't need to frob around with the core
crtc mode set code.
- Fixing up the pixelclock when constructing the sdvo dtd timing
struct. This is why the first hunk of the patch is an integral part
of the series.
- Dropping the is_tv special case because input_dtd is equivalent to
adjusted_mode after these changes. Follow-up patches clear this up
further (by simply ripping out intel_sdvo->input_dtd because it's
not needed).
v2: Extend commit message with an in-depth bug analysis.
Reported-and-Tested-by: Bernard Blackham <b-linuxgit@largestprime.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48157
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00250ec909 upstream.
Newer BKDG[1] versions recommend a different initialization value for
the running average range register in the northbridge. This improves
the power reading by avoiding counter saturations resulting in bogus
values for anything below about 80% of TDP power consumption.
Updated BIOSes will have this new value set up from the beginning,
but meanwhile we correct this value ourselves.
This needs to be done on all northbridges, even on those where the
driver itself does not register at.
This fixes the driver on all current machines to provide proper
values for idle load.
[1]
http://support.amd.com/us/Processor_TechDocs/42301_15h_Mod_00h-0Fh_BKDG.pdf
Chapter 3.8: D18F5xE0 Processor TDP Running Average (p. 452)
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
[guenter.roeck@ericsson.com: Removed unnecessary return statement]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed8b0d67f3 upstream.
This loop on EBCISR register was designed to clear IRQ sources before enabling
a DMA channel. This register is clear-on-read so a race condition can appear if
another channel is already active and has just finished its transfer.
Removing this read on EBCISR is fixing the issue as there is no case where an IRQ
could be pending: we already make sure that this register is drained at probe()
time and during resume.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e1f7c8a6e upstream.
Line widgets had not been included in either the power up or power down
sequences so if a widget had an event associated with it that event would
never be run. Fix this minimally by adding them to the sequences, we
should probably be doing away with the specific widget types as they all
have the same priority anyway.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf405ae612 upstream.
When we boot on a machine that can hotplug CPUs and we
are using 'dom0_max_vcpus=X' on the Xen hypervisor line
to clip the amount of CPUs available to the initial domain,
we get this:
(XEN) Command line: com1=115200,8n1 dom0_mem=8G noreboot dom0_max_vcpus=8 sync_console mce_verbosity=verbose console=com1,vga loglvl=all guest_loglvl=all
.. snip..
DMI: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x032.072520111118 07/25/2011
.. snip.
SMP: Allowing 64 CPUs, 32 hotplug CPUs
installing Xen timer for CPU 7
cpu 7 spinlock event irq 361
NMI watchdog: disabled (cpu7): hardware events not enabled
Brought up 8 CPUs
.. snip..
[acpi processor finds the CPUs are not initialized and starts calling
arch_register_cpu, which creates /sys/devices/system/cpu/cpu8/online]
CPU 8 got hotplugged
CPU 9 got hotplugged
CPU 10 got hotplugged
.. snip..
initcall 1_acpi_battery_init_async+0x0/0x1b returned 0 after 406 usecs
calling erst_init+0x0/0x2bb @ 1
[and the scheduler sticks newly started tasks on the new CPUs, but
said CPUs cannot be initialized b/c the hypervisor has limited the
amount of vCPUS to 8 - as per the dom0_max_vcpus=8 flag.
The spinlock tries to kick the other CPU, but the structure for that
is not initialized and we crash.]
BUG: unable to handle kernel paging request at fffffffffffffed8
IP: [<ffffffff81035289>] xen_spin_lock+0x29/0x60
PGD 180d067 PUD 180e067 PMD 0
Oops: 0002 [#1] SMP
CPU 7
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc2upstream-00001-gf5154e8 #1 Intel Corporation S2600CP/S2600CP
RIP: e030:[<ffffffff81035289>] [<ffffffff81035289>] xen_spin_lock+0x29/0x60
RSP: e02b:ffff8801fb9b3a70 EFLAGS: 00010282
With this patch, we cap the amount of vCPUS that the initial domain
can run, to exactly what dom0_max_vcpus=X has specified.
In the future, if there is a hypercall that will allow a running
domain to expand past its initial set of vCPUS, this patch should
be re-evaluated.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7eb7ce4d2e upstream.
In xen_restore_fl_direct(), xen_force_evtchn_callback() was being
called even if no events were pending. This resulted in (depending on
workload) about a 100 times as many xen_version hypercalls as
necessary.
Fix this by correcting the sense of the conditional jump.
This seems to give a significant performance benefit for some
workloads.
There is some subtle tricksy "..since the check here is trying to
check both pending and masked in a single cmpw, but I think this is
correct. It will call check_events now only when the combined
mask+pending word is 0x0001 (aka unmasked, pending)." (Ian)
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcbf94b9de upstream.
This reverts commit a32744d4ab.
While that commit was technically the right thing to do, and made the
x86-64 compat mode work identically to native 32-bit mode (and thus
fixing the problem with a 32-bit systemd install on a 64-bit kernel), it
turns out that the automount binaries had workarounds for this compat
problem.
Now, the workarounds are disgusting: doing an "uname()" to find out the
architecture of the kernel, and then comparing it for the 64-bit cases
and fixing up the size of the read() in automount for those. And they
were confused: it's not actually a generic 64-bit issue at all, it's
very much tied to just x86-64, which has different alignment for an
'u64' in 64-bit mode than in 32-bit mode.
But the end result is that fixing the compat layer actually breaks the
case of a 32-bit automount on a x86-64 kernel.
There are various approaches to fix this (including just doing a
"strcmp()" on current->comm and comparing it to "automount"), but I
think that I will do the one that teaches pipes about a special "packet
mode", which will allow user space to not have to care too deeply about
the padding at the end of the autofs packet.
That change will make the compat workaround unnecessary, so let's revert
it first, and get automount working again in compat mode. The
packetized pipes will then fix autofs for systemd.
Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbf2829b61 upstream.
Current APIC code assumes MSR_IA32_APICBASE is present for all systems.
Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE
was introduced as an architectural MSR by Intel @ P6.
Code paths that can touch this MSR invalidly are when vendor == Intel &&
cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass
lapic on the kernel command line, on a P5.
The below patch stops Linux incorrectly interfering with the
MSR_IA32_APICBASE for P5 class machines. Other code paths exist that
touch the MSR - however those paths are not currently reachable for a
conformant P5.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linux.intel.com>
Link: http://lkml.kernel.org/r/4F8EEDD3.1080404@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55725513b5 upstream.
Since we may be simulating flock() locks using NFS byte range locks,
we can't rely on the VFS having checked the file open mode for us.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05ffe24f52 upstream.
All callers of nfs4_handle_exception() that need to handle
NFS4ERR_OPENMODE correctly should set exception->inode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98a2139f4f upstream.
When hostname contains colon (e.g. when it is an IPv6 address) it needs
to be enclosed in brackets to make parsing of NFS device string possible.
Fix nfs_do_root_mount() to enclose hostname properly when needed. NFS code
actually does not need this as it does not parse the string passed by
nfs_do_root_mount() but the device string is exposed to userspace in
/proc/mounts.
CC: Josh Boyer <jwboyer@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d135c522f1 ]
Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6
TCP server sockets that used TCP_MAXSEG would find that the advmss of
child sockets would be incorrect. This commit mirrors the advmss logic
from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this
logic should probably be shared between IPv4 and IPv6, but this at
least fixes this issue.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3adadc08cc ]
While reviewing the sysctl code in ax25 I spotted races in ax25_exit
where it is possible to receive notifications and packets after already
freeing up some of the data structures needed to process those
notifications and updates.
Call unregister_netdevice_notifier early so that the rest of the cleanup
code does not need to deal with network devices. This takes advantage
of my recent enhancement to unregister_netdevice_notifier to send
unregister notifications of all network devices that are current
registered.
Move the unregistration for packet types, socket types and protocol
types before we cleanup any of the ax25 data structures to remove the
possibilities of other races.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 716af4abd6 ]
MAX_ADDR_LEN is 32. ETH_ALEN is 6. mac->sa_data is a 14 byte array, so
the memcpy() is doing a read past the end of the array. I asked about
this on netdev and Ben Hutchings told me it's supposed to be copying
ETH_ALEN bytes (thanks Ben).
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b922934d01 ]
ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4d846f0239 ]
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.
tcp_grow_window() still assumes a tcp frame is under MSS, but its no
longer true with LRO/GRO.
This patch fixes one of the performance issue we noticed with GRO on.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 890fdf2a0c ]
In register_netdevice(), when ndo_init() is successful and later
some error occurred, ndo_uninit() will be called.
So dummy deivce is desirable to implement ndo_uninit() method
to free percpu stats for this case.
And, ndo_uninit() is also called along with dev->destructor() when
device is unregistered, so in order to prevent dev->dstats from
being freed twice, dev->destructor is modified to free_netdev().
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a99ff7d012 ]
Make smsc75xx recalculate the hard_mtu after adjusting the
hard_header_len.
Without this, usbnet adjusts the MTU down to 1492 bytes, and the host is
unable to receive standard 1500-byte frames from the device.
Inspired by same fix on cdc_eem 78fb72f793.
Tested on ARM/Omap3 with EVB-LAN7500-LC.
Signed-off-by: Stephane Fillod <fillods@users.sf.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 244b65dbfe ]
A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).
However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:
tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS
This fixes gred_dump() in WRED mode to use the values held in wred_set.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a9a0ea603 ]
At the beginning of ks_rcv(), a for loop retrieves the
header information relevant to all the frames stored
in the mac's internal buffers. The number of pending
frames is stored as an 8 bits field in KS_RXFCTR.
If interrupts are disabled long enough to allow for more than
32 frames to accumulate in the MAC's internal buffers, a buffer
overflow occurs.
This patch fixes the problem by making the
driver's frame_head_info buffer big enough.
Well actually, since the chip appears to have 12K of
internal rx buffers and the shortest ethernet frame should
be 64 bytes long, maybe the limit could be set to
12*1024/64 = 192 frames, but 255 should be safer.
Signed-off-by: Davide Ciminaghi <ciminaghi@gnudd.com>
Signed-off-by: Raffaele Recalcati <raffaele.recalcati@bticino.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c5e979bd0 ]
The SMSC911x driver resets the ->head, ->data and ->tail pointers in the
skb on the reset path in order to avoid buffer overflow due to packet
padding performed by the hardware.
This patch fixes the receive path so that the skb pointers are fixed up
after the data has been read from the device, The error path is also
fixed to use number of words consistently and prevent erroneous FIFO
fastforwarding when skipping over bad data.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a8c9cb106f ]
We set intr mask before its handler is registered, this does not work well when
8139cp is sharing irq line with other devices. As the irq could be enabled by
the device before 8139cp's hander is registered which may lead unhandled
irq. Fix this by introducing an helper cp_irq_enable() and call it after
request_irq().
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 03662e41c7 ]
Problem:
There was two separate work_struct structures which share one
handler. Unfortunately getting atl1_adapter structure from
work_struct in case of DMA error was done from incorrect
offset which cause kernel panics.
Solution:
The useless work_struct for DMA error removed and
handler name changed to more generic one.
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 18a223e0b9 ]
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
unscaled RTT samples.
The intent in the code was to only use the 'm' measurement if it was a
new minimum. However, since 'm' had not yet been shifted left 3 bits
but 'new_sample' had, this comparison would nearly always succeed,
leading us to erroneously set our receive-side RTT estimate to the 'm'
sample when that sample could be nearly 8x too high to use.
The overall effect is to often cause the receive-side RTT estimate to
be significantly too large (up to 40% too large for brief periods in
my tests).
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 110c43304d ]
As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a7e7c2ad5 ]
As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4eee6a3a04 ]
This happened on a machine with a custom hotplug script calling nameif,
probably due to slow firmware loading. At the time nameif uses ethtool
to gather interface information, i2400m->fw_name is zero and so a null
pointer dereference occurs from within i2400m_get_drvinfo().
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a4309746c ]
When a slave comes up, we're unsetting the current_arp_slave without
removing active flags from it, which can lead to situations where we have
more than one slave with active flags in active-backup mode.
To avoid this situation we must remove the active flags from a slave before
removing it as a current_arp_slave.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78d50217ba ]
Convert array index from the loop bound to the loop index.
And remove the void type conversion to ip6_mc_del1_src() return
code, seem it is unnecessary, since ip6_mc_del1_src() does not
use __must_check similar attribute, no compiler will report the
warning when it is removed.
v2: enrich the commit header
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 996304bbea ]
As it stands the bridge IGMP snooping system will respond to
group leave messages with queries for remaining membership.
This is both unnecessary and undesirable. First of all any
multicast routers present should be doing this rather than us.
What's more the queries that we send may end up upsetting other
multicast snooping swithces in the system that are buggy.
In fact, we can simply remove the code that send these queries
because the existing membership expiry mechanism doesn't rely
on them anyway.
So this patch simply removes all code associated with group
queries in response to group leave messages.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acdd598536 ]
getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
an error if the user provides less bytes than the size of struct
sctp_event_subscribe.
Struct sctp_event_subscribe needs to be extended by an u8 for every
new event or notification type that is added.
This obviously makes getsockopt fail for binaries that are compiled
against an older versions of <net/sctp/user.h> which do not contain
all event types.
This patch changes getsockopt behaviour to no longer return an error
if not enough bytes are being provided by the user. Instead, it
returns as much of sctp_event_subscribe as fits into the provided buffer.
This leads to the new behavior that users see what they have been aware
of at compile time.
The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ This combines upstream commit
2f53384424 and the follow-on bug fix
commit 35f9c09fe9 ]
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.
4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)
This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)
In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.
In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ This combines upstream commit
e675f0cc9a and follow-on bug fix
commit 9a5d2bd99e ]
For every transmitted packet, ppp_start_xmit() will stop the netdev
queue and then, if appropriate, restart it. This causes the TX softirq
to run, entirely gratuitously.
This is "only" a waste of CPU time in the normal case, but it's actively
harmful when the PPP device is a TEQL slave — the wakeup will cause the
offending device to receive the next TX packet from the TEQL queue, when
it *should* have gone to the next slave in the list. We end up seeing
large bursts of packets on just *one* slave device, rather than using
the full available bandwidth over all slaves.
This patch fixes the problem by *not* unconditionally stopping the queue
in ppp_start_xmit(). It adds a return value from ppp_xmit_process()
which indicates whether the queue should be stopped or not.
It *doesn't* remove the call to netif_wake_queue() from
ppp_xmit_process(), because other code paths (especially from
ppp_output_wakeup()) need it there and it's messy to push it out to the
other callers to do it based on the return value. So we leave it in
place — it's a no-op in the case where the queue wasn't stopped, so it's
harmless in the TX path.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ed3cf2cdf upstream.
->root_flags is __le64 and all accesses to it go through the helpers
that do proper conversions. Except for btrfs_root_readonly(), which
checks bit 0 as in host-endian...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efe39651f0 upstream.
Restore the original logics ("fail on mountpoints, negatives and in
case of fh_compose() failures"). Since commit 8177e (nfsd: clean up
readdirplus encoding) that got broken -
rv = fh_compose(fhp, exp, dchild, &cd->fh);
if (rv)
goto out;
if (!dchild->d_inode)
goto out;
rv = 0;
out:
is equivalent to
rv = fh_compose(fhp, exp, dchild, &cd->fh);
out:
and the second check has no effect whatsoever...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43bf8c2452 upstream.
Belkin's Connect N150 Wireless USB Adapter, model F7D1101 version 2, uses ID 0x945b.
Chipset info: rt: 3390, rf: 000b, rev: 3213.
I have just bought one, which started to work perfectly after the ID was added through this patch.
Signed-off-by: Eduardo Bacchi Kienetz <eduardo@kienetz.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 178db7d30f upstream.
Device are added as children of the bus master's parent device, but
spi_unregister_master() looks for devices to unregister in the bus
master's children. This results in the child devices not being
unregistered.
Fix this by registering devices as direct children of the bus master.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Takahiro AKASHI <akashi@jp.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93dc6107a7 upstream.
Commit 28d82dc1c4 ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).
The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.
This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f67fd55fa9 upstream.
Some BIOS implementations leave the Intel GPU interrupts enabled,
even though no one is handling them (f.e. i915 driver is never loaded).
Additionally the interrupt destination is not set up properly
and the interrupt ends up -somewhere-.
These spurious interrupts are "sticky" and the kernel disables
the (shared) interrupt line after 100.000+ generated interrupts.
Fix it by disabling the still enabled interrupts.
This resolves crashes often seen on monitor unplug.
Tested on the following boards:
- Intel DH61CR: Affected
- Intel DH67BL: Affected
- Intel S1200KP server board: Affected
- Asus P8H61-M LE: Affected, but system does not crash.
Probably the IRQ ends up somewhere unnoticed.
According to reports on the net, the Intel DH61WW board is also affected.
Many thanks to Jesse Barnes from Intel for helping
with the register configuration and to Intel in general
for providing public hardware documentation.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Charlie Suffin <charlie.suffin@stratus.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad579699c4 upstream.
pm_runtime_get_sync returns a signed integer. In case of errors
it returns a negative value. This patch fixes the error check
by making it signed instead of unsigned thus preventing register
access if get_sync_fails. Also passes the error cause to the
debug message.
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3006dc8c62 upstream.
pm_runtime_enable is being called after omap2430_musb_init. Hence
pm_runtime_get_sync in omap2430_musb_init does not have any effect (does
not enable clocks) resulting in a crash during register access. It is
fixed here.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92b0abf80c upstream.
usb: gadget: eliminate NULL pointer dereference (bugfix)
This patch fixes a bug which causes NULL pointer dereference in
ffs_ep0_ioctl. The bug happens when the FunctionFS is not bound (either
has not been bound yet or has been bound and then unbound) and can be
reproduced with running the following commands:
$ insmod g_ffs.ko
$ mount -t functionfs func /dev/usbgadget
$ ./null
where null.c is:
#include <fcntl.h>
#include <linux/usb/functionfs.h>
int main(void)
{
int fd = open("/dev/usbgadget/ep0", O_RDWR);
ioctl(fd, FUNCTIONFS_CLEAR_HALT);
return 0;
}
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8963c487a8 upstream.
This patch (as154) fixes a self-deadlock that occurs when userspace
writes to the bConfigurationValue sysfs attribute for a hub with
children. The task tries to lock the bandwidth_mutex at a time when
it already owns the lock:
The attribute's method calls usb_set_configuration(),
which calls usb_disable_device() with the bandwidth_mutex
held.
usb_disable_device() unregisters the existing interfaces,
which causes the hub driver to be unbound.
The hub_disconnect() routine calls hub_quiesce(), which
calls usb_disconnect() for each of the hub's children.
usb_disconnect() attempts to acquire the bandwidth_mutex
around a call to usb_disable_device().
The solution is to make usb_disable_device() acquire the mutex for
itself instead of requiring the caller to hold it. Then the mutex can
cover only the bandwidth deallocation operation and not the region
where the interfaces are unregistered.
This has the potential to change system behavior slightly when a
config change races with another config or altsetting change. Some of
the bandwidth released from the old config might get claimed by the
other config or altsetting, make it impossible to restore the old
config in case of a failure. But since we don't try to recover from
config-change failures anyway, this doesn't matter.
[This should be marked for stable kernels that contain the commit
fccf4e8620 "USB: Free bandwidth when
usb_disable_device is called."
That commit was marked for stable kernels as old as 2.6.32.]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2fbe2bf1fd upstream.
This patch (as1544) fixes a problem affecting some EHCI controllers.
They can generate interrupts whenever the STS_FLR status bit is turned
on, even though that bit is masked out in the Interrupt Enable
register.
Since the driver doesn't use STS_FLR anyway, the patch changes the
interrupt routine to clear that bit whenever it is set, rather than
leaving it alone.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 749541d19e upstream.
These devices have a number of non serial interfaces as well. Use
the existing "Direct IP" blacklist to prevent binding to interfaces
which are handled by other drivers.
We also extend the "Direct IP" blacklist with with interfaces only
seen in "QMI" mode, assuming that these devices use the same
interface numbers for serial interfaces both in "Direct IP" and in
"QMI" mode.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af6d17cdc8 upstream.
This driver anticipates pch_uart_verify_port() is not called
during installation.
However, actually pch_uart_verify_port() is called during
installation.
As a result, memory access violation occurs like below.
0. initial value: use_dma=0
1. starup()
- dma channel is not allocated because use_dma=0
2. pch_uart_verify_port()
- Set use_dma=1
3. UART processing acts DMA mode because use_dma=1
- memory access violation occurs!
This patch fixes the issue.
Solution:
Whenever pch_uart_verify_port() is called and then
dma channel is not allocated, the channel should be allocated.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d5733fcd3 upstream.
Fixed too small hardcoded timeout values for usb_control_msg
in driver for SiliconLabs cp210x-based usb-to-serial adapters.
Replaced with USB_CTRL_GET_TIMEOUT/USB_CTRL_SET_TIMEOUT.
Signed-off-by: Yuri Matylitski <ym@tekinsoft.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99aa784667 upstream.
flush request is issued in transaction commit code path, so looks using
GFP_KERNEL to allocate memory for flush request bio falls into the classic
deadlock issue. I saw btrfs and dm get it right, but ext4, xfs and md are
using GFP.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aca50bd3b4 upstream.
Mel reports a BUG_ON(slot == NULL) in radix_tree_tag_set() on s390
3.0.13: called from __set_page_dirty_nobuffers() when page_remove_rmap()
tries to transfer dirty flag from s390 storage key to struct page and
radix_tree.
That would be because of reclaim's shrink_page_list() calling
add_to_swap() on this page at the same time: first PageSwapCache is set
(causing page_mapping(page) to appear as &swapper_space), then
page->private set, then tree_lock taken, then page inserted into
radix_tree - so there's an interval before taking the lock when the
radix_tree slot is empty.
We could fix this by moving __add_to_swap_cache()'s spin_lock_irq up
before the SetPageSwapCache. But a better fix is simply to do what's
five years overdue: Ken Chen introduced __set_page_dirty_no_writeback()
(if !PageDirty TestSetPageDirty) for tmpfs to skip all the radix_tree
overhead, and swap is just the same - it ignores the radix_tree tag, and
does not participate in dirty page accounting, so should be using
__set_page_dirty_no_writeback() too.
s390 testing now confirms that this does indeed fix the problem.
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9b786955f upstream.
Setting the correct mode is required by rc-core or scancodes won't be
generated (which isn't very user-friendly).
This one-line fix should be suitable for 3.4-rc2.
Signed-off-by: David Härdeman <david@hardeman.nu>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b76d0600b upstream.
Under heavy load (flood ping) it is possible for the MDIO timeout to
expire before the loop checks the GO bit again. This patch adds an
additional check whether the operation was done before actually
returning -ETIMEDOUT.
To reproduce this bug, flood ping the device, e.g., ping -f -l 1000
After some time, a "timed out waiting for user access" warning
may appear. And even worse, link may go down since the PHY reported a
timeout.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5bd7b419ef upstream.
Fatal errors such as a device disconnect must not trigger
error handling. The error returns must be checked.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9426cd0568 upstream.
del_timer_sync() cannot be used in interrupt.
Replace it with del_timer() and a flag
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 532f17b5d5 upstream.
Current probing code is setting URB_NO_TRANSFER_DMA_MAP flag into a wrong urb
structure, and this causes BUG_ON with some USB host implementations.
This patch fixes the issue.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 523fc5c14f upstream.
Removes allocation of coherent buffer for the control-request setup-packet
buffer from the yurex driver. Using coherent buffers for setup-packet is
obsolete and does not work with some USB host implementations.
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3066616ce2 upstream.
A rather annoying and common case is when booting a PVonHVM guest
and exposing the PV KBD and PV VFB - as broken toolstacks don't
always initialize the backends correctly.
Normally The HVM guest is using the VGA driver and the emulated
keyboard for this (though upstream version of QEMU implements
PV KBD, but still uses a VGA driver). We provide a very basic
two-stage wait mechanism - where we wait for 30 seconds for all
devices, and then for 270 for all them except the two mentioned.
That allows us to wait for the essential devices, like network
or disk for the full 6 minutes.
To trigger this, put this in your guest config:
vfb = [ 'vnc=1, vnclisten=0.0.0.0 ,vncunused=1']
instead of this:
vnc=1
vnclisten="0.0.0.0"
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v3: Split delay in non-essential (30 seconds) and essential
devices per Ian and Stefano suggestion]
[v4: Added comments per Stefano suggestion]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8e937be97 upstream.
Since we are using the m2p_override we do have struct pages
corresponding to the user vma mmap'ed by gntdev.
Removing the VM_PFNMAP flag makes get_user_pages work on that vma.
An example test case would be using a Xen userspace block backend
(QDISK) on a file on NFS using O_DIRECT.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7a6fbc9a88 upstream.
Since 2.6.30-rc1 clps711x serial driver hungs system. This is a result
of call disable_irq from ISR. synchronize_irq waits for end of interrupt
and goes to infinite loop. This patch fix this problem.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca3649de02 upstream.
Some output pins on Conexant chips have no HP control bit, but the
auto-parser initializes these pins unconditionally with PIN_HP.
Check the pin-capability and avoid the HP bit if not supported.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25c3d30c91 upstream.
The current code only increments the upper 64 bits of the SHA-512 byte
counter when the number of bytes hashed happens to hit 2^64 exactly.
This patch increments the upper 64 bits whenever the lower 64 bits
overflows.
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Patch not needed upstream as this is a backport build bugfix - gregkh
gcc correctly complains:
util/hist.c: In function ‘__hists__add_entry’:
util/hist.c:240:27: error: invalid type argument of ‘->’ (have ‘struct hist_entry’)
util/hist.c:241:23: error: invalid type argument of ‘->’ (have ‘struct hist_entry’)
for this new code:
+ if (he->ms.map != entry->ms.map) {
+ he->ms.map = entry->ms.map;
+ if (he->ms.map)
+ he->ms.map->referenced = true;
+ }
because "entry" is a "struct hist_entry", not a pointer to a struct.
In mainline, "entry" is a pointer to struct passed as argument to the function.
So this is broken during backporting. But obviously not compile tested.
Signed-off-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Cc: Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd94154cc6 upstream.
Git commit 36409f6353 "use generic RCU
page-table freeing code" introduced a tlb flushing bug. Partially revert
the above git commit and go back to s390 specific page table flush code.
For s390 the TLB can contain three types of entries, "normal" TLB
page-table entries, TLB combined region-and-segment-table (CRST) entries
and real-space entries. Linux does not use real-space entries which
leaves normal TLB entries and CRST entries. The CRST entries are
intermediate steps in the page-table translation called translation paths.
For example a 4K page access in a three-level page table setup will
create two CRST TLB entries and one page-table TLB entry. The advantage
of that approach is that a page access next to the previous one can reuse
the CRST entries and needs just a single read from memory to create the
page-table TLB entry. The disadvantage is that the TLB flushing rules are
more complicated, before any page-table may be freed the TLB needs to be
flushed.
In short: the generic RCU page-table freeing code is incorrect for the
CRST entries, in particular the check for mm_users < 2 is troublesome.
This is applicable to 3.0+ kernels.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a09d431f34 upstream.
When the force changes went in back in 3.3.0, we ended up returning
disconnected in the !force case, and the connected in when forced,
as it hit the hardcoded check.
Fix it so all exits go via the hardcoded check and stop spurious
modesets on platforms with hardcoded EDIDs.
Reported-by: Evan McNabb (Red Hat)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 16a5e32b83 upstream.
My rv515 card is very flaky with msi enabled. Every so often it loses a rearm
and never comes back, manually banging the rearm brings it back.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e363250718 upstream.
The check of the encoder type in the commit [e00e8b5e: drm/radeon/kms:
fix analog load detection on DVI-I connectors] is obviously wrong, and
it's the culprit of the regression on my workstation with DVI-analog
connection resulting in the blank output.
Fixed the typo now.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afbaa90b80 upstream.
If a bitmap is added while the array is active, it is possible
for bitmap_daemon_work to run while the bitmap is being
initialised.
This is particularly a problem if bitmap_daemon_work sees
bitmap->filemap as non-NULL before it has been filled in properly.
So hold bitmap_info.mutex while filling in ->filemap
to prevent problems.
This patch is suitable for any -stable kernel, though it might not
apply cleanly before about 3.1.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b052f4a08 upstream.
Currently, Mode-Control register is accessed by read-modify-write.
According to DMA hardware specifications datasheet, prohibits this method.
Because this register resets to 0 by DMA HW after DMA transfer completes.
Thus, current read-modify-write processing can cause unexpected behavior.
The datasheet says in case of writing Mode-Control register, set the value for only target channel, the others must set '11b'.
e.g. Set DMA0=01b DMA11=10b
CTL0=33333331h
CTL2=00002333h
NOTE:
CTL0 includes DMA0~7 Mode-Control register.
CTL2 includes DMA8~11 Mode-Control register.
This patch modifies the issue.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3d4913cd4 upstream.
ISSUE: In case PCH_DMA with I2S communications with ch8~ch11, sometimes I2S data
is not send correctly.
CAUSE: The following patch I submitted before was not enough modification for
supporting DMA ch8~ch11. The modification for status register of ch8~11 was not
enough.
pch_dma: Support I2S for ML7213 IOH
author Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Mon, 9 May 2011 07:09:38 +0000 (16:09 +0900)
committer Vinod Koul <vinod.koul@intel.com>
Mon, 9 May 2011 11:42:23 +0000 (16:42 +0530)
commit 194f5f2706
tree c9d4903ea0
parent 60092d0bde
This patch fixes the issue.
We can confirm PCH_DMA with I2S communications with ch8~ch11 works well.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64d91cfaad upstream.
Currently, ".setup" function is not set.
As a result, when detecting our IOH's uart device without pch_uart, kernel panic
occurs at the following of pciserial_init_ports().
for (i = 0; i < nr_ports; i++) {
if (quirk->setup(priv, board, &serial_port, i))
break;
So, this patch adds the ".setup" function.
We can use pci_default_setup because our IOH's uart is compatible with 16550.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.lapis-semi.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4b47d243 upstream.
Currently, PCIe bus number is set as fixed value "2".
However, PCIe bus number is not always "2".
This patch sets bus number using probe() parameter.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51b79bee62 upstream.
Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 833310402c upstream.
ISSUE:
USB Suspend interrupts occur frequently.
CAUSE:
When it is called pch_udc_reconnect() in USB Suspend, it repeats reset and
Suspend.
SOLUTION:
pch_udc_reconnect() does not enable all interrupts. When an enumeration event
occurred the driver enables all interrupts.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c575d2d2e upstream.
ISSUE:
After a USB cable is connect/disconnected, the system rarely freezes.
CAUSE:
Since the USB device controller cannot know to disconnect the USB cable, when
it is used without detecting VBUS by GPIO, the UDC driver does not notify to
USB Gadget.
Since USB Gadget cannot know to disconnect, a false setting occurred when the
USB cable is connected/disconnect repeatedly.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84566abba0 upstream.
ISSUE:
After USB Suspend, a system rarely freezes.
CAUSE:
When USB Suspend occurred, the driver is not notifying
a gadget of the event.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c802672cd3 upstream.
ISSUE:
If the return value of pch_udc_pcd_init() is False, the return value of
this function is unsettled.
Since pch_udc_pcd_init() always returns 0, there is not actually the issue.
CAUSE:
If pch_udc_pcd_init() is True, the variable, retval, is not set for an
appropriate value.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c50a3bff0e upstream.
ISSUE:
When the driver notifies a gadget of a disconnect event, a system
rarely freezes.
CAUSE:
When the driver calls dev->driver->disconnect(), it is not calling
spin_unlock().
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9914a0de7a upstream.
Currently, external ROM access is enabled/disabled in probe()/remove().
So, when a buggy software access unanticipated memory area,
in case of enabling this ADE bit,
external ROM memory area can be broken.
This patch enables the ADE bit only accessing external ROM area.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Cc: Masayuki Ohtak <masa-korg@dsn.okisemi.com>
Cc: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Denis Turischev <denis@compulab.co.il>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit dd7d7fea29 upstream.
Only ML7213/ML7223(Bus-n) has this register.
Currently,this driver doesn't care register "FUNCSEL" in suspend/resume.
This patch saves/restores FUNCSEL register only when the device is ML7213 or
ML7223(Bus-n).
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 20ae6d0b30 upstream.
Register "interrupt delay value" is for GbE which is connected to Bus-m of PCIe.
However currently, the value is set for Bus-n.
As a result, the value is not set correctly.
This patch moves setting the value processing of Bus-n to Bus-m.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c7713e7365 upstream.
The xHCI 1.0 spec errata released on June 13, 2011, changes the ordering
that the xHCI registers are saved and restored in. It moves the
interrupt pending (IMAN) and interrupt control (IMOD) registers to be
saved and restored last. I believe that's because the host controller
may attempt to fetch the event ring table when interrupts are
re-enabled. Therefore we need to restore the event ring registers
before we re-enable interrupts.
This should be backported to kernels as old as 2.6.37, that contain the
commit 5535b1d5f8 "USB: xHCI: PCI power
management implementation"
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Elric Fu <elricfu1@gmail.com>
Cc: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ee0a07028 upstream.
Currently the maximum noise floor limit is set as too high (-60dB). The
assumption of having a higher threshold limit is that it would help
de-sensitize the receiver (reduce phy errors) from continuous
interference. But when we have a bursty interference where there are
collisions and then free air time and if the receiver is desensitized too
much, it will miss the normal packets too. Lets make use of chips
specific min, nom and max limits always. This patch helps to improve the
connection stability in congested networks.
Cc: Paul Stewart <pstew@google.com>
Tested-by: Gary Morain <gmorain@google.com>
Signed-off-by: Madhan Jaganathan <madhanj@qca.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[bwh: Backported to 3.0/3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d52fc5dde1 upstream.
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b96fbacda upstream.
Chanho Min reported that when the boot loader transfers
control to the kernel, there may be pending interrupts
causing the UART to lock up in an eternal loop trying to
pick tokens from the FIFO (since the RX interrupt flag
indicates there are tokens) while in practice there are
no tokens - in fact there is only a pending IRQ flag.
This patch address the issue with a combination of two
patches suggested by Russell King that clears and mask
all interrupts at probe() and clears any pending error
and RX interrupts at port startup time.
We suspect the spurious interrupts are a side-effect of
switching the UART from FIFO to non-FIFO mode.
Cc: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
Reported-by: Chanho Min <chanho0207@gmail.com>
Suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Jong-Sung Kim <neidhard.kim@lge.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 457a4f61f9 upstream.
The suspend operation of VIA xHCI host have some issues and
hibernate operation works fine, so The XHCI_RESET_ON_RESUME
quirk is added for it.
This patch should base on "xHCI: Don't write zeroed pointer
to xHC registers" that is released by Sarah. Otherwise, the
host system error will ocurr in the hibernate operation
process.
This should be backported to stable kernels as old as 2.6.37,
that contain the commit c877b3b2ad
"xhci: Add reset on resume quirk for asrock p67 host".
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95018a53f7 upstream.
Re-define XHCI_LEGACY_DISABLE_SMI and used it in right way. All SMI enable
bits will be cleared to zero and flag bits 29:31 are also cleared to zero.
Other bits should be presvered as Table 146.
This patch should be backported to kernels as old as 2.6.31.
Signed-off-by: Alex He <alex.he@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb3d85bc71 upstream.
The xhci_save_registers() function saved the event ring dequeue pointer
in the s3 register structure, but xhci_restore_registers() never
restored it. No other code in the xHCI successful resume path would
ever restore it either. Fix that.
This should be backported to kernels as old as 2.6.37, that contain the
commit 5535b1d5f8 "USB: xHCI: PCI power
management implementation".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Elric Fu <elricfu1@gmail.com>
Cc: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 159e1fcc9a upstream.
When xhci_mem_cleanup() is called, we can't be sure if the xHC is
actually halted. We can ask the xHC to halt by writing to the RUN bit
in the command register, but that might timeout due to a HW hang.
If the host controller is still running, we should not write zeroed
values to the event ring dequeue pointers or base tables, the DCBAA
pointers, or the command ring pointers. Eric Fu reports his VIA VL800
host accesses the event ring pointers after a failed register restore on
resume from suspend. The hypothesis is that the host never actually
halted before the register write to change the event ring pointer to
zero.
Remove all writes of zeroed values to pointer registers in
xhci_mem_cleanup(). Instead, make all callers of the function reset the
host controller first, which will reset those registers to zero.
xhci_mem_init() is the only caller that doesn't first halt and reset the
host controller before calling xhci_mem_cleanup().
This should be backported to kernels as old as 2.6.32.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e833c0b87 upstream.
While we're at that, define IMAN bitfield to aid readability.
The interrupt enable bit should be set once on driver init, and we
shouldn't need to continually re-enable it. Commit c21599a3 introduced
a read of the irq_pending register, and that allows us to preserve the
state of the IE bit. Before that commit, we were blindly writing 0x3 to
the register.
This patch should be backported to kernels as old as 2.6.36, or ones
that contain the commit c21599a361 "USB:
xhci: Reduce reads and writes of interrupter registers".
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcf3985376 upstream.
This patch (as1517b) fixes an error in the USB scatter-gather library.
The library code uses urb->dev to determine whether or nor an URB is
currently active; the completion handler sets urb->dev to NULL.
However the core unlinking routines need to use urb->dev. Since
unlinking always racing with completion, the completion handler must
not clear urb->dev -- it can lead to invalid memory accesses when a
transfer has to be cancelled.
This patch fixes the problem by getting rid of the lines that clear
urb->dev after urb has been submitted. As a result we may end up
trying to unlink an URB that failed in submission or that has already
completed, so an extra check is added after each unlink to avoid
printing an error message when this happens. The checks are updated
in both sg_complete() and sg_cancel(), and the second is updated to
match the first (currently it prints out unnecessary warning messages
if a device is unplugged while a transfer is in progress).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Illia Zaitsev <I.Zaitsev@adbglobal.com>
CC: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce5c985185 upstream.
DTR/RTS should only be raised when changing baudrate from B0 and not on
any baud rate change (> B0).
Reported-by: Søren Holm <sgh@sgh.dk>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f103929f8 upstream.
Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
calling tick_do_update_jiffies64(now).
If we reach this point in the loop it means that we crossed a tick
boundary since we grabbed the "now" timestamp, so at this point "now"
refers to a time in the old jiffy, so using the old value for "now" is
incorrect, and is likely to give us a stale jiffies value.
In particular, the first time through the loop the
tick_do_update_jiffies64(now) call is always a no-op, since the
caller, tick_nohz_restart_sched_tick(), will have already called
tick_do_update_jiffies64(now) with that "now" value.
Note that tick_nohz_stop_sched_tick() already uses the correct
approach: when we notice we cross a jiffy boundary, grab a new
timestamp with ktime_get(), and *then* update jiffies.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63fa471dd4 upstream.
When a process exec()'s, all the maps are retired, but we keep the hist
entries around which hold references to those outdated maps.
If the same library gets mapped in for which we have hist entries, a new
map will be created. But when we take a perf entry hit within that map,
we'll find the existing hist entry with the older map.
This causes symbol translations to be done incorrectly. For example,
the perf entry processing will lookup the correct uptodate map entry and
use that to calculate the symbol and DSO relative address. But later
when we update the histogram we'll translate the address using the
outdated map file instead leading to conditions such as out-of-range
offsets in symbol__inc_addr_samples().
Therefore, update the map of the hist_entry dynamically at lookup/
creation time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20120327.031418.1220315351537060808.davem@davemloft.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc67f63650 upstream.
The total number of scatter gather elements in the CISS command
used by the scsi tape code was being cast to a u8, which can hold
at most 255 scatter gather elements. It should have been cast to
a u16. Without this patch the command gets rejected by the controller
since the total scatter gather count did not add up to the right
value resulting in an i/o error.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 395d287526 upstream.
The default is too small (1024 blocks), use h->cciss_max_sectors (8192 blocks)
Without this change, if you try to set the block size of a tape drive above
512*1024, via "mt -f /dev/st0 setblk nnn" where nnn is greater than 524288,
it won't work right.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e0daff30f upstream.
The DS driver registers as a subsys_initcall() but this can be too
early, in particular this risks registering before we've had a chance
to allocate and setup module_kset in kernel/params.c which is
performed also as a subsyts_initcall().
Register DS using device_initcall() insteal.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d3eeb2ef2 upstream.
The invocation of softirq is now handled by irq_exit(), so there is no
need for sparc64 to invoke it on the trap-return path. In fact, doing so
is a bug because if the trap occurred in the idle loop, this invocation
can result in lockdep-RCU failures. The problem is that RCU ignores idle
CPUs, and the sparc64 trap-return path to the softirq handlers fails to
tell RCU that the CPU must be considered non-idle while those handlers
are executing. This means that RCU is ignoring any RCU read-side critical
sections in those handlers, which in turn means that RCU-protected data
can be yanked out from under those read-side critical sections.
The shiny new lockdep-RCU ability to detect RCU read-side critical sections
that RCU is ignoring located this problem.
The fix is straightforward: Make sparc64 stop manually invoking the
softirq handlers.
Reported-by: Meelis Roos <mroos@linux.ee>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66aebce747 upstream.
The race is as follows:
Suppose a multi-threaded task forks a new process (on cpu A), thus
bumping up the ref count on all the pages. While the fork is occurring
(and thus we have marked all the PTEs as read-only), another thread in
the original process (on cpu B) tries to write to a huge page, taking an
access violation from the write-protect and calling hugetlb_cow(). Now,
suppose the fork() fails. It will undo the COW and decrement the ref
count on the pages, so the ref count on the huge page drops back to 1.
Meanwhile hugetlb_cow() also decrements the ref count by one on the
original page, since the original address space doesn't need it any
more, having copied a new page to replace the original page. This
leaves the ref count at zero, and when we call unlock_page(), we panic.
fork on CPU A fault on CPU B
============= ==============
...
down_write(&parent->mmap_sem);
down_write_nested(&child->mmap_sem);
...
while duplicating vmas
if error
break;
...
up_write(&child->mmap_sem);
up_write(&parent->mmap_sem); ...
down_read(&parent->mmap_sem);
...
lock_page(page);
handle COW
page_mapcount(old_page) == 2
alloc and prepare new_page
...
handle error
page_remove_rmap(page);
put_page(page);
...
fold new_page into pte
page_remove_rmap(page);
put_page(page);
...
oops ==> unlock_page(page);
up_read(&parent->mmap_sem);
The solution is to take an extra reference to the page while we are
holding the lock on it.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c76f39bddb upstream.
Michel Lespinasse cleaned up the futex calling conventions in commit
37a9d912b2 ("futex: Sanitize cmpxchg_futex_value_locked API").
But the ia64 implementation was subtly broken. Gcc does not know that
register "r8" will be updated by the fault handler if the cmpxchg
instruction takes an exception. So it feels safe in letting the
initialization of r8 slide to after the cmpxchg. Result: we always
return 0 whether the user address faulted or not.
Fix by moving the initialization of r8 into the __asm__ code so gcc
won't move it.
Reported-by: <emeric.maschino@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42757
Tested-by: <emeric.maschino@gmail.com>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a partial, self-contained, minimal backport of commit
797fe796c4 upstream which fixes the memory
leak:
Bluetooth: uart-ldisc: Fix memory leak and remove destruct cb
We currently leak the hci_uart object if HCI_UART_PROTO_SET is never set
because the hci-destruct callback will then never be called. This fix
removes the hci-destruct callback and frees the driver internal private
hci_uart object directly on tty-close. We call hci_unregister_dev() here
so the hci-core will never call our callbacks again (except destruct).
Therefore, we can safely free the driver internal data right away and
set the destruct callback to NULL.
Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 078c04545b upstream.
Currently when ThumbEE is not enabled (!CONFIG_ARM_THUMBEE) the ThumbEE
register states are not saved/restored at context switch. The default state
of the ThumbEE Ctrl register (TEECR) allows userspace accesses to the
ThumbEE Base Handler register (TEEHBR). This can cause unexpected behaviour
when people use ThumbEE on !CONFIG_ARM_THUMBEE kernels, as well as allowing
covert communication - eg between userspace tasks running inside chroot
jails.
This patch sets up TEECR in order to prevent user-space access to TEEHBR
when !CONFIG_ARM_THUMBEE. In this case, tasks are sent SIGILL if they try to
access TEEHBR.
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonathan Austin <jonathan.austin@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27c1cbd06a upstream.
The 845g shares the errata with i830 whereby executing a command
within 2 cachelines of the end of the ringbuffer may cause a GPU hang.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18daf1644e upstream
Commit 330605423c fixed l2cap conn establishment for non-ssp remote
devices by not setting HCI_CONN_ENCRYPT_PEND every time conn security
is tested (which was always returning failure on any subsequent
security checks).
However, this broke l2cap conn establishment for ssp remote devices
when an ACL link was already established at SDP-level security. This
fix ensures that encryption must be pending whenever authentication
is also pending.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
commit df91e49477 upstream.
Userspace can pass in arbitrary combinations of MS_* flags to mount().
If both MS_BIND and one of MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE are
passed, device name which should be checked for MS_BIND was not checked because
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE had higher priority than MS_BIND.
If both one of MS_BIND/MS_MOVE and MS_REMOUNT are passed, device name which
should not be checked for MS_REMOUNT was checked because MS_BIND/MS_MOVE had
higher priority than MS_REMOUNT.
Fix these bugs by changing priority to MS_REMOUNT -> MS_BIND ->
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE -> MS_MOVE as with do_mount() does.
Also, unconditionally return -EINVAL if more than one of
MS_SHARED/MS_PRIVATE/MS_SLAVE/MS_UNBINDABLE is passed so that TOMOYO will not
generate inaccurate audit logs, for commit 7a2e8a8f "VFS: Sanity check mount
flags passed to change_mnt_propagation()" clarified that these flags must be
exclusively passed.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ddd592a19 upstream.
Unfortunatly the interrupts for the event log and the
peripheral page-faults are only enabled at boot but not
re-enabled at resume. Fix that.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
[bwh: Backport to 3.0:
- Drop change to PPR log which was added in 3.3
- Source is under arch/x86/kernel]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79549c6dfd upstream.
keyctl_session_to_parent(task) sets ->replacement_session_keyring,
it should be processed and cleared by key_replace_session_keyring().
However, this task can fork before it notices TIF_NOTIFY_RESUME and
the new child gets the bogus ->replacement_session_keyring copied by
dup_task_struct(). This is obviously wrong and, if nothing else, this
leads to put_cred(already_freed_cred).
change copy_creds() to clear this member. If copy_process() fails
before this point the wrong ->replacement_session_keyring doesn't
matter, exit_creds() won't be called.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2daf26310 upstream.
Added Vendor/Device Id of Motorola Rokr E6 (22b8:6027) so it can be
recognized by the "zaurus" USBNet driver.
Applies to Linux 3.2.13 and 2.6.39.4.
Signed-off-by: Guan Xin <guanx.bac@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f8349e6e9 upstream.
TWL6030 family of PMIC use a shadow interrupt status register
while kernel processes the current interrupt event.
However, any write(0 or 1) to register INT_STS_A, INT_STS_B or
INT_STS_C clears all 3 interrupt status registers.
Since clear of the interrupt is done on 32k clk, depending on I2C
bus speed, we could in-adverently clear the status of a interrupt
status pending on shadow register in the current implementation.
This is due to the fact that multi-byte i2c write operation into
three seperate status register could result in multiple load
and clear of status and result in lost interrupts.
Instead, doing a single byte write to INT_STS_A register with 0x0
will clear all three interrupt status registers without the related
risk.
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9993bc635d upstream.
When a machine boots up, the TSC generally gets reset. However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel. The computation of
cycns_offset in set_cyc2ns_scale is prone to an overflow, if the
machine has been up more than 208 days prior to the kexec. The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.
We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.
Refactor code to share the calculation with the previous
fix in __cycles_2_ns().
Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a97f4f5e52 upstream.
Carlos was getting
WARNING: at drivers/pci/pci.c:118 pci_ioremap_bar+0x24/0x52()
when probing his sound card, and sound did not work. After adding
pci=use_crs to the kernel command line, no more trouble.
Ok, we can add a quirk. dmidecode output reveals that this is an MSI
MS-7253, for which we already have a quirk, but the short-sighted
author tied the quirk to a single BIOS version, making it not kick in
on Carlos's machine with BIOS V1.2. If a later BIOS update makes it
no longer necessary to look at the _CRS info it will still be
harmless, so let's stop trying to guess which versions have and don't
have accurate _CRS tables.
Addresses https://bugtrack.alsa-project.org/alsa-bug/view.php?id=5533
Also see <https://bugzilla.kernel.org/show_bug.cgi?id=42619>.
Reported-by: Carlos Luna <caralu74@gmail.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8411371709 upstream.
In the spirit of commit 29cf7a30f8 ("x86/PCI: use host bridge _CRS
info on ASUS M2V-MX SE"), this DMI quirk turns on "pci_use_crs" by
default on a board that needs it.
This fixes boot failures and oopses introduced in 3e3da00c01
("x86/pci: AMD one chain system to use pci read out res"). The quirk
is quite targetted (to a specific board and BIOS version) for two
reasons:
(1) to emphasize that this method of tackling the problem one quirk
at a time is a little insane
(2) to give BIOS vendors an opportunity to use simpler tables and
allow us to return to generic behavior (whatever that happens to
be) with a later BIOS update
In other words, I am not at all happy with having quirks like this.
But it is even worse for the kernel not to work out of the box on
these machines, so...
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42619
Reported-by: Svante Signell <svante.signell@telia.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 258f742635 upstream.
Commit f02e8a6596 ("module: Sort exported symbols") sorts symbols
placing each of them in its own elf section. This sorting and merging
into the canonical sections are done by the linker.
Unfortunately modpost to generate Module.symvers file parses vmlinux.o
(which is not linked yet) and all modules object files (which aren't
linked yet). These aren't sanitized by the linker yet. That breaks
modpost that can't detect license properly for modules.
This patch makes modpost aware of the new exported symbols structure.
[ This above is a slightly corrected version of the explanation of the
problem, copied from commit 62a2635610 ("modpost: Fix modpost's
license checking V3"). That commit fixed the problem for module
object files, but not for vmlinux.o. This patch fixes modpost for
vmlinux.o. ]
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62a2635610 upstream.
The commit f02e8a6 sorts symbols placing each of them in its own elf section.
The sorting and merging into the canonical sections are done by the linker.
Unfortunately modpost to generate Module.symvers file parses vmlinux
(already linked) and all modules object files (which aren't linked yet).
These aren't sanitized by the linker yet. That breaks modpost that can't
detect license properly for modules. This patch makes modpost aware of
the new exported symbols structure.
Thanks to Arnaud Lacombe <lacombar@gmail.com> and Anders Kaseorg
<andersk@ksplice.com> for providing useful suggestions about code.
This work was supported by a hardware donation from the CE Linux Forum.
Reported-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 620f6e8e85 upstream.
Commit bfdc0b4 adds code to restrict access to dmesg_restrict,
however, it incorrectly alters kptr_restrict rather than
dmesg_restrict.
The original patch from Richard Weinberger
(https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
expected, and so the patch seems to have been misapplied.
This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
kptr_restrict, since both are sensitive.
Reported-by: Phillip Lougher <plougher@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3751d3e85c upstream.
There has long been a limitation using software breakpoints with a
kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For
this particular patch, it will apply cleanly and has been tested all
the way back to 2.6.36.
The kprobes code uses the text_poke() function which accommodates
writing a breakpoint into a read-only page. The x86 kgdb code can
solve the problem similarly by overriding the default breakpoint
set/remove routines and using text_poke() directly.
The x86 kgdb code will first attempt to use the traditional
probe_kernel_write(), and next try using a the text_poke() function.
The break point install method is tracked such that the correct break
point removal routine will get called later on.
Cc: x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Inspried-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 23bbd8e346 upstream.
The do_fork and sys_open tests have never worked properly on anything
other than a UP configuration with the kgdb test suite. This is
because the test suite did not fully implement the behavior of a real
debugger. A real debugger tracks the state of what thread it asked to
single step and can correctly continue other threads of execution or
conditionally stop while waiting for the original thread single step
request to return.
Below is a simple method to cause a fatal kernel oops with the kgdb
test suite on a 2 processor ARM system:
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
echo V1I1F100 > /sys/module/kgdbts/parameters/kgdbts
Very soon after starting the test the kernel will start warning with
messages like:
kgdbts: BP mismatch c002487c expected c0024878
------------[ cut here ]------------
WARNING: at drivers/misc/kgdbts.c:317 check_and_rewind_pc+0x9c/0xc4()
[<c01f6520>] (check_and_rewind_pc+0x9c/0xc4)
[<c01f595c>] (validate_simple_test+0x3c/0xc4)
[<c01f60d4>] (run_simple_test+0x1e8/0x274)
The kernel will eventually recovers, but the test suite has completely
failed to test anything useful.
This patch implements behavior similar to a real debugger that does
not rely on hardware single stepping by using only software planted
breakpoints.
In order to mimic a real debugger, the kgdb test suite now tracks the
most recent thread that was continued (cont_thread_id), with the
intent to single step just this thread. When the response to the
single step request stops in a different thread that hit the original
break point that thread will now get continued, while the debugger
waits for the thread with the single step pending. Here is a high
level description of the sequence of events.
cont_instead_of_sstep = 0;
1) set breakpoint at do_fork
2) continue
3) Save the thread id where we stop to cont_thread_id
4) Remove breakpoint at do_fork
5) Reset the PC if needed depending on kernel exception type
6) soft single step
7) Check where we stopped
if current thread != cont_thread_id {
if (here for more than 2 times for the same thead) {
### must be a really busy system, start test again ###
goto step 1
}
goto step 5
} else {
cont_instead_of_sstep = 0;
}
8) clean up and run test again if needed
9) Clear out any threads that were waiting on a break point at the
point in time the test is ended with get_cont_catch(). This
happens sometimes because breakpoints are used in place of single
stepping and some threads could have been in the debugger exception
handling queue because breakpoints were hit concurrently on
different CPUs. This also means we wait at least one second before
unplumbing the debugger connection at the very end, so as respond
to any debug threads waiting to be serviced.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 486c5987a0 upstream.
The do_fork and sys_open tests have never worked properly on anything
other than a UP configuration with the kgdb test suite. This is
because the test suite did not fully implement the behavior of a real
debugger. A real debugger tracks the state of what thread it asked to
single step and can correctly continue other threads of execution or
conditionally stop while waiting for the original thread single step
request to return.
Below is a simple method to cause a fatal kernel oops with the kgdb
test suite on a 4 processor x86 system:
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
while [ 1 ] ; do ls > /dev/null 2> /dev/null; done&
echo V1I1F1000 > /sys/module/kgdbts/parameters/kgdbts
Very soon after starting the test the kernel will oops with a message like:
kgdbts: BP mismatch 3b7da66480 expected ffffffff8106a590
WARNING: at drivers/misc/kgdbts.c:303 check_and_rewind_pc+0xe0/0x100()
Call Trace:
[<ffffffff812994a0>] check_and_rewind_pc+0xe0/0x100
[<ffffffff81298945>] validate_simple_test+0x25/0xc0
[<ffffffff81298f77>] run_simple_test+0x107/0x2c0
[<ffffffff81298a18>] kgdbts_put_char+0x18/0x20
The warn will turn to a hard kernel crash shortly after that because
the pc will not get properly rewound to the right value after hitting
a breakpoint leading to a hard lockup.
This change is broken up into 2 pieces because archs that have hw
single stepping (2.6.26 and up) need different changes than archs that
do not have hw single stepping (3.0 and up). This change implements
the correct behavior for an arch that supports hw single stepping.
A minor defect was fixed where sys_open should be do_sys_open
for the sys_open break point test. This solves the problem of running
a 64 bit with a 32 bit user space. The sys_open() never gets called
when using the 32 bit file system for the kgdb testsuite because the
32 bit binaries invoke the compat_sys_open() call leading to the test
never completing.
In order to mimic a real debugger, the kgdb test suite now tracks the
most recent thread that was continued (cont_thread_id), with the
intent to single step just this thread. When the response to the
single step request stops in a different thread that hit the original
break point that thread will now get continued, while the debugger
waits for the thread with the single step pending. Here is a high
level description of the sequence of events.
cont_instead_of_sstep = 0;
1) set breakpoint at do_fork
2) continue
3) Save the thread id where we stop to cont_thread_id
4) Remove breakpoint at do_fork
5) Reset the PC if needed depending on kernel exception type
6) if (cont_instead_of_sstep) { continue } else { single step }
7) Check where we stopped
if current thread != cont_thread_id {
cont_instead_of_sstep = 1;
goto step 5
} else {
cont_instead_of_sstep = 0;
}
8) clean up and run test again if needed
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 456ca7ff24 upstream.
On x86 the kgdb test suite will oops when the kernel is compiled with
CONFIG_DEBUG_RODATA and you run the tests after boot time. This is
regression has existed since 2.6.26 by commit: b33cb815 (kgdbts: Use
HW breakpoints with CONFIG_DEBUG_RODATA).
The test suite can use hw breakpoints for all the tests, but it has to
execute the hardware breakpoint specific tests first in order to
determine that the hw breakpoints actually work. Specifically the
very first test causes an oops:
# echo V1I1 > /sys/module/kgdbts/parameters/kgdbts
kgdb: Registered I/O driver kgdbts.
kgdbts:RUN plant and detach test
Entering kdb (current=0xffff880017aa9320, pid 1078) on processor 0 due to Keyboard Entry
[0]kdb> kgdbts: ERROR PUT: end of test buffer on 'plant_and_detach_test' line 1 expected OK got $E14#aa
WARNING: at drivers/misc/kgdbts.c:730 run_simple_test+0x151/0x2c0()
[...oops clipped...]
This commit re-orders the running of the tests and puts the RODATA
check into its own function so as to correctly avoid the kernel oops
by detecting and using the hw breakpoints.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98b54aa1a2 upstream.
There is extra state information that needs to be exposed in the
kgdb_bpt structure for tracking how a breakpoint was installed. The
debug_core only uses the the probe_kernel_write() to install
breakpoints, but this is not enough for all the archs. Some arch such
as x86 need to use text_poke() in order to install a breakpoint into a
read only page.
Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
kgdb_arch_remove_breakpoint() allows other archs to set the type
variable which indicates how the breakpoint was installed.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 927a2f119e upstream.
i915_drm_thaw was not locking the mode_config lock when calling
drm_helper_resume_force_mode. When there were multiple wake sources,
this caused FDI training failure on SNB which in turn corrupted the
display.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62fb376e21 upstream.
mplayer -vo fbdev tries to create a screen that is twice as tall as the
allocated framebuffer for "doublebuffering". By default, and all in-tree
users, only sufficient memory is allocated and mapped to satisfy the
smallest framebuffer and the virtual size is no larger than the actual.
For these users, we should therefore reject any userspace request to
create a screen that requires a buffer larger than the framebuffer
originally allocated.
References: https://bugs.freedesktop.org/show_bug.cgi?id=38138
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d72308bff5 upstream.
Is possible that we will arm the tid_rx->reorder_timer after
del_timer_sync() in ___ieee80211_stop_rx_ba_session(). We need to stop
timer after RCU grace period finish, so move it to
ieee80211_free_tid_rx(). Timer will not be armed again, as
rcu_dereference(sta->ampdu_mlme.tid_rx[tid]) will return NULL.
Debug object detected problem with the following warning:
ODEBUG: free active (active state 0) object type: timer_list hint: sta_rx_agg_reorder_timer_expired+0x0/0xf0 [mac80211]
Bug report (with all warning messages):
https://bugzilla.redhat.com/show_bug.cgi?id=804007
Reported-by: "jan p. springer" <jsd@igroup.org>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cfeba5391 upstream.
On multi-platform kernels, the Mac platform devices should be registered
when running on Mac only. Else it may crash later.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5cb92ac82 upstream.
irq_move_masked_irq() checks the return code of
chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
also a valid return code, which is there to avoid a redundant copy of
the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid
the redundant copy, we also fail to adjust the thread affinity of an
eventually threaded interrupt handler.
Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return
values correctly by checking the valid return values seperately.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Keping Chen <chenkeping@huawei.com>
Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e80acd1af upstream.
commit 64b3db22c0 (2.6.39),
"Remove use of unreliable FADT revision field" causes regression
for old P4 systems because now cst_control and other fields are
not reset to 0.
The effect is that acpi_processor_power_init will notice
cst_control != 0 and a write to CST_CNT register is performed
that should not happen. As result, the system oopses after the
"No _CST, giving up" message, sometimes in acpi_ns_internalize_name,
sometimes in acpi_ns_get_type, usually at random places. May be
during migration to CPU 1 in acpi_processor_get_throttling.
Every one of these settings help to avoid this problem:
- acpi=off
- processor.nocst=1
- maxcpus=1
The fix is to update acpi_gbl_FADT.header.length after
the original value is used to check for old revisions.
https://bugzilla.kernel.org/show_bug.cgi?id=42700https://bugzilla.redhat.com/show_bug.cgi?id=727865
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e96ada57 upstream.
During testing pci root bus removal, found some root bus bridge is not freed.
If booting with pnpacpi=off, those hostbridge could be freed without problem.
It turns out that some devices reference are not released during acpi_pnp_match.
that match should not hold one device ref during every calling.
Add pu_device calling before returning.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2815ab92ba upstream.
On Intel CPUs the processor typically uses the highest frequency
set by any logical CPU. When the system overheats
Linux first forces the frequency to the lowest available one
to lower the temperature.
However this was done only per logical CPU, which means all
logical CPUs in a package would need to go through this before
the frequency is actually lowered.
Worse this delay actually prevents real throttling, because
the real throttle code only proceeds when the lowest frequency
is already reached.
So when a throttle event happens force the lowest frequency
for all CPUs in the package where it happened. The per CPU
state is now kept per package, not per logical CPU. An alternative
would be to do it per cpufreq unit, but since we want to bring
down the temperature of the complete chip it's better
to do it for all.
In principle it may even make sense to do it for all CPUs,
but I kept it on the package for now.
With this change the frequency is actually lowered, which
in terms also allows real throttling to proceed.
I also removed an unnecessary per cpu variable initialization.
v2: Fix package mapping
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b54f47c8bc upstream.
Using UBI on m25p80 can give messages like:
UBI error: io_init: bad write buffer size 0 for 1 min. I/O unit
We need to initialize writebufsize; I think "page_size" is the correct
"bufsize", although I'm not sure. Comments?
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fcc44a07da upstream.
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
Set writebufsize to 4 because this drivers writes at max 4 bytes at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b604387411 upstream.
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
However, we forgot to set this parameter for block2mtd. Set it to PAGE_SIZE
because this is actually the amount of data we write at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Joern Engel <joern@lazybastard.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4cc625ea5 upstream.
The writebufsize concept was introduce by commit
"0e4ca7e mtd: add writebufsize field to mtd_info struct" and it represents
the maximum amount of data the device writes to the media at a time. This is
an important parameter for UBIFS which is used during recovery and which
basically defines how big a corruption caused by a power cut can be.
Set writebufsize to the flash page size because it is the maximum amount of
data it writes at a time.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 78fb72f793 ]
Make CDC EEM recalculate the hard_mtu after adjusting the
hard_header_len.
Without this, usbnet adjusts the MTU down to 1494 bytes, and the host is
unable to receive standard 1500-byte frames from the device.
Tested with the Linux USB Ethernet gadget.
Cc: Oliver Neukum <oliver@neukum.name>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 81213b5e8a ]
If both addresses equal, nothing needs to be done. If the device is down,
then we simply copy the new address to dev->dev_addr. If the device is up,
then we add another loopback device with the new address, and if that does
not fail, we remove the loopback device with the old address. And only
then, we update the dev->dev_addr.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1d24fb3684 ]
When K >= 0xFFFF0000, AND needs the two least significant bytes of K as
its operand, but EMIT2() gives it the least significant byte of K and
0x2. EMIT() should be used here to replace EMIT2().
Signed-off-by: Feiran Zhuang <zhuangfeiran@ict.ac.cn>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9651e70ad upstream.
Since 3.2.12 and 3.3, some systems are failing to boot with a BUG_ON.
Some other systems using the pata_jmicron driver fail to boot because no
disks are detected. Passing pcie_aspm=force on the kernel command line
works around it.
The cause: commit 4949be1682 ("PCI: ignore pre-1.1 ASPM quirking when
ASPM is disabled") changed the behaviour of pcie_aspm_sanity_check() to
always return 0 if aspm is disabled, in order to avoid cases where we
changed ASPM state on pre-PCIe 1.1 devices.
This skipped the secondary function of pcie_aspm_sanity_check which was
to avoid us enabling ASPM on devices that had non-PCIe children, causing
trouble later on. Move the aspm_disabled check so we continue to honour
that scenario.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=42979 and
http://bugs.debian.org/665420
Reported-by: Romain Francoise <romain@orebokech.com> # kernel panic
Reported-by: Chris Holland <bandidoirlandes@gmail.com> # disk detection trouble
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Tested-by: Hatem Masmoudi <hatem.masmoudi@gmail.com> # Dell Latitude E5520
Tested-by: janek <jan0x6c@gmail.com> # pata_jmicron with JMB362/JMB363
[jn: with more symptoms in log message]
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49d4bcaddc upstream.
When DMA is enabled, sh-sci transfer begins with
uart_start()
sci_start_tx()
if (cookie_tx < 0) schedule_work()
Then, starts DMA when wq scheduled, -- (A)
process_one_work()
work_fn_rx()
cookie_tx = desc->submit_tx()
And finishes when DMA transfer ends, -- (B)
sci_dma_tx_complete()
async_tx_ack()
cookie_tx = -EINVAL
(possible another schedule_work())
This A to B sequence is not reentrant, since controlling variables
(for example, cookie_tx above) are not queues nor lists. So, they
must be invoked as A B A B..., otherwise results in kernel crash.
To ensure the sequence, sci_start_tx() seems to test if cookie_tx < 0
(represents "not used") to call schedule_work().
But cookie_tx will not be set (to a cookie, also means "used") until
in the middle of work queue scheduled function work_fn_tx().
This gap between the test and set allows the breakage of the sequence
under the very frequently call of uart_start().
Another gap between async_tx_ack() and another schedule_work() results
in the same issue, too.
This patch introduces a new condition "cookie_tx == 0" just to mark
it is "busy" and assign it within spin-locked region to fill the gaps.
Signed-off-by: Takashi Yoshii <takashi.yoshii.zj@renesas.com>
Reviewed-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d8d174998 upstream.
There is no point in passing a zero length string here and quite a
few of that cache_parse() implementations will Oops if count is
zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1631fcea83 upstream.
<asm-generic/unistd.h> was set up to use sys_sendfile() for the 32-bit
compat API instead of sys_sendfile64(), but in fact the right thing to
do is to use sys_sendfile64() in all cases. The 32-bit sendfile64() API
in glibc uses the sendfile64 syscall, so it has to be capable of doing
full 64-bit operations. But the sys_sendfile() kernel implementation
has a MAX_NON_LFS test in it which explicitly limits the offset to 2^32.
So, we need to use the sys_sendfile64() implementation in the kernel
for this case.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57779dc2b3 upstream.
While running the latest Linux as guest under VMware in highly
over-committed situations, we have seen cases when the refined TSC
algorithm fails to get a valid tsc_start value in
tsc_refine_calibration_work from multiple attempts. As a result the
kernel keeps on scheduling the tsc_irqwork task for later. Subsequently
after several attempts when it gets a valid start value it goes through
the refined calibration and either bails out or uses the new results.
Given that the kernel originally read the TSC frequency from the
platform, which is the best it can get, I don't think there is much
value in refining it.
So for systems which get the TSC frequency from the platform we
should skip the refined tsc algorithm.
We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is
set only on VMware and for Moorestown Penwell both of which have there
own TSC calibration methods.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Dirk Brandewie <dirk.brandewie@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
[jstultz: Reworked to simply not schedule the refining work,
rather then scheduling the work and bombing out later]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit de5b8e8e04 upstream.
If you try to set grace_period or timeout via a module parameter
to lockd, and do this on a big-endian machine where
sizeof(int) != sizeof(unsigned long)
it won't work. This number given will be effectively shifted right
by the difference in those two sizes.
So cast kp->arg properly to get correct result.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1265fd6167 ]
We call the wrong replay notify function when we use ESN replay
handling. This leads to the fact that we don't send notifications
if we use ESN. Fix this by calling the registered callbacks instead
of xfrm_replay_notify().
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5676cc7bfe ]
Some BIOS's don't setup power management correctly (what else is
new) and don't allow use of PCI Express power control. Add a special
exception module parameter to allow working around this issue.
Based on slightly different patch by Knut Petersen.
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2a2a459eee ]
napi->skb is allocated in napi_get_frags() using
netdev_alloc_skb_ip_align(), with a reserve of NET_SKB_PAD +
NET_IP_ALIGN bytes.
However, when such skb is recycled in napi_reuse_skb(), it ends with a
reserve of NET_IP_ALIGN which is suboptimal.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 94f826b807 ]
Commit f2c31e32b3 (net: fix NULL dereferences in check_peer_redir() )
added a regression in rt6_fill_node(), leading to rcu_read_lock()
imbalance.
Thats because NLA_PUT() can make a jump to nla_put_failure label.
Fix this by using nla_put()
Many thanks to Ben Greear for his help
Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dc72d99dab ]
Matt Evans spotted that x86 bpf_jit was incorrectly handling negative
constant offsets in BPF_S_LDX_B_MSH instruction.
We need to abort JIT compilation like we do in common_load so that
filter uses the interpreter code and can call __load_pointer()
Reference: http://lists.openwall.net/netdev/2011/07/19/11
Thanks to Indan Zupancic to bring back this issue.
Reported-by: Matt Evans <matt@ozlabs.org>
Reported-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bbdb32cb5b ]
While testing L2TP functionality, I came across a bug in getsockname(). The
IP address returned within the pppol2tp_addr's addr memember was not being
set to the IP address in use. This bug is caused by using inet_sk() on the
wrong socket (the L2TP socket rather than the underlying UDP socket), and was
likely introduced during the addition of L2TPv3 support.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fa016a0b5 upstream.
Looking at hibernate overwriting I though it looked like a cursor,
so I tracked down this missing piece to stop the cursor blink
timer. I've no idea if this is sufficient to fix the hibernate
problems people are seeing, but please test it.
Both radeon and nouveau have done this for a long time.
I've run this personally all night hib/resume cycles with no fails.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reported-by: Petr Tesarik <kernel@tesarici.cz>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Reported-by: Lots of misc segfaults after hibernate across the world.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=37142
Tested-by: Dave Airlie <airlied@redhat.com>
Tested-by: Bojan Smojver <bojan@rexursive.com>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fa0fb93f2a upstream.
For high-speed/super-speed isochronous endpoints, the bInterval
value is used as exponent, 2^(bInterval-1). Luckily we have
usb_fill_int_urb() function that handles it correctly. So we just
call this function to fill in the RX URB.
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f946eeb931 upstream.
Module size was limited to 64MB, this was legacy limitation due to vmalloc()
which was removed a while ago.
Limiting module size to 64MB is both pointless and affects real world use
cases.
Cc: Tim Abbott <tim.abbott@oracle.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66c4c35c6b upstream.
sysfs_slab_add() calls various sysfs functions that actually may
end up in userspace doing all sorts of things.
Release the slub_lock after adding the kmem_cache structure to the list.
At that point the address of the kmem_cache is not known so we are
guaranteed exlusive access to the following modifications to the
kmem_cache structure.
If the sysfs_slab_add fails then reacquire the slub_lock to
remove the kmem_cache structure from the list.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d97d32edcd upstream.
When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent
attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not
count with the fact that read of a buffer which was read a while ago can
really fail which results in the oops on
agi = XFS_BUF_TO_AGI(agibp);
Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer
lock but keep buffer reference to AG buffer. That is enough for buffer to stay
pinned in memory and we don't have to call xfs_read_agi() all the time.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72c6e7afc4 upstream.
Always set io->error to -EIO when an error is detected in dm-crypt.
There were cases where an error code would be set only if we finish
processing the last sector. If there were other encryption operations in
flight, the error would be ignored and bio would be returned with
success as if no error happened.
This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
and kcryptd_async_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aeb2deae26 upstream.
This patch fixes a possible deadlock in dm-crypt's mempool use.
Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
It allocates first MIN_BIO_PAGES with non-failing allocation (the allocation
cannot fail and waits until the mempool is refilled). Further pages are
allocated with different gfp flags that allow failing.
Because allocations may be done in parallel, this code can deadlock. Example:
There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
run simultaneously.
It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
pages. The mempool is exhausted. Each process waits for more pages to be freed
to the mempool, which never happens.
To avoid this deadlock scenario, this patch changes the code so that only
the first page is allocated with non-failing gfp mask. Allocation of further
pages may fail.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0391a3ae9 upstream.
udf_release_file() can be called from munmap() path with mmap_sem held. Thus
we cannot take i_mutex there because that ranks above mmap_sem. Luckily,
i_mutex is not needed in udf_release_file() anymore since protection by
i_data_sem is enough to protect from races with write and truncate.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b18dafc86b upstream.
In d_materialise_unique() there are 3 subcases to the 'aliased dentry'
case; in two subcases the inode i_lock is properly released but this
does not occur in the -ELOOP subcase.
This seems to have been introduced by commit 1836750115 ("fix loop
checks in d_materialise_unique()").
Signed-off-by: Michel Lespinasse <walken@google.com>
[ Added a comment, and moved the unlock to where we generate the -ELOOP,
which seems to be more natural.
You probably can't actually trigger this without a buggy network file
server - d_materialize_unique() is for finding aliases on non-local
filesystems, and the d_ancestor() case is for a hardlinked directory
loop.
But we should be robust in the case of such buggy servers anyway. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31d4f3a2f3 upstream.
Explicitly test for an extent whose length is zero, and flag that as a
corrupted extent.
This avoids a kernel BUG_ON assertion failure.
Tested: Without this patch, the file system image found in
tests/f_ext_zero_len/image.gz in the latest e2fsprogs sources causes a
kernel panic. With this patch, an ext4 file system error is noted
instead, and the file system is marked as being corrupted.
https://bugzilla.kernel.org/show_bug.cgi?id=42859
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d2b158262 upstream.
Ext4 does not support data journalling with delayed allocation enabled.
We even do not allow to mount the file system with delayed allocation
and data journalling enabled, however it can be set via FS_IOC_SETFLAGS
so we can hit the inode with EXT4_INODE_JOURNAL_DATA set even on file
system mounted with delayed allocation (default) and that's where
problem arises. The easies way to reproduce this problem is with the
following set of commands:
mkfs.ext4 /dev/sdd
mount /dev/sdd /mnt/test1
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4
chattr +j /mnt/test1/file
dd if=/dev/zero of=/mnt/test1/file bs=1M count=4 conv=notrunc
chattr -j /mnt/test1/file
Additionally it can be reproduced quite reliably with xfstests 272 and
269. In fact the above reproducer is a part of test 272.
To fix this we should ignore the EXT4_INODE_JOURNAL_DATA inode flag if
the file system is mounted with delayed allocation. This can be easily
done by fixing ext4_should_*_data() functions do ignore data journal
flag when delalloc is set (suggested by Ted). We also have to set the
appropriate address space operations for the inode (again, ignoring data
journal flag if delalloc enabled).
Additionally this commit introduces ext4_inode_journal_mode() function
because ext4_should_*_data() has already had a lot of common code and
this change is putting it all into one function so it is easier to
read.
Successfully tested with xfstests in following configurations:
delalloc + data=ordered
delalloc + data=writeback
data=journal
nodelalloc + data=ordered
nodelalloc + data=writeback
nodelalloc + data=journal
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15291164b2 upstream.
journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.
This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.
Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf
and 189e868fa8 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dccaf33fa3 upstream.
(backported to 3.0 by mjt)
There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_direct read request comes in during this period, ext4 will return
zero instead of the recently written data.
This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing O_direct read to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05b4877f6a upstream.
If create_basic_memory_bitmaps() fails, usermodehelpers are not re-enabled
before returning. Fix this. And while at it, reword the goto labels so that
they look more meaningful.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ab2393fc3 upstream.
The D1F5 revision of the WinTV HVR-1900 uses a tda18271c2 tuner
instead of a tda18271c1 tuner as used in revision D1E9. To
account for this, we must hardcode the frontend configuration
to use the same IF frequency configuration for both revisions
of the device.
6MHz DVB-T is unaffected by this issue, as the recommended
IF Frequency configuration for 6MHz DVB-T is the same on both
c1 and c2 revisions of the tda18271 tuner.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Cc: Mike Isely <isely@pobox.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34817174fc upstream.
The error handling in lgdt3303_read_status() and lgdt330x_read_ucblocks()
doesn't work, because i2c_read_demod_bytes() returns a u8 and (err < 0)
is always false.
err = i2c_read_demod_bytes(state, 0x58, buf, 1);
if (err < 0)
return err;
Change the return type of i2c_read_demod_bytes() to int. Also change
the return value on error to -EIO to make (err < 0) work.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b26c9b334 upstream.
The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace. Keeping that network namespace from being freed
when the last user goes away. Leaving things like vlan devices in the
leaked network namespace.
If you use ip netns add for much real work this problem becomes apparent
pretty quickly. It light testing the problem hides because frequently
you simply don't notice the leak.
Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.
This issue exists back to 3.0.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29a2e2836f upstream.
The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task
returns from a system call with a pending signal.
A real-life scenario is a child of 'khelper' returning from a failed
kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
kernel_execve() fails due to a pending SIGKILL, which is the result of
"kill -9 -1" (at least, busybox's init does it upon reboot).
The loop is as follows:
* syscall_exit_work:
- work_pending: // start_of_the_loop
- work_notify_sig:
- do_notify_resume()
- do_signal()
- if (!user_mode(regs)) return;
- resume_userspace // TIF_SIGPENDING is still set
- work_pending // so we call work_pending => goto
// start_of_the_loop
More information can be found in another LKML thread:
http://www.serverphorums.com/read.php?12,457826
[1] the problem was also seen on MIPS.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d5440a835 upstream.
URB unlinking is always racing with its completion and tx_complete
may be called before or during running usb_unlink_urb, so tx_complete
must not clear urb->dev since it will be used in unlink path,
otherwise invalid memory accesses or usb device leak may be caused
inside usb_unlink_urb.
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0956a8c20b upstream.
Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
recursive locking in usbnet_stop()) fixes the recursive locking
problem by releasing the skb queue lock, but it makes usb_unlink_urb
racing with defer_bh, and the URB to being unlinked may be freed before
or during calling usb_unlink_urb, so use-after-free problem may be
triggerd inside usb_unlink_urb.
The patch fixes the use-after-free problem by increasing URB
reference count with skb queue lock held before calling
usb_unlink_urb, so the URB won't be freed until return from
usb_unlink_urb.
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oliver@neukum.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 540a0f7584 upstream.
The problem is that for the case of priority queues, we
have to assume that __rpc_remove_wait_queue_priority will move new
elements from the tk_wait.links lists into the queue->tasks[] list.
We therefore cannot use list_for_each_entry_safe() on queue->tasks[],
since that will skip these new tasks that __rpc_remove_wait_queue_priority
is adding.
Without this fix, rpc_wake_up and rpc_wake_up_status will both fail
to wake up all functions on priority wait queues, which can result
in some nasty hangs.
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7eb3aa6585 upstream.
The 'find_wl_entry()' function expects the maximum difference as the second
argument, not the maximum absolute value. So the "unknown" eraseblock picking
was incorrect, as Shmulik Ladkani spotted. This patch fixes the issue.
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a29852be49 upstream.
Two bad things can happen in ubi_scan():
1. If kmem_cache_create() fails we jump to out_si and call
ubi_scan_destroy_si() which calls kmem_cache_destroy().
But si->scan_leb_slab is NULL.
2. If process_eb() fails we jump to out_vidh, call
kmem_cache_destroy() and ubi_scan_destroy_si() which calls
again kmem_cache_destroy().
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1daaae8fa4 upstream.
This patch fixes an issue when cifs_mount receives a
STATUS_BAD_NETWORK_NAME error during cifs_get_tcon but is able to
continue after an DFS ROOT referral. In this case, the return code
variable is not reset prior to trying to mount from the system referred
to. Thus, is_path_accessible is not executed and the final DFS referral
is not performed causing a mount error.
Use case: In DNS, example.com resolves to the secondary AD server
ad2.example.com Our primary domain controller is ad1.example.com and has
a DFS redirection set up from \\ad1\share\Users to \\files\share\Users.
Mounting \\example.com\share\Users fails.
Regression introduced by commit 724d9f1.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru
Signed-off-by: Thomas Hadig <thomas@intapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f30d500f80 upstream.
When we get concurrent lookups of the same inode that is not in the
per-AG inode cache, there is a race condition that triggers warnings
in unlock_new_inode() indicating that we are initialising an inode
that isn't in a the correct state for a new inode.
When we do an inode lookup via a file handle or a bulkstat, we don't
serialise lookups at a higher level through the dentry cache (i.e.
pathless lookup), and so we can get concurrent lookups of the same
inode.
The race condition is between the insertion of the inode into the
cache in the case of a cache miss and a concurrently lookup:
Thread 1 Thread 2
xfs_iget()
xfs_iget_cache_miss()
xfs_iread()
lock radix tree
radix_tree_insert()
rcu_read_lock
radix_tree_lookup
lock inode flags
XFS_INEW not set
igrab()
unlock inode flags
rcu_read_unlock
use uninitialised inode
.....
lock inode flags
set XFS_INEW
unlock inode flags
unlock radix tree
xfs_setup_inode()
inode flags = I_NEW
unlock_new_inode()
WARNING as inode flags != I_NEW
This can lead to inode corruption, inode list corruption, etc, and
is generally a bad thing to occur.
Fix this by setting XFS_INEW before inserting the inode into the
radix tree. This will ensure any concurrent lookup will find the new
inode with XFS_INEW set and that forces the lookup to wait until the
XFS_INEW flag is removed before allowing the lookup to succeed.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3114ea7a24 upstream.
If a setattr() fails because of an NFS4ERR_OPENMODE error, it is
probably due to us holding a read delegation. Ensure that the
recovery routines return that delegation in this case.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a1d0b5eebc upstream.
If we know that the delegation stateid is bad or revoked, we need to
remove that delegation as soon as possible, and then mark all the
stateids that relied on that delegation for recovery. We cannot use
the delegation as part of the recovery process.
Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED)
to indicate that the delegation was revoked.
Finally, ensure that setlk() and setattr() can both recover safely from
a revoked delegation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2226fc9e8 upstream.
On hosts without this patch, 32bit guests will crash (and 64bit guests
may behave in a wrong way) for example by simply executing following
nasm-demo-application:
[bits 32]
global _start
SECTION .text
_start: syscall
(I tested it with winxp and linux - both always crashed)
Disassembly of section .text:
00000000 <_start>:
0: 0f 05 syscall
The reason seems a missing "invalid opcode"-trap (int6) for the
syscall opcode "0f05", which is not available on Intel CPUs
within non-longmodes, as also on some AMD CPUs within legacy-mode.
(depending on CPU vendor, MSR_EFER and cpuid)
Because previous mentioned OSs may not engage corresponding
syscall target-registers (STAR, LSTAR, CSTAR), they remain
NULL and (non trapping) syscalls are leading to multiple
faults and finally crashs.
Depending on the architecture (AMD or Intel) pretended by
guests, various checks according to vendor's documentation
are implemented to overcome the current issue and behave
like the CPUs physical counterparts.
[mtosatti: cleanup/beautify code]
Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bdb42f5afe upstream.
In order to be able to proceed checks on CPU-specific properties
within the emulator, function "get_cpuid" is introduced.
With "get_cpuid" it is possible to virtually call the guests
"cpuid"-opcode without changing the VM's context.
[mtosatti: cleanup/beautify code]
Signed-off-by: Stephan Baerwolf <stephan.baerwolf@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c0efbacab upstream.
handle_ir_buffer_fill() assumed that a completed descriptor would be
indicated by a non-zero transfer_status (as in most other descriptors).
However, this field is written by the controller as soon as (the end of)
the first packet has been written into the buffer. As a consequence, if
we happen to run into such a descriptor when the interrupt handler is
executed after such a packet has completed, the descriptor would be
taken out of the list of active descriptors as soon as the buffer had
been partially filled, so the event for the buffer being completely
filled would never be sent.
To fix this, handle descriptors only when they have been completely
filled, i.e., when res_count == 0. (This also matches the condition
that is reported by the controller with an interrupt.)
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9716387311 upstream.
According to the HT6560H datasheet, the recovery timing field is 4-bit wide,
with a value of 0 meaning 16 cycles. Correct obvious thinko in the recovery
field mask.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c30d5a532 upstream.
Add support for the camera key. The hotkey for
Asus S.H.E(Super Hybrid Engine) mode is mapped to KEY_KEY_PROG1
just for notifying the userspace.
Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3596bb929f upstream.
The Asus All-In-One PC has a wireless keyboard with wifi toggle,
brightness up, brightness down and display off hotkeys.
This patch adds suppoort for these hotkeys.
Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33395fb8a1 upstream.
The old code did (MSB << 8) & 0xff, which always evaluates to 0. Just use
get_unaligned_be16() so we don't have to worry about whether our open-coded
version is correct or not.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit effc6cc882 upstream.
SPC-4 says about the WBUS16 and SYNC bits:
The meanings of these fields are specific to SPI-5 (see 6.4.3).
For SCSI transport protocols other than the SCSI Parallel
Interface, these fields are reserved.
We don't have a SPI fabric module, so we should never set these bits.
(The comment was misleading, since it only mentioned Sync but the
actual code set WBUS16 too).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6b42dcb99 upstream.
If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10. The first request gets
processed, blocks recovery and queue requests for underlying
requests in current->bio_list. A resync request then starts
which will wait for those requests and block new IO.
But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.
So allow that second request to proceed even though there is
a resync request about to start.
This is suitable for any -stable kernel.
Reported-by: Ray Morris <support@bettercgi.com>
Tested-by: Ray Morris <support@bettercgi.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4474ca42e2 upstream.
When commit 69e51b449d (md/bitmap: separate out loading a bitmap...)
created bitmap_load, it missed calling it after bitmap_create when a
bitmap is created through the sysfs interface.
So if a bitmap is added this way, we don't allocate memory properly
and can crash.
This is suitable for any -stable release since 2.6.35.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 031ed4d565 upstream.
This patch fixes a bug in tcm_fc where fc_exch memory from fc_exch_mgr->ep_pool
is currently being leaked by ft_send_resp_status() usage. Following current
code in ft_queue_status() response path, using lport->tt.seq_send() needs to be
followed by a lport->tt.exch_done() in order to release fc_exch memory back into
libfc_em kmem_cache.
ft_send_resp_status() code is currently used in pre submit se_cmd ft_send_work()
error exceptions, TM request setup exceptions, and main TM response callback
path in ft_queue_tm_resp(). This bugfix addresses the leak in these cases.
Cc: Mark D Rustad <mark.d.rustad@intel.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: Robert Love <robert.w.love@intel.com>
Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce880cb860 upstream.
The USB graphics card driver delays the unregistering of the framebuffer
device to a workqueue, which breaks the userspace visible remove uevent
sequence. Recent userspace tools started to support USB graphics card
hotplug out-of-the-box and rely on proper events sent by the kernel.
The framebuffer device is a direct child of the USB interface which is
removed immediately after the USB .disconnect() callback. But the fb device
in /sys stays around until its final cleanup, at a time where all the parent
devices have been removed already.
To work around that, we remove the sysfs fb device directly in the USB
.disconnect() callback and leave only the cleanup of the internal fb
data to the delayed work.
Before:
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb)
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb)
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb0 (graphics)
remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb)
remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb)
remove /2-1.2:1.0/graphics/fb0 (graphics)
After:
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb)
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb)
add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb1 (graphics)
remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb1 (graphics)
remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb)
remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb)
Tested-by: Bernie Thompson <bernie@plugable.com>
Acked-by: Bernie Thompson <bernie@plugable.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cf3fa6918 upstream.
If the target core signals an over- or under-run, tcm_loop should call
scsi_set_resid() to tell the SCSI midlayer about the residual data length.
The difference can be seen by doing something like
strace -eioctl sg_raw -r 1024 /dev/sda 8 0 0 0 1 0 > /dev/null
and looking at the "resid=" part of the SG_IO ioctl -- after this patch,
the field is correctly reported as 512.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 273b72c8ce upstream.
PXA's SSP engine fails to take its current channel phase into account
when enabling a stream while the engine is already running. This
results in randomly swapped left/right channels on either the record
or the playback side, depending on which one was enabled first.
The following patch fixes this by factoring out the bit field
modifications in question to a separate function that pauses the
engine temporarily, modifies the bits and kicks it off again
afterwards. Appearantly, a transition of SSCR0_SSE syncs both
directions properly.
The patch has been rolled out to quite a number of devices over the
last weeks and seems to fix the issue reliably.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Sven Neumann <s.neumann@raumfeld.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a05b0855fd upstream.
Taking i_mutex in hugetlbfs_read() can result in deadlock with mmap as
explained below
Thread A:
read() on hugetlbfs
hugetlbfs_read() called
i_mutex grabbed
hugetlbfs_read_actor() called
__copy_to_user() called
page fault is triggered
Thread B, sharing address space with A:
mmap() the same file
->mmap_sem is grabbed on task_B->mm->mmap_sem
hugetlbfs_file_mmap() is called
attempt to grab ->i_mutex and block waiting for A to give it up
Thread A:
pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
which happens to be the same thing as task_B->mm->mmap_sem. Block waiting
for B to give it up.
AFAIU the i_mutex locking was added to hugetlbfs_read() as per
http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html to take
care of the race between truncate and read. This patch fixes this by
looking at page->mapping under lock_page() (find_lock_page()) to ensure
that the inode didn't get truncated in the range during a parallel read.
Ideally we can extend the patch to make sure we don't increase i_size in
mmap. But that will break userspace, because applications will now have
to use truncate(2) to increase i_size in hugetlbfs.
Based on the original patch from Hillf Danton.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5bf18fa22 upstream.
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory
Overcommit) on powerpc, we tripped the following:
kernel BUG at mm/bootmem.c:483!
cpu 0x0: Vector: 700 (Program Check) at [c000000000c03940]
pc: c000000000a62bd8: .alloc_bootmem_core+0x90/0x39c
lr: c000000000a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c
sp: c000000000c03bc0
msr: 8000000000021032
current = 0xc000000000b0cce0
paca = 0xc000000001d80000
pid = 0, comm = swapper
kernel BUG at mm/bootmem.c:483!
enter ? for help
[c000000000c03c80] c000000000a64bcc
.sparse_early_usemaps_alloc_node+0x84/0x29c
[c000000000c03d50] c000000000a64f10 .sparse_init+0x12c/0x28c
[c000000000c03e20] c000000000a474f4 .setup_arch+0x20c/0x294
[c000000000c03ee0] c000000000a4079c .start_kernel+0xb4/0x460
[c000000000c03f90] c000000000009670 .start_here_common+0x1c/0x2c
This is
BUG_ON(limit && goal + size > limit);
and after some debugging, it seems that
goal = 0x7ffff000000
limit = 0x80000000000
and sparse_early_usemaps_alloc_node ->
sparse_early_usemaps_alloc_pgdat_section calls
return alloc_bootmem_section(usemap_size() * count, section_nr);
This is on a system with 8TB available via the AMS pool, and as a quirk
of AMS in firmware, all of that memory shows up in node 0. So, we end
up with an allocation that will fail the goal/limit constraints.
In theory, we could "fall-back" to alloc_bootmem_node() in
sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE
defined, we'll BUG_ON() instead. A simple solution appears to be to
unconditionally remove the limit condition in alloc_bootmem_section,
meaning allocations are allowed to cross section boundaries (necessary
for systems of this size).
Johannes Weiner pointed out that if alloc_bootmem_section() no longer
guarantees section-locality, we need check_usemap_section_nr() to print
possible cross-dependencies between node descriptors and the usemaps
allocated through it. That makes the two loops in
sparse_early_usemaps_alloc_node() identical, so re-factor the code a
bit.
[akpm@linux-foundation.org: code simplification]
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a5a9906d4 upstream.
In some cases it may happen that pmd_none_or_clear_bad() is called with
the mmap_sem hold in read mode. In those cases the huge page faults can
allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
false positive from pmd_bad() that will not like to see a pmd
materializing as trans huge.
It's not khugepaged causing the problem, khugepaged holds the mmap_sem
in write mode (and all those sites must hold the mmap_sem in read mode
to prevent pagetables to go away from under them, during code review it
seems vm86 mode on 32bit kernels requires that too unless it's
restricted to 1 thread per process or UP builds). The race is only with
the huge pagefaults that can convert a pmd_none() into a
pmd_trans_huge().
Effectively all these pmd_none_or_clear_bad() sites running with
mmap_sem in read mode are somewhat speculative with the page faults, and
the result is always undefined when they run simultaneously. This is
probably why it wasn't common to run into this. For example if the
madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
fault, the hugepage will not be zapped, if the page fault runs first it
will be zapped.
Altering pmd_bad() not to error out if it finds hugepmds won't be enough
to fix this, because zap_pmd_range would then proceed to call
zap_pte_range (which would be incorrect if the pmd become a
pmd_trans_huge()).
The simplest way to fix this is to read the pmd in the local stack
(regardless of what we read, no need of actual CPU barriers, only
compiler barrier needed), and be sure it is not changing under the code
that computes its value. Even if the real pmd is changing under the
value we hold on the stack, we don't care. If we actually end up in
zap_pte_range it means the pmd was not none already and it was not huge,
and it can't become huge from under us (khugepaged locking explained
above).
All we need is to enforce that there is no way anymore that in a code
path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
can run into a hugepmd. The overhead of a barrier() is just a compiler
tweak and should not be measurable (I only added it for THP builds). I
don't exclude different compiler versions may have prevented the race
too by caching the value of *pmd on the stack (that hasn't been
verified, but it wouldn't be impossible considering
pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
and there's no external function called in between pmd_trans_huge and
pmd_none_or_clear_bad).
if (pmd_trans_huge(*pmd)) {
if (next-addr != HPAGE_PMD_SIZE) {
VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
split_huge_page_pmd(vma->vm_mm, pmd);
} else if (zap_huge_pmd(tlb, vma, pmd, addr))
continue;
/* fall through */
}
if (pmd_none_or_clear_bad(pmd))
Because this race condition could be exercised without special
privileges this was reported in CVE-2012-1179.
The race was identified and fully explained by Ulrich who debugged it.
I'm quoting his accurate explanation below, for reference.
====== start quote =======
mapcount 0 page_mapcount 1
kernel BUG at mm/huge_memory.c:1384!
At some point prior to the panic, a "bad pmd ..." message similar to the
following is logged on the console:
mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).
The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
the page's PMD table entry.
143 void pmd_clear_bad(pmd_t *pmd)
144 {
-> 145 pmd_ERROR(*pmd);
146 pmd_clear(pmd);
147 }
After the PMD table entry has been cleared, there is an inconsistency
between the actual number of PMD table entries that are mapping the page
and the page's map count (_mapcount field in struct page). When the page
is subsequently reclaimed, __split_huge_page() detects this inconsistency.
1381 if (mapcount != page_mapcount(page))
1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
1383 mapcount, page_mapcount(page));
-> 1384 BUG_ON(mapcount != page_mapcount(page));
The root cause of the problem is a race of two threads in a multithreaded
process. Thread B incurs a page fault on a virtual address that has never
been accessed (PMD entry is zero) while Thread A is executing an madvise()
system call on a virtual address within the same 2 MB (huge page) range.
virtual address space
.---------------------.
| |
| |
.-|---------------------|
| | |
| | |<-- B(fault)
| | |
2 MB | |/////////////////////|-.
huge < |/////////////////////| > A(range)
page | |/////////////////////|-'
| | |
| | |
'-|---------------------|
| |
| |
'---------------------'
- Thread A is executing an madvise(..., MADV_DONTNEED) system call
on the virtual address range "A(range)" shown in the picture.
sys_madvise
// Acquire the semaphore in shared mode.
down_read(¤t->mm->mmap_sem)
...
madvise_vma
switch (behavior)
case MADV_DONTNEED:
madvise_dontneed
zap_page_range
unmap_vmas
unmap_page_range
zap_pud_range
zap_pmd_range
//
// Assume that this huge page has never been accessed.
// I.e. content of the PMD entry is zero (not mapped).
//
if (pmd_trans_huge(*pmd)) {
// We don't get here due to the above assumption.
}
//
// Assume that Thread B incurred a page fault and
.---------> // sneaks in here as shown below.
| //
| if (pmd_none_or_clear_bad(pmd))
| {
| if (unlikely(pmd_bad(*pmd)))
| pmd_clear_bad
| {
| pmd_ERROR
| // Log "bad pmd ..." message here.
| pmd_clear
| // Clear the page's PMD entry.
| // Thread B incremented the map count
| // in page_add_new_anon_rmap(), but
| // now the page is no longer mapped
| // by a PMD entry (-> inconsistency).
| }
| }
|
v
- Thread B is handling a page fault on virtual address "B(fault)" shown
in the picture.
...
do_page_fault
__do_page_fault
// Acquire the semaphore in shared mode.
down_read_trylock(&mm->mmap_sem)
...
handle_mm_fault
if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
// We get here due to the above assumption (PMD entry is zero).
do_huge_pmd_anonymous_page
alloc_hugepage_vma
// Allocate a new transparent huge page here.
...
__do_huge_pmd_anonymous_page
...
spin_lock(&mm->page_table_lock)
...
page_add_new_anon_rmap
// Here we increment the page's map count (starts at -1).
atomic_set(&page->_mapcount, 0)
set_pmd_at
// Here we set the page's PMD entry which will be cleared
// when Thread A calls pmd_clear_bad().
...
spin_unlock(&mm->page_table_lock)
The mmap_sem does not prevent the race because both threads are acquiring
it in shared mode (down_read). Thread B holds the page_table_lock while
the page's map count and PMD table entry are updated. However, Thread A
does not synchronize on that lock.
====== end quote =======
[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e984e2c2 upstream.
An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
iSCSI session as fully operative. This means that there is window
where there are no posted receive buffers on the initiator side, so
it's possible for the iSER RC connection to break because of RNR NAK /
retry errors. To fix this, rely on the flags bits in the login
request to have FFP (0x3) in the lower nibble as a marker for the
final login request, and post an initial chunk of receive buffers
before sending that login request instead of after getting the login
response.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41c7f74242 upstream.
Currently, the RTC code does not disable the alarm in the hardware.
This means that after a sequence such as the one below (the files are in the
RTC sysfs), the box will boot up after 2 minutes even though we've
asked for the alarm to be turned off.
# echo $((`cat since_epoch`)+120) > wakealarm
# echo 0 > wakealarm
# poweroff
Fix this by disabling the alarm when there are no timers to run.
The original version of this patch was reverted. This version
disables the irq directly instead of setting a disabled timer
in the future.
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
[Merged in the second revision from Rabin]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a09b659cd6 upstream.
In 2008, commit 0c5d1eb77a ("genirq: record trigger type") modified the
way set_irq_type() handles the 'no trigger' condition. However, this has
an adverse effect on PCMCIA support on Intel StrongARM and probably PXA
platforms.
PCMCIA has several status signals on the socket which can trigger
interrupts; some of these status signals depend on the card's mode
(whether it is configured in memory or IO mode). For example, cards have
a 'Ready/IRQ' signal: in memory mode, this provides an indication to
PCMCIA that the card has finished its power up initialization. In IO
mode, it provides the device interrupt signal. Other status signals
switch between on-board battery status and loud speaker output.
In classical PCMCIA implementations, where you have a specific socket
controller, the controller provides a method to mask interrupts from the
socket, and importantly ignore any state transitions on the pins which
correspond with interrupts once masked. This masking prevents unwanted
events caused by the removal and application of socket power being
forwarded.
However, on platforms where there is no socket controller, the PCMCIA
status and interrupt signals are routed to standard edge-triggered GPIOs.
These GPIOs can be configured to interrupt on rising edge, falling edge,
or never. This is where the problems start.
Edge triggered interrupts are required to record events while disabled via
the usual methods of {free,request,disable,enable}_irq() to prevent
problems with dropped interrupts (eg, the 8390 driver uses disable_irq()
to defer the delivery of interrupts). As a result, these interfaces can
not be used to implement the desired behaviour.
The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via
disable_irq() on suspend, and enabled via enable_irq() after resume, we
will record the state transitions caused by powering events as valid
interrupts, and foward them to the card driver, which may attempt to
access a card which is not powered up.
This leads delays resume while drivers spin in their interrupt handlers,
and complaints from drivers before they realize what's happened.
Moreover, in the case of the 'Ready/IRQ' signal, this is requested and
freed by the card driver itself; the PCMCIA core has no idea whether the
interrupt is requested, and, therefore, whether a call to disable_irq()
would be valid. (We tried this around 2.4.17 / 2.5.1 kernel era, and
ended up throwing it out because of this problem.)
Therefore, it was decided back in around 2002 to disable the edge
triggering instead, resulting in all state transitions on the GPIO being
ignored. That's what we actually need the hardware to do.
The commit above changes this behaviour; it explicitly prevents the 'no
trigger' state being selected.
The reason that request_irq() does not accept the 'no trigger' state is
for compatibility with existing drivers which do not provide their desired
triggering configuration. The set_irq_type() function is 'new' and not
used by non-trigger aware drivers.
Therefore, revert this change, and restore previously working platforms
back to their former state.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: linux@arm.linux.org.uk
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b60a18da3 upstream.
The queue handling in the udev daemon assumes that the events are
ordered.
Before this patch uevent_seqnum is incremented under sequence_lock,
than an event is send uner uevent_sock_mutex. I want to say that code
contained a window between incrementing seqnum and sending an event.
This patch locks uevent_sock_mutex before incrementing uevent_seqnum.
v2: delete sequence_lock, uevent_seqnum is protected by uevent_sock_mutex
v3: unlock the mutex before the goto exit
Thanks for Kay for the comments.
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Tested-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9b89e2567 upstream.
Driver rtl8192ce when used with the RTL8188CE device would start at about
20 Mbps on a 54 Mbps connection, but quickly drop to 1 Mbps. One of the
symptoms is that the AP would need to retransmit each packet 4 of 5 times
before the driver would acknowledge it. Recovery is possible only by
unloading and reloading the driver. This problem was reported at
https://bugzilla.redhat.com/show_bug.cgi?id=770207.
The problem is due to a missing update of the gain setting.
Signed-off-by: Jingjun Wu <jingjun_wu@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 093ea2d3a7 upstream.
A MCS7820 device supports two serial ports and a MCS7840 device supports
four serial ports. Both devices use the same driver, but the attach function
in driver was unable to correctly handle the port numbers for MCS7820
device. This problem has been fixed in this patch and this fix has been
verified on x86 Linux kernel 3.2.9 with both MCS7820 and MCS7840 devices.
Signed-off-by: Donald Lee <donald@asix.com.tw>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5360a53a7 upstream.
This patch updates the cp210x driver to support CP210x multiple
interface devices devices from Silicon Labs. The existing driver
always sends control requests to interface 0, which is hardcoded in
the usb_control_msg function calls. This only allows for single
interface devices to be used, and causes a bug when using ports on an
interface other than 0 in the multiple interface devices.
Here are the changes included in this patch:
- Updated the device list to contain the Silicon Labs factory default
VID/PID for multiple interface CP210x devices
- Created a cp210x_port_private struct created for each port on
startup, this struct holds the interface number
- Added a cp210x_release function to clean up the cp210x_port_private
memory created on startup
- Modified usb_get_config and usb_set_config to get a pointer to the
cp210x_port_private struct, and use the interface number there in the
usb_control_message wIndex param
Signed-off-by: Preston Fick <preston.fick@silabs.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d161b99f8 upstream.
This patch adds new device IDs to the ftdi_sio module to support
the new Sealevel SeaLINK+8 2038-ROHS device.
Signed-off-by: Scott Dial <scott.dial@scientiallc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c192c8e71a upstream.
Gobi 1000 devices have a different port layout, which wasn't respected
by the current driver, and thus it grabbed the QMI/net port. In the
near future we'll be attaching another driver to the QMI/net port for
these devices (cdc-wdm and qmi_wwan) so make sure the qcserial driver
doesn't claim them. This patch also prevents qcserial from binding to
interfaces 0 and 1 on 1K devices because those interfaces do not
respond.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e90fc3cb08 upstream.
When build i.mx platform with imx_v6_v7_defconfig, and after adding
USB Gadget support, it has below build error:
CC drivers/usb/host/fsl-mph-dr-of.o
drivers/usb/host/fsl-mph-dr-of.c: In function 'fsl_usb2_device_register':
drivers/usb/host/fsl-mph-dr-of.c:97: error: 'struct pdev_archdata'
has no member named 'dma_mask'
It has discussed at: http://www.spinics.net/lists/linux-usb/msg57302.html
For PowerPC, there is dma_mask at struct pdev_archdata, but there is
no dma_mask at struct pdev_archdata for ARM. The pdev_archdata is
related to specific platform, it should NOT be accessed by
cross platform drivers, like USB.
The code for pdev_archdata should be useless, as for PowerPC,
it has already gotten the value for pdev->dev.dma_mask at function
arch_setup_pdev_archdata of arch/powerpc/kernel/setup-common.c.
Tested-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5cc5ed866 upstream.
When loading g_ether gadget, there is below message:
Backtrace:
[<80012248>] (dump_backtrace+0x0/0x10c) from [<803cb42c>] (dump_stack+0x18/0x1c)
r7:00000000 r6:80512000 r5:8052bef8 r4:80513f30
[<803cb414>] (dump_stack+0x0/0x1c) from [<8000feb4>] (show_regs+0x44/0x50)
[<8000fe70>] (show_regs+0x0/0x50) from [<8004c840>] (__schedule_bug+0x68/0x84)
r5:8052bef8 r4:80513f30
[<8004c7d8>] (__schedule_bug+0x0/0x84) from [<803cd0e4>] (__schedule+0x4b0/0x528)
r5:8052bef8 r4:809aad00
[<803ccc34>] (__schedule+0x0/0x528) from [<803cd214>] (_cond_resched+0x44/0x58)
[<803cd1d0>] (_cond_resched+0x0/0x58) from [<800a9488>] (dma_pool_alloc+0x184/0x250)
r5:9f9b4000 r4:9fb4fb80
[<800a9304>] (dma_pool_alloc+0x0/0x250) from [<802a8ad8>] (fsl_req_to_dtd+0xac/0x180)
[<802a8a2c>] (fsl_req_to_dtd+0x0/0x180) from [<802a8ce4>] (fsl_ep_queue+0x138/0x274)
[<802a8bac>] (fsl_ep_queue+0x0/0x274) from [<7f004328>] (composite_setup+0x2d4/0xfac [g_ether])
[<7f004054>] (composite_setup+0x0/0xfac [g_ether]) from [<802a9bb4>] (fsl_udc_irq+0x8dc/0xd38)
[<802a92d8>] (fsl_udc_irq+0x0/0xd38) from [<800704f8>] (handle_irq_event_percpu+0x54/0x188)
[<800704a4>] (handle_irq_event_percpu+0x0/0x188) from [<80070674>] (handle_irq_event+0x48/0x68)
[<8007062c>] (handle_irq_event+0x0/0x68) from [<800738ec>] (handle_level_irq+0xb4/0x138)
r5:80514f94 r4:80514f40
[<80073838>] (handle_level_irq+0x0/0x138) from [<8006ffa4>] (generic_handle_irq+0x38/0x44)
r7:00000012 r6:80510b1c r5:80529860 r4:80512000
[<8006ff6c>] (generic_handle_irq+0x0/0x44) from [<8000f4c4>] (handle_IRQ+0x54/0xb4)
[<8000f470>] (handle_IRQ+0x0/0xb4) from [<800085b8>] (tzic_handle_irq+0x64/0x94)
r9:412fc085 r8:00000000 r7:80513f30 r6:00000001 r5:00000000
r4:00000000
[<80008554>] (tzic_handle_irq+0x0/0x94) from [<8000e680>] (__irq_svc+0x40/0x60)
The reason of above dump message is calling dma_poll_alloc with can-schedule
mem_flags at atomic context.
To fix this problem, below changes are made:
- fsl_req_to_dtd doesn't need to be protected by spin_lock_irqsave,
as struct usb_request can be access at process context. Move lock
to beginning of hardware visit (fsl_queue_td).
- Change the memory flag which using to allocate dTD descriptor buffer,
the memory flag can be from gadget layer.
It is tested at i.mx51 bbg board with g_mass_storage, g_ether, g_serial.
Signed-off-by: Peter Chen <peter.chen@freescale.com>
Acked-by: Li Yang <leoli@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 711c68b3c0 upstream.
We must not allow the input buffer length to change while we're
shuffling the buffer contents. We also mustn't clear the WDM_READ
flag after more data might have arrived. Therefore move both of these
into the spinlocked region at the bottom of wdm_read().
When reading desc->length without holding the iuspin lock, use
ACCESS_ONCE() to ensure the compiler doesn't re-read it with
inconsistent results.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 548dd4b6da upstream.
Do not report errors in write path if port is used as a console as this
may trigger the same error (and error report) resulting in a loop.
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a4c61b7ce upstream.
Bugzilla 40012: PIO_UNIMAP bug: error updating Unicode-to-font map
https://bugzilla.kernel.org/show_bug.cgi?id=40012
The unicode font map for the virtual console is a 32x32x64 table which
allocates rows dynamically as entries are added. The unicode value
increases sequentially and should count all entries even in empty
rows. The defect is when copying the unicode font map in con_set_unimap(),
the unicode value is not incremented properly. The wrong unicode value
is entered in the new font map.
Signed-off-by: Liz Clark <liz.clark@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58112dfbfe upstream.
This is supposed to be doing a shift before the comparison instead of
just doing a bitwise AND directly. The current code means the start()
just returns without doing anything.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59263b513c upstream.
Some of the newer futex PI opcodes do not check the cmpxchg enabled
variable and call unconditionally into the handling functions. Cover
all PI opcodes in a separate check.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33d2832ab0 upstream.
HID devices should specify this in their interface descriptors, not in the
device descriptor. This fixes a "missing hardware id" bug under Windows 7 with
a VIA VL800 (3.0) controller.
Signed-off-by: Orjan Friberg <of@flatfrog.com>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85b4b3c8c1 upstream.
A read from GadgetFS endpoint 0 during the data stage of a control
request would always return 0 on success (as returned by
wait_event_interruptible) despite having written data into the user
buffer.
This patch makes it correctly set the return value to the number of
bytes read.
Signed-off-by: Thomas Faber <thfabba@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39287076e4 upstream.
musb INDEX register is getting modified/corrupted during temporary
un-locking in a SMP system. Set this register with proper value
after re-acquiring the lock
Scenario:
---------
CPU1 is handling a data transfer completion interrupt received for
the CLASS1 EP
CPU2 is handling a CLASS2 thread which is queuing data to musb for
transfer
Below is the error sequence:
CPU1 | CPU2
--------------------------------------------------------------------
Data transfer completion inter- |
rupt recieved. |
|
musb INDEX reg set to CLASS1 EP |
|
musb LOCK is acquired. |
|
| CLASS2 thread queues data.
|
| CLASS2 thread tries to acquire musb
| LOCK but lock is already taken by
| CLASS1, so CLASS2 thread is
| spinning.
|
From Interrupt Context musb |
giveback function is called |
|
The giveback function releases | CLASS2 thread now acquires LOCK
LOCK |
|
ClASS1 Request's completion cal-| ClASS2 schedules the data transfer and
lback is called | sets the MUSB INDEX to Class2 EP number
|
Interrupt handler for CLASS1 EP |
tries to acquire LOCK and is |
spinning |
|
Interrupt for Class1 EP acquires| Class2 completes the scheduling etc and
the MUSB LOCK | releases the musb LOCK
|
Interrupt for Class1 EP schedul-|
es the next data transfer |
but musb INDEX register is still|
set to CLASS2 EP |
Since the MUSB INDEX register is set to a different endpoint, we
read and modify the wrong registers. Hence data transfer will not
happen properly. This results in unpredictable behavior
So, the MUSB INDEX register is set to proper value again when
interrupt re-acquires the lock
Signed-off-by: Supriya Karanth <supriya.karanth@stericsson.com>
Signed-off-by: Praveena Nadahally <praveen.nadahally@stericsson.com>
Reviewed-by: srinidhi kasagar <srinidhi.kasagar@stericsson.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
commit dc0827c128 upstream.
Add PID 0x6015, corresponding to the new series of FT-X chips
(FT220XD, FT201X, FT220X, FT221X, FT230X, FT231X, FT240X). They all
appear as serial devices, and seem indistinguishable except for the
default product string stored in their EEPROM. The baudrate
generation matches FT232RL devices.
Tested with a FT201X and FT230X at various baudrates (100 - 3000000).
Sample dmesg:
ftdi_sio: v1.6.0:USB FTDI Serial Converters Driver
usb 2-1: new full-speed USB device number 6 using ohci_hcd
usb 2-1: New USB device found, idVendor=0403, idProduct=6015
usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 2-1: Product: FT230X USB Half UART
usb 2-1: Manufacturer: FTDI
usb 2-1: SerialNumber: DC001WI6
ftdi_sio 2-1:1.0: FTDI USB Serial Device converter detected
drivers/usb/serial/ftdi_sio.c: ftdi_sio_port_probe
drivers/usb/serial/ftdi_sio.c: ftdi_determine_type: bcdDevice = 0x1000, bNumInterfaces = 1
usb 2-1: Detected FT-X
usb 2-1: Number of endpoints 2
usb 2-1: Endpoint 1 MaxPacketSize 64
usb 2-1: Endpoint 2 MaxPacketSize 64
usb 2-1: Setting MaxPacketSize 64
drivers/usb/serial/ftdi_sio.c: read_latency_timer
drivers/usb/serial/ftdi_sio.c: write_latency_timer: setting latency timer = 1
drivers/usb/serial/ftdi_sio.c: create_sysfs_attrs
drivers/usb/serial/ftdi_sio.c: sysfs attributes for FT-X
usb 2-1: FTDI USB Serial Device converter now attached to ttyUSB0
Signed-off-by: Jim Paris <jim@jtan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1cee1d840 upstream.
Microchip VID (0x04d8) was mislabeled as Hornby VID according to USB-IDs.
A Full Speed USB Demo Board PID (0x000a) was mislabeled as
Hornby Elite (an Digital Command Controller Console for model railways).
Most likely the Hornby based their design on
PIC18F87J50 Full Speed USB Demo Board.
Signed-off-by: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 444aa7fa9b upstream.
BeagleBone changed to the default FTDI 0403:6010 id in rev A5 to make life
easier for Windows users, so we need a similar workaround as the Calao
board to support it.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 656d2b3964 upstream.
On some misconfigured ftdi_sio devices, if the manufacturer string is
NULL, the kernel will oops when the device is plugged in. This patch
fixes the problem.
Reported-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Tested-by: Wojciech M Zabolotny <W.Zabolotny@elka.pw.edu.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5889d3d420 upstream.
This device presents a total of 5 interfaces with ff/ff/ff
class/subclass/protocol. The last one of these is verified
to be a QMI/wwan combined interface which should be handled
by the qmi_wwan driver, so we blacklist it here.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 963940cf47 upstream.
commit 0d905fd "USB: option: convert Huawei K3765, K4505, K4605
reservered interface to blacklist" accidentally ANDed two
blacklist tests by leaving out a return. This was not noticed
because the two consecutive bracketless if statements made it
syntactically correct.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 210787e82a upstream.
On il3945_down procedure we free tx queue data and nullify il->txq
pointer. After that we drop mutex and then cancel delayed works. There
is possibility, that after drooping mutex and before the cancel, some
delayed work will start and crash while trying to send commands to
the device. For example, here is reported crash in
il3945_bg_reg_txpower_periodic():
https://bugzilla.kernel.org/show_bug.cgi?id=42766#c10
Patch fix problem by adding il->txq check on works that send commands,
hence utilize tx queue.
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[ Upstream commit c577923756 ]
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in 96b52e61be (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dfd25ffffc ]
commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.
Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.
In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.
In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().
Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b832796caa upstream.
I have a workload where perf top scribbles over the stack and we SEGV.
What makes it interesting is that an snprintf is causing this.
The workload is a c++ gem that has method names over 3000 characters
long, but snprintf is designed to avoid overrunning buffers. So what
went wrong?
The problem is we assume snprintf returns the number of characters
written:
ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", self->level);
...
ret += repsep_snprintf(bf + ret, size - ret, "%s", self->ms.sym->name);
Unfortunately this is not how snprintf works. snprintf returns the
number of characters that would have been written if there was enough
space. In the above case, if the first snprintf returns a value larger
than size, we pass a negative size into the second snprintf and happily
scribble over the stack. If you have 3000 character c++ methods thats a
lot of stack to trample.
This patch fixes repsep_snprintf by clamping the value at size - 1 which
is the maximum snprintf can write before adding the NULL terminator.
I get the sinking feeling that there are a lot of other uses of snprintf
that have this same bug, we should audit them all.
Cc: David Ahern <dsahern@gmail.com>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20120307114249.44275ca3@kryten
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c017386352 upstream.
When writing files to afs I sometimes hit a BUG:
kernel BUG at fs/afs/rxrpc.c:179!
With a backtrace of:
afs_free_call
afs_make_call
afs_fs_store_data
afs_vnode_store_data
afs_write_back_from_locked_page
afs_writepages_region
afs_writepages
The cause is:
ASSERT(skb_queue_empty(&call->rx_queue));
Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:
rx abort fs reply store-data error diskquota exceeded (32)
So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.
By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c724fb927 upstream.
A read of a large file on an afs mount failed:
# cat junk.file > /dev/null
cat: junk.file: Bad message
Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:
_enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...
if (call->offset < count) {
if (last) {
_leave(" = -EBADMSG [%d < %zu]", call->offset, count);
return -EBADMSG;
}
Which matches the trace:
[cat ] ==> afs_extract_data({65132},{524},1,,65536)
[cat ] <== afs_extract_data() = -EBADMSG [0 < 65536]
call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ee161ce5e upstream.
When the system is under heavy load, there can be a significant delay
between the getscl() and time_after() calls inside sclhi(). That delay
may cause the time_after() check to trigger after SCL has gone high,
causing sclhi() to return -ETIMEDOUT.
To fix the problem, double check that SCL is still low after the
timeout has been reached, before deciding to return -ETIMEDOUT.
Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32260d9440 upstream.
The driver probe function leaked memory if creating the cpu0_vid attribute file
failed. Fix by converting the driver to use devm_kzalloc.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 33fa9b6204 upstream.
NCT6775F and NCT6776F have their own set of registers for FAN_STOP_TIME. The
correct registers were used to read FAN_STOP_TIME, but writes used the wrong
registers. Fix it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For 3.0 stable kernel the backport of 048cd4e51d
"compat: fix compile breakage on s390" breaks compilation...
Re-add a single #include <asm/compat.h> in order to fix this.
This patch is _not_ necessary for upstream, only for stable kernels
which include the "build fix" mentioned above.
One fix for arch/s390/kernel/setup.c was already sent and applied. But
we need a similar patch for drivers/s390/char/fs3270.c.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0adb9902f upstream.
Newer version of binutils are more strict about specifying the
correct options to enable certain classes of instructions.
The sparc32 build is done for v7 in order to support sun4c systems
which lack hardware integer multiply and divide instructions.
So we have to pass -Av8 when building the assembler routines that
use these instructions and get patched into the kernel when we find
out that we have a v8 capable cpu.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff3bc1e752 upstream.
When pre-allocating skbs for received packets, we set ip_summed =
CHECKSUM_UNNCESSARY. We used to change it back to CHECKSUM_NONE when
the received packet had an incorrect checksum or unhandled protocol.
Commit bc8acf2c8c ('drivers/net: avoid
some skb->ip_summed initializations') mistakenly replaced the latter
assignment with a DEBUG-only assertion that ip_summed ==
CHECKSUM_NONE. This assertion is always false, but it seems no-one
has exercised this code path in a DEBUG build.
Fix this by moving our assignment of CHECKSUM_UNNECESSARY into
efx_rx_packet_gro().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62d3c5439c upstream.
This patch (as1519) fixes a bug in the block layer's disk-events
polling. The polling is done by a work routine queued on the
system_nrt_wq workqueue. Since that workqueue isn't freezable, the
polling continues even in the middle of a system sleep transition.
Obviously, polling a suspended drive for media changes and such isn't
a good thing to do; in the case of USB mass-storage devices it can
lead to real problems requiring device resets and even re-enumeration.
The patch fixes things by creating a new system-wide, non-reentrant,
freezable workqueue and using it for disk-events polling.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f53d2fe81 upstream.
The following situation might occur:
__blkdev_get: add_disk:
register_disk()
get_gendisk()
disk_block_events()
disk->ev == NULL
disk_add_events()
__disk_unblock_events()
disk->ev != NULL
--ev->block
Then we unblock events, when they are suppose to be blocked. This can
trigger events related block/genhd.c warnings, but also can crash in
sd_check_events() or other places.
I'm able to reproduce crashes with the following scripts (with
connected usb dongle as sdb disk).
<snip>
DEV=/dev/sdb
ENABLE=/sys/bus/usb/devices/1-2/bConfigurationValue
function stop_me()
{
for i in `jobs -p` ; do kill $i 2> /dev/null ; done
exit
}
trap stop_me SIGHUP SIGINT SIGTERM
for ((i = 0; i < 10; i++)) ; do
while true; do fdisk -l $DEV 2>&1 > /dev/null ; done &
done
while true ; do
echo 1 > $ENABLE
sleep 1
echo 0 > $ENABLE
done
</snip>
I use the script to verify patch fixing oops in sd_revalidate_disk
http://marc.info/?l=linux-scsi&m=132935572512352&w=2
Without Jun'ichi Nomura patch titled "Fix NULL pointer dereference in
sd_revalidate_disk" or this one, script easily crash kernel within
a few seconds. With both patches applied I do not observe crash.
Unfortunately after some time (dozen of minutes), script will hung in:
[ 1563.906432] [<c08354f5>] schedule_timeout_uninterruptible+0x15/0x20
[ 1563.906437] [<c04532d5>] msleep+0x15/0x20
[ 1563.906443] [<c05d60b2>] blk_drain_queue+0x32/0xd0
[ 1563.906447] [<c05d6e00>] blk_cleanup_queue+0xd0/0x170
[ 1563.906454] [<c06d278f>] scsi_free_queue+0x3f/0x60
[ 1563.906459] [<c06d7e6e>] __scsi_remove_device+0x6e/0xb0
[ 1563.906463] [<c06d4aff>] scsi_forget_host+0x4f/0x60
[ 1563.906468] [<c06cd84a>] scsi_remove_host+0x5a/0xf0
[ 1563.906482] [<f7f030fb>] quiesce_and_remove_host+0x5b/0xa0 [usb_storage]
[ 1563.906490] [<f7f03203>] usb_stor_disconnect+0x13/0x20 [usb_storage]
Anyway I think this patch is some step forward.
As drawback, I do not teardown on sysfs file create error, because I do
not know how to nullify disk->ev (since it can be used). However add_disk
error handling practically does not exist too, and things will work
without this sysfs file, except events will not be exported to user
space.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe316bf2d5 upstream.
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.
However it ends up calling driver's revalidate_disk without open
and could cause oops.
In the case of SCSI:
process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052
This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For kernels <= 3.0 the backport of 048cd4e51d
"compat: fix compile breakage on s390" will break compilation...
Re-add a single #include <asm/compat.h> in order to fix this.
This patch is _not_ necessary for upstream, only for stable kernels
which include the "build fix" mentioned above.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
commit 4e50391968 upstream.
This patch adds support for the Sitecom LN-031 USB adapter with a AX88178 chip.
Added USB id to find correct driver for AX88178 1000 Ethernet adapter.
Signed-off-by: Joerg Neikes <j.neikes@midlandgate.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 11aad99af6 ]
This driver attempts to use two TX rings but lacks proper support :
1) IRQ handler only takes care of TX completion on first TX ring
2) the stop/start logic uses the legacy functions (for non multiqueue
drivers)
This means all packets witk skb mark set to 1 are sent through high
queue but are never cleaned and queue eventualy fills and block the
device, triggering the infamous "NETDEV WATCHDOG" message.
Lets use a single TX ring to fix the problem, this driver is not a real
multiqueue one yet.
Minimal fix for stable kernels.
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d6ddef9e64 ]
When forwarding was set and a new net device is register,
we need add this device to the all-router mcast group.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4648dc97af ]
This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.
This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).
Since 2008 (832d11c5cd)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.
Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.
Before the fixes in commits cc9a672ee5
and daef52bab1, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.
After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:
(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
then shifts a prefix of that skb that is below snd_una
(2) tcp_shifted_skb() increments the packet count of the
already-SACKed prev sk_buff
(3) tcp_sacktag_one() sees the end of the new SACKed range is below
snd_una, so it short-circuits and doesn't increase tp->sacked_out
(5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
decrements tp->sacked_out by this "inflated" pcount that was
missing a matching increase in tp->sacked_out, and hence
tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
to s32 is negative.
(6) this leads to the warnings seen in the recent "WARNING: at
net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);
More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.
This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d1d81d4c3d ]
otherwise source IPv6 address of ICMPV6_MGM_QUERY packet
might be random junk if IPv6 is disabled on interface or
link-local address is not yet ready (DAD).
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0638c247f ]
In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:
(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).
(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c90d3b303 ]
When tcp_shifted_skb() shifts bytes from the skb that is currently
pointed to by 'highest_sack' then the increment of
TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
implicit advancement, combined with the recent fix to pass the correct
SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
that the newly SACKed range was before the tcp_highest_sack_seq(),
leading to a call to tcp_update_reordering() with a degree of
reordering matching the size of the newly SACKed range (typically just
1 packet, which is a NOP, but potentially larger).
This commit fixes this by simply calling tcp_sacktag_one() before the
TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
highest SACKed sequence.
Correspondingly, we can simplify the code a little now that
tcp_shifted_skb() should update the lost_cnt_hint in all cases where
skb == tp->lost_skb_hint.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a49ad6e89 ]
This patch fixes a (mostly cosmetic) bug introduced by the patch
'ppp: Use SKB queue abstraction interfaces in fragment processing'
found here: http://www.spinics.net/lists/netdev/msg153312.html
The above patch rewrote and moved the code responsible for cleaning
up discarded fragments but the new code does not catch every case
where this is necessary. This results in some discarded fragments
remaining in the queue, and triggering a 'bad seq' error on the
subsequent call to ppp_mp_reconstruct. Fragments are discarded
whenever other fragments of the same frame have been lost.
This can generate a lot of unwanted and misleading log messages.
This patch also adds additional detail to the debug logging to
make it clearer which fragments were lost and which other fragments
were discarded as a result of losses. (Run pppd with 'kdebug 1'
option to enable debug logging.)
Signed-off-by: Ben McKeegan <ben@netservers.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 84338a6c9d ]
When the fixed race condition happens:
1. While function neigh_periodic_work scans the neighbor hash table
pointed by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.
2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.
3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.
4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.
Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 461e74377c upstream.
We have several reports which says acer-wmi is loaded on ideapads
and register rfkill for wifi which can not be unblocked.
Since ideapad-laptop also register rfkill for wifi and it works
reliably, it will be fine acer-wmi is not going to register rfkill
for wifi once VPC2004 is found.
Also put IBM0068/LEN0068 in the list. Though thinkpad_acpi has no
wifi rfkill capability, there are reports which says acer-wmi also
block wireless on Thinkpad E520/E420.
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 097b180ca0 upstream.
complete_walk() already puts nd->path, no need to do it again at cleanup time.
This would result in Oopses if triggered, apparently the codepath is not too
well exercised.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f6c7e62fc upstream.
complete_walk() returns either ECHILD or ESTALE. do_last() turns this into
ECHILD unconditionally. If not in RCU mode, this error will reach userspace
which is complete nonsense.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3780d038fd upstream.
Is possible that we stop queue and then do not wake up it again,
especially when packets are transmitted fast. That can be easily
reproduced with modified tx queue entry_num to some small value e.g. 16.
If mac80211 already hold local->queue_stop_reason_lock, then we can wait
on that lock in both rt2x00queue_pause_queue() and
rt2x00queue_unpause_queue(). After drooping ->queue_stop_reason_lock
is possible that __ieee80211_wake_queue() will be performed before
__ieee80211_stop_queue(), hence we stop queue and newer wake up it
again.
Another race condition is possible when between rt2x00queue_threshold()
check and rt2x00queue_pause_queue() we will process all pending tx
buffers on different cpu. This might happen if for example interrupt
will be triggered on cpu performing rt2x00mac_tx().
To prevent race conditions serialize pause/unpause by queue->tx_lock.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe6b91f470 upstream.
Disabling all runtime PM during system shutdown turns out not to be a
good idea, because some devices may need to be woken up from a
low-power state at that time.
The whole point of disabling runtime PM for system shutdown was to
prevent untimely runtime-suspend method calls. This patch (as1504)
accomplishes the same result by incrementing the usage count for each
device and waiting for ongoing runtime-PM callbacks to finish. This
is what we already do during system suspend and hibernation, which
makes sense since the shutdown method is pretty much a legacy analog
of the pm->poweroff method.
This fixes a recent regression on some OMAP systems introduced by
commit af8db1508f (PM / driver core:
disable device's runtime PM during shutdown).
Reported-and-tested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Kostyantyn Shlyakhovoy <x0155534@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaff12039f upstream.
Some older Panasonic made camcorders (Panasonic AG-EZ30 and NV-DX110,
Grundig Scenos DLC 2000) reject requests with ack_busy_X if a request is
sent immediately after they sent a response to a prior transaction.
This causes firewire-core to fail probing of the camcorder with "giving
up on config rom for node id ...". Consequently, programs like kino or
dvgrab are unaware of the presence of a camcorder.
Such transaction failures happen also with the ieee1394 driver stack
(of the 2.4...2.6 kernel series until 2.6.36 inclusive) but with a lower
likelihood, such that kino or dvgrab are generally able to use these
camcorders via the older driver stack. The cause for firewire-ohci's or
firewire-core's worse behavior is not yet known. Gap count optimization
in firewire-core is not the cause. Perhaps the slightly higher latency
of transaction completion in the older stack plays a role. (ieee1394:
AR-resp DMA context tasklet -> packet completion ktread -> user process;
firewire-core: tasklet -> user process.)
This change introduces retries and delays after ack_busy_X into
firewire-core's Config ROM reader, such that at least firewire-core's
probing and /dev/fw* creation are successful. This still leaves the
problem that userland processes are facing transaction failures.
gscanbus's built-in retry routines deal with them successfully, but
neither kino's nor dvgrab's do ever succeed.
But at least DV capture with "dvgrab -noavc -card 0" works now. Live
video preview in kino works too, but not actual capture.
One way to prevent Configuration ROM reading failures in application
programs is to modify libraw1394 to synthesize read responses by means
of firewire-core's Configuration ROM cache. This would only leave
CMP and FCP transaction failures as a potential problem source for
applications.
Reported-and-tested-by: Thomas Seilund <tps@netmaster.dk>
Reported-and-tested-by: René Fritz <rene@colorcube.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c1176b6a2 upstream.
Clemens points out that we need to use compat_ptr() in order to safely
cast from u64 to addresses of a 32-bit usermode client.
Before, our conversion went wrong
- in practice if the client cast from pointer to integer such that
sign-extension happened, (libraw1394 and libdc1394 at least were not
doing that, IOW were not affected)
or
- in theory on s390 (which doesn't have FireWire though) and on the
tile architecture, regardless of what the client does.
The bug would usually be observed as the initial get_info ioctl failing
with "Bad address" (EFAULT).
Reported-by: Carl Karsten <carl@personnelware.com>
Reported-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4949be1682 upstream.
Right now we won't touch ASPM state if ASPM is disabled, except in the case
where we find a device that appears to be too old to reliably support ASPM.
Right now we'll clear it in that case, which is almost certainly the wrong
thing to do. The easiest way around this is just to disable the blacklisting
when ASPM is disabled.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a7f4255f90 upstream.
Commit f0fbf0abc0 ("x86: integrate delay functions") converted
delay_tsc() into a random delay generator for 64 bit. The reason is
that it merged the mostly identical versions of delay_32.c and
delay_64.c. Though the subtle difference of the result was:
static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long bclock, now;
Now the function uses rdtscl() which returns the lower 32bit of the
TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64
bit this fails when the lower 32bit are close to wrap around when
bclock is read, because the following check
if ((now - bclock) >= loops)
break;
evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0
because the unsigned long (now - bclock) of these values results in
0xffffffff00000001 which is definitely larger than the loops
value. That explains Tvortkos observation:
"Because I am seeing udelay(500) (_occasionally_) being short, and
that by delaying for some duration between 0us (yep) and 491us."
Make those variables explicitely u32 again, so this works for both 32
and 64 bit.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7b2855505 upstream.
Current code has put_ioctx() called asynchronously from aio_fput_routine();
that's done *after* we have killed the request that used to pin ioctx,
so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
from progressing. As the result, we can end up with async call of
put_ioctx() being the last one and possibly happening during exit_mmap()
or elf_core_dump(), neither of which expects stray munmap() being done
to them...
We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
with that, but that's all we care about - neither io_destroy() nor
exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
does really_put_req(), so the ioctx teardown won't be done until then
and we don't care about the contents of ioctx past that point.
Since actual freeing of these suckers is RCU-delayed, we don't need to
bump ioctx refcount when request goes into list for async removal.
All we need is rcu_read_lock held just over the ->ctx_lock-protected
area in aio_fput_routine().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86b62a2cb4 upstream.
Have ioctx_alloc() return an extra reference, so that caller would drop it
on success and not bother with re-grabbing it on failure exit. The current
code is obviously broken - io_destroy() from another thread that managed
to guess the address io_setup() would've returned would free ioctx right
under us; gets especially interesting if aio_context_t * we pass to
io_setup() points to PROT_READ mapping, so put_user() fails and we end
up doing io_destroy() on kioctx another thread has just got freed...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97e43c983c upstream.
Silence following warnings:
WARNING: drivers/mfd/cs5535-mfd.o(.data+0x20): Section mismatch in
reference from the variable cs5535_mfd_drv to the function
.devinit.text:cs5535_mfd_probe()
The variable cs5535_mfd_drv references
the function __devinit cs5535_mfd_probe()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
WARNING: drivers/mfd/cs5535-mfd.o(.data+0x28): Section mismatch in
reference from the variable cs5535_mfd_drv to the function
.devexit.text:cs5535_mfd_remove()
The variable cs5535_mfd_drv references
the function __devexit cs5535_mfd_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
Rename the variable from *_drv to *_driver so
modpost ignore the OK references to __devinit/__devexit
functions.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 474de3bbad upstream.
Fix scan_timers() to be __devinit and not __init since
the function get called from cs5535_mfgpt_probe which is
__devinit.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ca93de9b7 upstream.
Fix dm-raid flush support.
Both md and dm have support for flush, but the dm-raid target
forgot to set the flag to indicate that flushes should be
passed on. (Important for data integrity e.g. with writeback cache
enabled.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0c535e0d6f upstream.
This patch fixes a crash by recognising discards in dm_io.
Currently dm_mirror can send REQ_DISCARD bios if running over a
discard-enabled device and without support in dm_io the system
crashes badly.
BUG: unable to handle kernel paging request at 00800000
IP: __bio_add_page.part.17+0xf5/0x1e0
...
bio_add_page+0x56/0x70
dispatch_io+0x1cf/0x240 [dm_mod]
? km_get_page+0x50/0x50 [dm_mod]
? vm_next_page+0x20/0x20 [dm_mod]
? mirror_flush+0x130/0x130 [dm_mirror]
dm_io+0xdc/0x2b0 [dm_mod]
...
Introduced in 2.6.38-rc1 by commit 5fc2ffeabb
(dm raid1: support discard).
Signed-off-by: Milan Broz <mbroz@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4231d47e6f upstream.
|kernel BUG at kernel/rtmutex.c:724!
|[<c029599c>] (rt_spin_lock_slowlock+0x108/0x2bc) from [<c01c2330>] (defer_bh+0x1c/0xb4)
|[<c01c2330>] (defer_bh+0x1c/0xb4) from [<c01c3afc>] (rx_complete+0x14c/0x194)
|[<c01c3afc>] (rx_complete+0x14c/0x194) from [<c01cac88>] (usb_hcd_giveback_urb+0xa0/0xf0)
|[<c01cac88>] (usb_hcd_giveback_urb+0xa0/0xf0) from [<c01e1ff4>] (musb_giveback+0x34/0x40)
|[<c01e1ff4>] (musb_giveback+0x34/0x40) from [<c01e2b1c>] (musb_advance_schedule+0xb4/0x1c0)
|[<c01e2b1c>] (musb_advance_schedule+0xb4/0x1c0) from [<c01e2ca8>] (musb_cleanup_urb.isra.9+0x80/0x8c)
|[<c01e2ca8>] (musb_cleanup_urb.isra.9+0x80/0x8c) from [<c01e2ed0>] (musb_urb_dequeue+0xec/0x108)
|[<c01e2ed0>] (musb_urb_dequeue+0xec/0x108) from [<c01cbb90>] (unlink1+0xbc/0xcc)
|[<c01cbb90>] (unlink1+0xbc/0xcc) from [<c01cc2ec>] (usb_hcd_unlink_urb+0x54/0xa8)
|[<c01cc2ec>] (usb_hcd_unlink_urb+0x54/0xa8) from [<c01c2a84>] (unlink_urbs.isra.17+0x2c/0x58)
|[<c01c2a84>] (unlink_urbs.isra.17+0x2c/0x58) from [<c01c2b44>] (usbnet_terminate_urbs+0x94/0x10c)
|[<c01c2b44>] (usbnet_terminate_urbs+0x94/0x10c) from [<c01c2d68>] (usbnet_stop+0x100/0x15c)
|[<c01c2d68>] (usbnet_stop+0x100/0x15c) from [<c020f718>] (__dev_close_many+0x94/0xc8)
defer_bh() takes the lock which is hold during unlink_urbs(). The safe
walk suggest that the skb will be removed from the list and this is done
by defer_bh() so it seems to be okay to drop the lock here.
Reported-by: AnÃbal Almeida Pinto <anibal.pinto@efacec.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 992d52529d upstream.
On Access Point mode, when transmitting a packet, if the destination
station is in powersave mode, we abort transmitting the packet to the
device queue, but we do not reclaim the allocated memory. Given enough
packets, we can go in a state where there is no packet on the device
queue, but we think the device has no memory left, so no packet gets
transmitted, connections breaks and the AP stops working.
This undo the allocation done in the TX path when the station is in
power-save mode.
Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99c90ab31f upstream.
ALPS touchpad detection fails if some buttons of ALPS are pressed.
The reason is that the "E6" query response byte is different from
what is expected.
This was tested on a Toshiba Portege R500.
Signed-off-by: Akio Idehara <zbe64533@gmail.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9105b8b200 upstream.
Currently the module init function registers a platform_device and
only then allocates its IRQ and I/O region. This allows allocation to
race with the device's suspend() function. Instead, allocate
resources in the platform driver's probe() function and free them in
the remove() function.
The module exit function removes the platform device before the
character device that provides access to it. Change it to reverse the
order of initialisation.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c49d005b6c upstream.
A hardware bug in the OMAP4 HDMI PHY causes physical damage to the board
if the HDMI PHY is kept powered on when the cable is not connected.
This patch solves the problem by adding hot-plug-detection into the HDMI
IP driver. This is not a real HPD support in the sense that nobody else
than the IP driver gets to know about the HPD events, but is only meant
to fix the HW bug.
The strategy is simple: If the display device is turned off by the user,
the PHY power is set to OFF. When the display device is turned on by the
user, the PHY power is set either to LDOON or TXON, depending on whether
the HDMI cable is connected.
The reason to avoid PHY OFF when the display device is on, but the cable
is disconnected, is that when the PHY is turned OFF, the HDMI IP is not
"ticking" and thus the DISPC does not receive pixel clock from the HDMI
IP. This would, for example, prevent any VSYNCs from happening, and
would thus affect the users of omapdss. By using LDOON when the cable is
disconnected we'll avoid the HW bug, but keep the HDMI working as usual
from the user's point of view.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78a1ad8f12 upstream.
The HDMI GPIO pins LS_OE and CT_CP_HPD are not currently configured.
This patch configures them as output pins.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bb122d155 upstream.
"hdmi_hpd" pin is muxed to INPUT and PULLUP, but the pin is not
currently used, and in the future when it is used, the pin is used as a
GPIO and is board specific, not an OMAP4 wide thing.
So remove the muxing for now.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3932a32fcf upstream.
The GPIO 60 on 4430sdp and Panda is not HPD GPIO, as currently marked in
the board files, but CT_CP_HPD, which is used to enable/disable HPD
functionality.
This patch renames the GPIO.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b065403710 upstream.
Patchset "ARM: orion: Refactor the MPP code common in the orion
platform" broke at least Orion5x based platforms. These platforms have
pins configured as GPIO when the selector is not 0x0. However the
common code assumes the selector is always 0x0 for a GPIO lines. It
then ignores the GPIO bits in the MPP definitions, resulting in that
Orion5x machines cannot correctly configure there GPIO lines.
The Fix removes the assumption that the selector is always 0x0.
In order that none GPIO configurations are correctly blocked,
Kirkwood and mv78xx0 MPP definitions are corrected to only set the
GPIO bits for GPIO configurations.
This third version, which does not contain any whitespace changes,
and is rebased on v3.3-rc2.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
commit 7205335358 upstream.
The patch "ARM: orion: Consolidate USB platform setup code.", commit
4fcd3f374a broke USB on TS-7800 and
other orion5x boards, because the wrong type of PHY was being passed
to the EHCI driver in the platform data. Orion5x needs EHCI_PHY_ORION
and all the others want EHCI_PHY_NA.
Allow the mach- code to tell the generic plat-orion code which USB PHY
enum to place into the platform data.
Version 2: Rebase to v3.3-rc2.
Reported-by: Ambroz Bizjak <ambrop7@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Ambroz Bizjak <ambrop7@gmail.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3.0.21's 603b634847 directly used
the upstream patch, yet kprobes locking in 3.0.x uses spin_lock...()
rather than raw_spin_lock...().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ed80a75b2 upstream.
According to i.MX27 Reference Manual (p 1593) TXBIT0 bit selects
whether the most significant or the less significant part of the
data word written to the FIFO is transmitted.
As DSP_A is the same as DSP_B with a data offset of 1 bit, it
doesn't make any sense to remove TXBIT0 bit here.
Signed-off-by: Javier Martin <javier.martin@vista-silicon.com>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7679e42ec8 upstream.
Recent enhancements in the bias management means that we might not be
in standby when the CODEC is idle and can have active widgets without
being in full power mode but the shutdown functionality assumes these
things. Add checks for the bias level at each stage so that we don't
do transitions other than the ON->PREPARE->STANDBY->OFF ones that the
drivers are expecting.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41f8ad7636 upstream.
It used to be that minors where 8 bit. But now they
are actually 20 bit. So the fix is simplicity itself.
I've tested with 300 devices and all user-mode utils
work just fine. I have also mechanically added 10,000
to the ida (so devices are /dev/osd10000, /dev/osd10001 ...)
and was able to mkfs an exofs filesystem and access osds
from user-mode.
All the open-osd user-mode code uses the same library
to access devices through their symbolic names in
/dev/osdX so I'd say it's pretty safe. (Well tested)
This patch is very important because some of the systems
that will be deploying the 3.2 pnfs-objects code are larger
than 64 OSDs and will stop to work properly when reaching
that number.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 37891abc84 upstream.
This patch (as1531) adds a NOGET quirk for the Slim+ keyboard marketed
by AIREN. This keyboard seems to have a lot of bugs; NOGET works
around only one of them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: okias <d.okias@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c641e8471 upstream.
Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
in exit_mmap() recently.
Quoting Hugh's discovery and explanation of the SMP race condition:
"mm->nr_ptes had unusual locking: down_read mmap_sem plus
page_table_lock when incrementing, down_write mmap_sem (or mm_users
0) when decrementing; whereas THP is careful to increment and
decrement it under page_table_lock.
Now most of those paths in THP also hold mmap_sem for read or write
(with appropriate checks on mm_users), but two do not: when
split_huge_page() is called by hwpoison_user_mappings(), and when
called by add_to_swap().
It's conceivable that the latter case is responsible for the
exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."
The simplest way to fix it without having to alter the locking is to make
split_huge_page() a noop in nr_ptes terms, so by counting the preallocated
pagetables that exists for every mapped hugepage. It was an arbitrary
choice not to count them and either way is not wrong or right, because
they are not used but they're still allocated.
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bbb8168ed upstream.
Duplicate the data for iniAddac early on, to avoid having to do redundant
memcpy calls later. While we're at it, make AR5416 < v2.2 use the same
codepath. Fixes a reported crash on x86.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reported-by: Magnus Määttä <magnus.maatta@logica.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8617b093d0 upstream.
rate control algorithms concludes the rate as invalid
with rate[i].idx < -1 , while they do also check for rate[i].count is
non-zero. it would be safer to zero initialize the 'count' field.
recently we had a ath9k rate control crash where the ath9k rate control
in ath_tx_status assumed to check only for rate[i].count being non-zero
in one instance and ended up in using invalid rate index for
'connection monitoring NULL func frames' which eventually lead to the crash.
thanks to Pavel Roskin for fixing it and finding the root cause.
https://bugzilla.redhat.com/show_bug.cgi?id=768639
Cc: Pavel Roskin <proski@gnu.org>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5bccda0ebc upstream.
The cifs code will attempt to open files on lookup under certain
circumstances. What happens though if we find that the file we opened
was actually a FIFO or other special file?
Currently, the open filehandle just ends up being leaked leading to
a dentry refcount mismatch and oops on umount. Fix this by having the
code close the filehandle on the server if it turns out not to be a
regular file. While we're at it, change this spaghetti if statement
into a switch too.
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 371528caec upstream.
There is an issue when memcg unregisters events that were attached to
the same eventfd:
- On the first call mem_cgroup_usage_unregister_event() removes all
events attached to a given eventfd, and if there were no events left,
thresholds->primary would become NULL;
- Since there were several events registered, cgroups core will call
mem_cgroup_usage_unregister_event() again, but now kernel will oops,
as the function doesn't expect that threshold->primary may be NULL.
That's a good question whether mem_cgroup_usage_unregister_event()
should actually remove all events in one go, but nowadays it can't
do any better as cftype->unregister_event callback doesn't pass
any private event-associated cookie. So, let's fix the issue by
simply checking for threshold->primary.
FWIW, w/o the patch the following oops may be observed:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
Pid: 574, comm: kworker/0:2 Not tainted 3.3.0-rc4+ #9 Bochs Bochs
RIP: 0010:[<ffffffff810be32c>] [<ffffffff810be32c>] mem_cgroup_usage_unregister_event+0x9c/0x1f0
RSP: 0018:ffff88001d0b9d60 EFLAGS: 00010246
Process kworker/0:2 (pid: 574, threadinfo ffff88001d0b8000, task ffff88001de91cc0)
Call Trace:
[<ffffffff8107092b>] cgroup_event_remove+0x2b/0x60
[<ffffffff8103db94>] process_one_work+0x174/0x450
[<ffffffff8103e413>] worker_thread+0x123/0x2d0
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b6b0ad6e5 upstream.
On i.MX53 we have to write a special SDHCI_CMD_ABORTCMD to the
SDHCI_TRANSFER_MODE register during a MMC_STOP_TRANSMISSION
command. This works for SD cards. However, with MMC cards
the MMC_SET_BLOCK_COUNT command is used instead, but this
needs the same handling. Fix MMC cards by testing for the
MMC_SET_BLOCK_COUNT command aswell. Tested on a custom i.MX53
board with a Transcend MMC+ card and eMMC.
The kernel started used MMC_SET_BLOCK_COUNT in 3.0, so this
is a regression for these boards introduced in 3.0; it should
go to 3.0/3.1/3.2-stable.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62aca40365 upstream.
Michael Cree said:
: : I have noticed some user space problems (pulseaudio crashes in pthread
: : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha
: : systems) that arise when using a 2.6.39 or later kernel on Alpha.
: : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as
: : criterion for good/bad kernel) eventually leads to:
: :
: : 8d7718aa08 is the first bad commit
: : commit 8d7718aa08
: : Author: Michel Lespinasse <walken@google.com>
: : Date: Thu Mar 10 18:50:58 2011 -0800
: :
: : futex: Sanitize futex ops argument types
: :
: : Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
: : prototypes to use u32 types for the futex as this is the data type the
: : futex core code uses all over the place.
: :
: : Looking at the commit I see there is a change of the uaddr argument in
: : the Alpha architecture specific code for futexes from int to u32, but I
: : don't see why this should cause a problem.
Richard Henderson said:
: futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
: u32 oldval, u32 newval)
: ...
: : "r"(uaddr), "r"((long)oldval), "r"(newval)
:
:
: There is no 32-bit compare instruction. These are implemented by
: consistently extending the values to a 64-bit type. Since the
: load instruction sign-extends, we want to sign-extend the other
: quantity as well (despite the fact it's logically unsigned).
:
: So:
:
: - : "r"(uaddr), "r"((long)oldval), "r"(newval)
: + : "r"(uaddr), "r"((long)(int)oldval), "r"(newval)
:
: should do the trick.
Michael said:
: This fixes the glibc test suite failures and the pulseaudio related
: crashes, but it does not fix the java compiiler lockups that I was (and
: are still) observing. That is some other problem.
Reported-by: Michael Cree <mcree@orcon.net.nz>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee932bf9ac upstream.
In the current kernel implementation, the Logitech Harmony 900 remote
control is matched to the cdc_ether driver through the generic
USB_CDC_SUBCLASS_MDLM entry. However, this device appears to be of the
pseudo-MDLM (Belcarra) type, rather than the standard one. This patch
blacklists the Harmony 900 from the cdc_ether driver and whitelists it for
the pseudo-MDLM driver in zaurus.
Signed-off-by: Scott Talbert <talbert@techie.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e39d40c65d upstream.
s3c2410_dma_suspend suspends channels from 0 to dma_channels.
s3c2410_dma_resume resumes channels in reverse order. So
pointer should be decremented instead of being incremented.
Signed-off-by: Gusakov Andrey <dron0gus@gmail.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52abb700e1 upstream.
Xommit ac5637611(genirq: Unmask oneshot irqs when thread was not woken)
fails to unmask when a !IRQ_ONESHOT threaded handler is handled by
handle_level_irq.
This happens because thread_mask is or'ed unconditionally in
irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So
the check for !desc->thread_active fails and keeps the interrupt
disabled.
Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
Document the thread_mask magic while at it.
Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81b5482c32 upstream.
The code is currently always checking the first resource of every
device only (several times.) This has been broken since the ACPI check
was added in February 2010 in commit
91fedede03.
Fix the check to run on each resource individually, once.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5189fa19a4 upstream.
There is only one error code to return for a bad user-space buffer
pointer passed to a system call in the same address space as the
system call is executed, and that is EFAULT. Furthermore, the
low-level access routines, which catch most of the faults, return
EFAULT already.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8e252586f upstream.
The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.
Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bff172a35 upstream.
A bug report with an old Sony laptop showed that we can't rely on BIOS
setting the pins of headphones but the driver should set always by
itself.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3868137ea4 upstream.
Some codecs don't supply the mute amp-capabilities although the lowest
volume gives the mute. It'd be handy if the parser provides the mute
mixers in such a case.
This patch adds an extension amp-cap bit (which is used only in the
driver) to represent the min volume = mute state. Also modified the
amp cache code to support the fake mute feature when this bit is set
but the real mute bit is unset.
In addition, conexant cx5051 parser uses this new feature to implement
the missing mute controls.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42825
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d05772060 upstream.
Enable the compat keyctl wrapper on s390x so that 32-bit s390 userspace can
call the keyctl() syscall.
There's an s390x assembly wrapper that truncates all the register values to
32-bits and this then calls compat_sys_keyctl() - but the latter only exists if
CONFIG_KEYS_COMPAT is enabled, and the s390 Kconfig doesn't enable it.
Without this patch, 32-bit calls to the keyctl() syscall are given an ENOSYS
error:
[root@devel4 ~]# keyctl show
Session Keyring
-3: key inaccessible (Function not implemented)
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: dan@danny.cz
Cc: Carsten Otte <cotte@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 844990daa2 upstream.
The hardware generates an interrupt for every completed command in the
queue while the code assumed that it will only generate one interrupt
when the queue is empty. So, explicitly check if the queue is really
empty. This patch fixed problems which occurred due to high traffic on
the bus. While we are here, move the completion-initialization after the
parameter error checking.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Marek Vasut <marek.vasut@gmail.com>
Cc: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97d2a10d58 upstream.
1. address has to be page aligned.
2. set_memory_x uses page size argument, not size.
Bug causes with following commit:
commit da28179b4e90dda56912ee825c7eaa62fc103797
Author: Mingarelli, Thomas <Thomas.Mingarelli@hp.com>
Date: Mon Nov 7 10:59:00 2011 +0100
watchdog: hpwdt: Changes to handle NX secure bit in 32bit path
commit e67d668e14 upstream.
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.
Signed-off-by: Maxim Uvarov <maxim.uvarov@oracle.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6737055c1 upstream.
The GPI_28 IRQ was not registered properly. The registration of
IRQ_LPC32XX_GPI_28 was added and the (wrong) IRQ_LPC32XX_GPI_11 at
LPC32XX_SIC1_IRQ(4) was replaced by IRQ_LPC32XX_GPI_28 (see manual of
LPC32xx / interrupt controller).
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35dd0a75d4 upstream.
This patch fixes the initialization of the interrupt controller of the LPC32xx
by correctly setting up SIC1 and SIC2 instead of (wrongly) using the same value
as for the Main Interrupt Controller (MIC).
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94ed7830cb upstream.
This patch fixes the wakeup disable function by clearing latched events.
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2707208ee8 upstream.
This patch fixes a HW bug by flushing RX FIFOs of the UARTs on init. It was
ported from NXP's git.lpclinux.com tree.
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 048cd4e51d upstream.
The new is_compat_task() define for the !COMPAT case in
include/linux/compat.h conflicts with a similar define in
arch/s390/include/asm/compat.h.
This is the minimal patch which fixes the build issues.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c761ea05a upstream.
The autofs compat handling fix caused a compile failure when
CONFIG_COMPAT isn't defined.
Instead of adding random #ifdef'fery in autofs, let's just make the
compat helpers earlier to use: without CONFIG_COMPAT, is_compat_task()
just hardcodes to zero.
We could probably do something similar for a number of other cases where
we have #ifdef's in code, but this is the low-hanging fruit.
Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a32744d4ab upstream.
When the autofs protocol version 5 packet type was added in commit
5c0a32fc2c ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.
However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.
And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.
End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.
As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.
Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.
So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.
Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 822bfa51ce upstream.
"nframes" comes from the user and "nframes * CD_FRAMESIZE_RAW" can wrap
on 32 bit systems. That would have been ok if we used the same wrapped
value for the copy, but we use a shifted value. We should just use the
checked version of copy_to_user() because it's not going to make a
difference to the speed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28d82dc1c4 upstream.
The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely. A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.
To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited. Thus, the loop detection
becomes proportional to the number of epoll file descriptor and links.
This dramatically decreases the run-time of the loop check algorithm. In
one diabolical case I tried it reduced the run-time from 15 mintues (all
in kernel time) to .3 seconds.
Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is harder, since there can be multiple wakeups on different
cpus...Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.
This is accomplished, by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link), are
actually the sources for wakeup events. I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'. In the current implemetation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5. Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links. This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.
In terms of the path limit selection, I think its first worth noting that
the most common case for epoll, is probably the model where you have 1
epoll file descriptor that is monitoring n number of 'source file
descriptors'. In this case, each 'source file descriptor' has a 1 path of
length 1. Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous. Thus, I'm hoping that the
proposed limits will not prevent any workloads that currently work to
fail.
In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations. Currently its only used in a subset
of the add paths. I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths. I believe that
this additional locking is probably ok, since its in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead. Also, worth noting is that the epmuex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.
Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:
/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1). However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order. Thus, this limit is currently easily bypassed. The
newly added check for wakeup paths, stricly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.
Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL. I've also
testing using the piptest.c epoll tester, which showed no difference in
performance. I've also created a number of different epoll networks and
tested that they behave as expectded.
I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Nelson Elhage <nelhage@ksplice.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 971316f050 upstream.
signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.
Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.
This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d80e731eca upstream.
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.
epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.
This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.
__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.
ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.
The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.
In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.
Note:
- we do not have wake_up_all_poll() but wake_up_poll()
is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.
- signalfd_cleanup() uses POLLHUP along with POLLFREE,
we need a couple of simple changes in eventpoll.c to
make sure it can't be "lost".
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1c1a3d012 upstream.
By hwmon sysfs interface convention, setting pwm_enable to zero sets a fan
to full speed. In the f75375s driver, this need be done by enabling
manual fan control, plus duty mode for the F875387 chip, and then setting
the maximum duty cycle. Fix a bug where the two necessary register writes
were swapped, effectively discarding the setting to full-speed.
Signed-off-by: Nikolaus Schulz <mail@microschulz.de>
Cc: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afa159538a upstream.
status has to be set to STREAMING before the streaming worker is
queued. hdpvr_transmit_buffers() will exit immediately otherwise.
Reported-by: Joerg Desch <vvd.joede@googlemail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c63522460 upstream.
The current use of /tmp for file lists is insecure. Put them under
$objtree/debian instead.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: maximilian attems <max@stro.at>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d69703263 upstream.
This patch fixes a regression that was introduced by
commit 0a5f384677
davinci_emac: Add Carrier Link OK check in Davinci RX Handler
Said commit adds a check whether the carrier link is ok. If the link is
not ok, the skb is freed and no new dma descriptor added to the rx dma
channel. This causes trouble during initialization when the carrier
status has not yet been updated. If a lot of packets are received while
netif_carrier_ok returns false, all dma descriptors are freed and the
rx dma transfer is stopped.
The bug occurs when the board is connected to a network with lots of
traffic and the ifconfig down/up is done, e.g., when reconfiguring
the interface with DHCP.
The bug can be reproduced by flood pinging the davinci board while doing
ifconfig eth0 down
ifconfig eth0 up
on the board.
After that, the rx path stops working and the overrun value reported
by ifconfig is counting up.
This patch reverts commit 0a5f384677
and instead issues warnings only if cpdma_chan_submit returns -ENOMEM.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Cc: <stable@vger.kernel.org>
Cc: Cyril Chemparathy <cyril@ti.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Rajashekhara, Sudhakar <sudhakar.raj@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba9adbe67e upstream.
Set the RX FIFO flush watermark lower.
According to Federico and JMicron's reply,
setting it to 16QW would be stable on most platforms.
Otherwise, user might experience packet drop issue.
Reported-by: Federico Quagliata <federico@quagliata.org>
Fixed-by: Federico Quagliata <federico@quagliata.org>
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0aac52e17 upstream.
Commit f11017ec2d (2.6.37)
moved the fwmark variable in subcontext that is invalidated before
reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer
in the param structure make sure the fwmark variable is in
same context. As the fwmark templates can not be matched,
more and more template connections are created and the
controlled connections can not go to single real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fea6d607e1 upstream.
This patch (as1520) fixes a bug in the SCSI layer's power management
implementation.
LUN scanning can be carried out asynchronously in do_scan_async(), and
sd uses an asynchronous thread for the time-consuming parts of disk
probing in sd_probe_async(). Currently nothing coordinates these
async threads with system sleep transitions; they can and do attempt
to continue scanning/probing SCSI devices even after the host adapter
has been suspended. As one might expect, the outcome is not ideal.
This is what the "prepare" stage of system suspend was created for.
After the prepare callback has been called for a host, target, or
device, drivers are not allowed to register any children underneath
them. Currently the SCSI prepare callback is not implemented; this
patch rectifies that omission.
For SCSI hosts, the prepare routine calls scsi_complete_async_scans()
to wait until async scanning is finished. It might be slightly more
efficient to wait only until the host in question has been scanned,
but there's currently no way to do that. Besides, during a sleep
transition we will ultimately have to wait until all the host scanning
has finished anyway.
For SCSI devices, the prepare routine calls async_synchronize_full()
to wait until sd probing is finished. The routine does nothing for
SCSI targets, because asynchronous target scanning is done only as
part of host scanning.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 267a6ad4ae upstream.
In do_scan_async(), calling scsi_autopm_put_host(shost) may reference
freed shost, and cause Posison overwitten warning.
Yes, this case can happen, for example, an USB is disconnected just
when do_scan_async() thread starts to run, then scsi_host_put() called
in scsi_finish_async_scan() will lead to shost be freed(because the
refcount of shost->shost_gendev decreases to 1 after USB disconnects),
at this point, if references shost again, system will show following
warning msg.
To make scsi_autopm_put_host(shost) always reference a valid shost,
put it just before scsi_host_put() in function
scsi_finish_async_scan().
[ 299.281565] =============================================================================
[ 299.281634] BUG kmalloc-4096 (Tainted: G I ): Poison overwritten
[ 299.281682] -----------------------------------------------------------------------------
[ 299.281684]
[ 299.281752] INFO: 0xffff880056c305d0-0xffff880056c305d0. First byte
0x6a instead of 0x6b
[ 299.281816] INFO: Allocated in scsi_host_alloc+0x4a/0x490 age=1688
cpu=1 pid=2004
[ 299.281870] __slab_alloc+0x617/0x6c1
[ 299.281901] __kmalloc+0x28c/0x2e0
[ 299.281931] scsi_host_alloc+0x4a/0x490
[ 299.281966] usb_stor_probe1+0x5b/0xc40 [usb_storage]
[ 299.282010] storage_probe+0xa4/0xe0 [usb_storage]
[ 299.282062] usb_probe_interface+0x172/0x330 [usbcore]
[ 299.282105] driver_probe_device+0x257/0x3b0
[ 299.282138] __driver_attach+0x103/0x110
[ 299.282171] bus_for_each_dev+0x8e/0xe0
[ 299.282201] driver_attach+0x26/0x30
[ 299.282230] bus_add_driver+0x1c4/0x430
[ 299.282260] driver_register+0xb6/0x230
[ 299.282298] usb_register_driver+0xe5/0x270 [usbcore]
[ 299.282337] 0xffffffffa04ab03d
[ 299.282364] do_one_initcall+0x47/0x230
[ 299.282396] sys_init_module+0xa0f/0x1fe0
[ 299.282429] INFO: Freed in scsi_host_dev_release+0x18a/0x1d0 age=85
cpu=0 pid=2008
[ 299.282482] __slab_free+0x3c/0x2a1
[ 299.282510] kfree+0x296/0x310
[ 299.282536] scsi_host_dev_release+0x18a/0x1d0
[ 299.282574] device_release+0x74/0x100
[ 299.282606] kobject_release+0xc7/0x2a0
[ 299.282637] kobject_put+0x54/0xa0
[ 299.282668] put_device+0x27/0x40
[ 299.282694] scsi_host_put+0x1d/0x30
[ 299.282723] do_scan_async+0x1fc/0x2b0
[ 299.282753] kthread+0xdf/0xf0
[ 299.282782] kernel_thread_helper+0x4/0x10
[ 299.282817] INFO: Slab 0xffffea00015b0c00 objects=7 used=7 fp=0x
(null) flags=0x100000000004080
[ 299.282882] INFO: Object 0xffff880056c30000 @offset=0 fp=0x (null)
[ 299.282884]
...
Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4bc724e82 upstream.
An interrupt might be pending when irq_startup() is called, but the
startup code does not invoke the resend logic. In some cases this
prevents the device from issuing another interrupt which renders the
device non functional.
Call the resend function in irq_startup() to keep things going.
Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac56376111 upstream.
When the primary handler of an interrupt which is marked IRQ_ONESHOT
returns IRQ_HANDLED or IRQ_NONE, then the interrupt thread is not
woken and the unmask logic of the interrupt line is never
invoked. This keeps the interrupt masked forever.
This was not noticed as most IRQ_ONESHOT users wake the thread
unconditionally (usually because they cannot access the underlying
device from hard interrupt context). Though this behaviour was nowhere
documented and not necessarily intentional. Some drivers can avoid the
thread wakeup in certain cases and run into the situation where the
interrupt line s kept masked.
Handle it gracefully.
Reported-and-tested-by: Lothar Wassmann <lw@karo-electronics.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2504a6423b upstream.
Rate control algorithms are supposed to stop processing when they
encounter a rate with the index -1. Checking for rate->count not being
zero is not enough.
Allowing a rate with negative index leads to memory corruption in
ath_debug_stat_rc().
One consequence of the bug is discussed at
https://bugzilla.redhat.com/show_bug.cgi?id=768639
Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68d07f64b8 upstream
Intel has a PCI USB xhci host controller on a new platform. It doesn't
have a line IRQ definition in BIOS. The Linux driver refuses to
initialize this controller, but Windows works well because it only depends
on MSI.
Actually, Linux also can work for MSI. This patch avoids the line IRQ
checking for USB3 HCDs in usb core PCI probe. It allows the xHCI driver
to try to enable MSI or MSI-X first. It will fail the probe if MSI
enabling failed and there's no legacy PCI IRQ.
This patch should be backported to kernels as old as 2.6.32.
[Maintainer note: This patch is a backport of commit
68d07f64b8 "USB: Don't fail USB3 probe on
missing legacy PCI IRQ." to the 3.0 kernel. Note, the original patch
description was wrong. We should not back port this to kernels older
than 2.6.36, since that was the first kernel to support MSI and MSI-X
for xHCI hosts. These systems will just not work without MSI support,
so the probe should fail on kernels older than 2.6.36.]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb94a40668 upstream.
This patch (as1521b) fixes the interaction between usb-storage's
scanning thread and the freezer. The current implementation has a
race: If the device is unplugged shortly after being plugged in and
just as a system sleep begins, the scanning thread may get frozen
before the khubd task. Khubd won't be able to freeze until the
disconnect processing is complete, and the disconnect processing can't
proceed until the scanning thread finishes, so the sleep transition
will fail.
The implementation in the 3.2 kernel suffers from an additional
problem. There the scanning thread calls set_freezable_with_signal(),
and the signals sent by the freezer will mess up the thread's I/O
delays, which are all interruptible.
The solution to both problems is the same: Replace the kernel thread
used for scanning with a delayed-work routine on the system freezable
work queue. Freezable work queues have the nice property that you can
cancel a work item even while the work queue is frozen, and no signals
are needed.
The 3.2 version of this patch solves the problem in Bugzilla #42730.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34ddc81a23 upstream.
After all the FPU state cleanups and finally finding the problem that
caused all our FPU save/restore problems, this re-introduces the
preloading of FPU state that was removed in commit b3b0870ef3 ("i387:
do not preload FPU state at task switch time").
However, instead of simply reverting the removal, this reimplements
preloading with several fixes, most notably
- properly abstracted as a true FPU state switch, rather than as
open-coded save and restore with various hacks.
In particular, implementing it as a proper FPU state switch allows us
to optimize the CR0.TS flag accesses: there is no reason to set the
TS bit only to then almost immediately clear it again. CR0 accesses
are quite slow and expensive, don't flip the bit back and forth for
no good reason.
- Make sure that the same model works for both x86-32 and x86-64, so
that there are no gratuitous differences between the two due to the
way they save and restore segment state differently due to
architectural differences that really don't matter to the FPU state.
- Avoid exposing the "preload" state to the context switch routines,
and in particular allow the concept of lazy state restore: if nothing
else has used the FPU in the meantime, and the process is still on
the same CPU, we can avoid restoring state from memory entirely, just
re-expose the state that is still in the FPU unit.
That optimized lazy restore isn't actually implemented here, but the
infrastructure is set up for it. Of course, older CPU's that use
'fnsave' to save the state cannot take advantage of this, since the
state saving also trashes the state.
In other words, there is now an actual _design_ to the FPU state saving,
rather than just random historical baggage. Hopefully it's easier to
follow as a result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f94edacf99 upstream.
This moves the bit that indicates whether a thread has ownership of the
FPU from the TS_USEDFPU bit in thread_info->status to a word of its own
(called 'has_fpu') in task_struct->thread.has_fpu.
This fixes two independent bugs at the same time:
- changing 'thread_info->status' from the scheduler causes nasty
problems for the other users of that variable, since it is defined to
be thread-synchronous (that's what the "TS_" part of the naming was
supposed to indicate).
So perfectly valid code could (and did) do
ti->status |= TS_RESTORE_SIGMASK;
and the compiler was free to do that as separate load, or and store
instructions. Which can cause problems with preemption, since a task
switch could happen in between, and change the TS_USEDFPU bit. The
change to TS_USEDFPU would be overwritten by the final store.
In practice, this seldom happened, though, because the 'status' field
was seldom used more than once, so gcc would generally tend to
generate code that used a read-modify-write instruction and thus
happened to avoid this problem - RMW instructions are naturally low
fat and preemption-safe.
- On x86-32, the current_thread_info() pointer would, during interrupts
and softirqs, point to a *copy* of the real thread_info, because
x86-32 uses %esp to calculate the thread_info address, and thus the
separate irq (and softirq) stacks would cause these kinds of odd
thread_info copy aliases.
This is normally not a problem, since interrupts aren't supposed to
look at thread information anyway (what thread is running at
interrupt time really isn't very well-defined), but it confused the
heck out of irq_fpu_usable() and the code that tried to squirrel
away the FPU state.
(It also caused untold confusion for us poor kernel developers).
It also turns out that using 'task_struct' is actually much more natural
for most of the call sites that care about the FPU state, since they
tend to work with the task struct for other reasons anyway (ie
scheduling). And the FPU data that we are going to save/restore is
found there too.
Thanks to Arjan Van De Ven <arjan@linux.intel.com> for pointing us to
the %esp issue.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-and-tested-by: Raphael Prevost <raphael@buro.asia>
Acked-and-tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4903062b54 upstream.
The AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception is
pending. In order to not leak FIP state from one process to another, we
need to do a floating point load after the fxsave of the old process,
and before the fxrstor of the new FPU state. That resets the state to
the (uninteresting) kernel load, rather than some potentially sensitive
user information.
We used to do this directly after the FPU state save, but that is
actually very inconvenient, since it
(a) corrupts what is potentially perfectly good FPU state that we might
want to lazy avoid restoring later and
(b) on x86-64 it resulted in a very annoying ordering constraint, where
"__unlazy_fpu()" in the task switch needs to be delayed until after
the DS segment has been reloaded just to get the new DS value.
Coupling it to the fxrstor instead of the fxsave automatically avoids
both of these issues, and also ensures that we only do it when actually
necessary (the FP state after a save may never actually get used). It's
simply a much more natural place for the leaked state cleanup.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3b0870ef3 upstream.
Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
is spending several days looking for a bug in the state save/restore
code. And the preload code has some rather subtle interactions with
both paravirtualization support and segment state restore, so it's not
nearly as simple as it should be.
Also, now that we no longer necessarily depend on a single bit (ie
TS_USEDFPU) for keeping track of the state of the FPU, we migth be able
to do better. If we are really switching between two processes that
keep touching the FP state, save/restore is inevitable, but in the case
of having one process that does most of the FPU usage, we may actually
be able to do much better than the preloading.
In particular, we may be able to keep track of which CPU the process ran
on last, and also per CPU keep track of which process' FP state that CPU
has. For modern CPU's that don't destroy the FPU contents on save time,
that would allow us to do a lazy restore by just re-enabling the
existing FPU state - with no restore cost at all!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d59d7a9f5 upstream.
This creates three helper functions that do the TS_USEDFPU accesses, and
makes everybody that used to do it by hand use those helpers instead.
In addition, there's a couple of helper functions for the "change both
CR0.TS and TS_USEDFPU at the same time" case, and the places that do
that together have been changed to use those. That means that we have
fewer random places that open-code this situation.
The intent is partly to clarify the code without actually changing any
semantics yet (since we clearly still have some hard to reproduce bug in
this area), but also to make it much easier to use another approach
entirely to caching the CR0.TS bit for software accesses.
Right now we use a bit in the thread-info 'status' variable (this patch
does not change that), but we might want to make it a full field of its
own or even make it a per-cpu variable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6c66418dc upstream.
Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
it. By moving it into the callers, we always do the TS_USEDFPU next to
the CR0.TS accesses in the source code, and it's much easier to see how
the two go hand in hand.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15d8791cae upstream.
Commit 5b1cbac377 ("i387: make irq_fpu_usable() tests more robust")
added a sanity check to the #NM handler to verify that we never cause
the "Device Not Available" exception in kernel mode.
However, that check actually pinpointed a (fundamental) race where we do
cause that exception as part of the signal stack FPU state save/restore
code.
Because we use the floating point instructions themselves to save and
restore state directly from user mode, we cannot do that atomically with
testing the TS_USEDFPU bit: the user mode access itself may cause a page
fault, which causes a task switch, which saves and restores the FP/MMX
state from the kernel buffers.
This kind of "recursive" FP state save is fine per se, but it means that
when the signal stack save/restore gets restarted, it will now take the
'#NM' exception we originally tried to avoid. With preemption this can
happen even without the page fault - but because of the user access, we
cannot just disable preemption around the save/restore instruction.
There are various ways to solve this, including using the
"enable/disable_page_fault()" helpers to not allow page faults at all
during the sequence, and fall back to copying things by hand without the
use of the native FP state save/restore instructions.
However, the simplest thing to do is to just allow the #NM from kernel
space, but fix the race in setting and clearing CR0.TS that this all
exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
atomic wrt scheduling, so while the actual state save/restore can be
interrupted and restarted, the act of actually clearing/setting CR0.TS
and the TS_USEDFPU bit together must not.
Instead of just adding random "preempt_disable/enable()" calls to what
is already excessively ugly code, this introduces some helper functions
that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
the user state instead.
Those helper functions should probably eventually replace the other
ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
some more: the task switching functionality in particular needs to
expose the difference between the 'prev' and 'next' threads, while the
new helper functions intentionally were written to only work with
'current'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c38e234562 upstream.
The check for save_init_fpu() (introduced in commit 5b1cbac377: "i387:
make irq_fpu_usable() tests more robust") was the wrong way around, but
I hadn't noticed, because my "tests" were bogus: the FPU exceptions are
disabled by default, so even doing a divide by zero never actually
triggers this code at all unless you do extra work to enable them.
So if anybody did enable them, they'd get one spurious warning.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b1cbac377 upstream.
Some code - especially the crypto layer - wants to use the x86
FP/MMX/AVX register set in what may be interrupt (typically softirq)
context.
That *can* be ok, but the tests for when it was ok were somewhat
suspect. We cannot touch the thread-specific status bits either, so
we'd better check that we're not going to try to save FP state or
anything like that.
Now, it may be that the TS bit is always cleared *before* we set the
USEDFPU bit (and only set when we had already cleared the USEDFP
before), so the TS bit test may actually have been sufficient, but it
certainly was not obviously so.
So this explicitly verifies that we will not touch the TS_USEDFPU bit,
and adds a few related sanity-checks. Because it seems that somehow
AES-NI is corrupting user FP state. The cause is not clear, and this
patch doesn't fix it, but while debugging it I really wanted the code to
be more obviously correct and robust.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be98c2cdb1 upstream.
It was marked asmlinkage for some really old and stale legacy reasons.
Fix that and the equally stale comment.
Noticed when debugging the irq_fpu_usable() bugs.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a45aa3b305 upstream.
The superspeed device attached to a USB 3.0 hub(such as VIA's)
doesn't respond the address device command after resume. The
root cause is the superspeed hub will miss the Hub Depth value
that is used as an offset into the route string to locate the
bits it uses to determine the downstream port number after
reset, and all packets can't be routed to the device attached
to the superspeed hub.
Hub driver sends a Set Hub Depth request to the superspeed hub
except for USB 3.0 root hub when the hub is initialized and
doesn't send the request again after reset due to the resume
process. So moving the code that sends the Set Hub Depth request
to the superspeed hub from hub_configure() to hub_activate()
is to cover those situations include initialization and reset.
The patch should be backported to kernels as old as 2.6.39.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 340a3504fd upstream.
The xHCI 0.96 spec says that HS bulk and control endpoint NAK rate must
be encoded as an exponent of two number of microframes. The endpoint
descriptor has the NAK rate encoded in number of microframes. We were
just copying the value from the endpoint descriptor into the endpoint
context interval field, which was not correct. This lead to the VIA
host rejecting the add of a bulk OUT endpoint from any USB 2.0 mass
storage device.
The fix is to use the correct encoding. Refactor the code to convert
number of frames to an exponential number of microframes, and make sure
we convert the number of microframes in HS bulk and control endpoints to
an exponent.
This should be back ported to kernels as old as 2.6.31, that contain the
commit dfa49c4ad1 "USB: xhci - fix math
in xhci_get_endpoint_interval"
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Felipe Contreras <felipe.contreras@gmail.com>
Suggested-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3278a55a1a upstream.
The code to set the device removable bits in the USB 2.0 roothub
descriptor was accidentally looking at the USB 3.0 port registers
instead of the USB 2.0 registers. This can cause an oops if there are
more USB 2.0 registers than USB 3.0 registers.
This should be backported to kernels as old as 2.6.39, that contain the
commit 4bbb0ace9a "xhci: Return a USB 3.0
hub descriptor for USB3 roothub."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cab928ee1f upstream.
On some systems with an Intel Panther Point xHCI host controller, the
BIOS disables the xHCI PCI device during boot, and switches the xHCI
ports over to EHCI. This allows the BIOS to access USB devices without
having xHCI support.
The downside is that the xHCI BIOS handoff mechanism will fail because
memory mapped I/O is not enabled for the disabled PCI device.
Jesse Barnes says this is expected behavior. The PCI core will enable
BARs before quirks run, but it will leave it in an undefined state, and
it may not have memory mapped I/O enabled.
Make the generic USB quirk handler call pci_enable_device() to re-enable
MMIO, and call pci_disable_device() once the host-specific BIOS handoff
is finished. This will balance the ref counts in the PCI core. When
the PCI probe function is called, usb_hcd_pci_probe() will call
pci_enable_device() again.
This should be back ported to kernels as old as 2.6.31. That was the
first kernel with xHCI support, and no one has complained about BIOS
handoffs failing due to memory mapped I/O being disabled on other hosts
(EHCI, UHCI, or OHCI).
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9f5343e35 upstream.
Somehow we ended up with duplicate hub feature #defines in ch11.h.
Tatyana Brokhman first created the USB 3.0 hub feature macros in 2.6.38
with commit 0eadcc0920 "usb: USB3.0 ch11
definitions". In 2.6.39, I modified a patch from John Youn that added
similar macros in a different place in the same file, and committed
dbe79bbe9d "USB 3.0 Hub Changes".
Some of the #defines used different names for the same values. Others
used exactly the same names with the same values, like these gems:
#define USB_PORT_FEAT_BH_PORT_RESET 28
...
#define USB_PORT_FEAT_BH_PORT_RESET 28
According to my very geeky husband (who looked it up in the C99 spec),
it is allowed to have object-like macros with duplicate names as long as
the replacement list is exactly the same. However, he recalled that
some compilers will give warnings when they find duplicate macros. It's
probably best to remove the duplicates in the stable tree, so that the
code compiles for everyone.
The macros are now fixed to move the feature requests that are specific
to USB 3.0 hubs into a new section (out of the USB 2.0 hub feature
section), and use the most common macro name.
This patch should be backported to 2.6.39.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Tatyana Brokhman <tlinder@codeaurora.org>
Cc: John Youn <johnyoun@synopsys.com>
Cc: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7fd25702ba upstream.
This USB-serial cable with mini stereo jack enumerates as:
Bus 001 Device 004: ID 1a61:3410 Abbott Diabetes Care
It is a TI3410 inside.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e44fe5ec upstream.
1. Remove all old mass-storage ids's pid:
0x0026,0x0053,0x0098,0x0099,0x0149,0x0150,0x0160;
2. As the pid from 0x1401 to 0x1510 which have not surely assigned to
use for serial-port or mass-storage port,so i think it should be
removed now, and will re-add after it have assigned in future;
3. sort the pid to WCDMA and CDMA.
Signed-off-by: Rui li <li.rui27@zte.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9cc20b268a ]
commit f39925dbde (ipv4: Cache learned redirect information in
inetpeer.) introduced a regression in ICMP redirect handling.
It assumed ipv4_dst_check() would be called because all possible routes
were attached to the inetpeer we modify in ip_rt_redirect(), but thats
not true.
commit 7cc9150ebe (route: fix ICMP redirect validation) tried to fix
this but solution was not complete. (It fixed only one route)
So we must lookup existing routes (including different TOS values) and
call check_peer_redir() on them.
Reported-by: Ivan Zahariev <famzah@icdsoft.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7cc9150ebe ]
The commit f39925dbde
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0af2a0d057 ]
This commit ensures that lost_cnt_hint is correctly updated in
tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment
in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders
need their own adjustment.
This applies the spirit of 1e5289e121 -
except now that the sequence range passed into tcp_sacktag_one() is
correct we need only have a special case adjustment for FACK.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit daef52bab1 ]
Fix the newly-SACKed range to be the range of newly-shifted bytes.
Previously - since 832d11c5cd -
tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start
and end sequence numbers of the skb it passes in set to the range just
beyond the range that is newly-SACKed.
This commit also removes a special-case adjustment to lost_cnt_hint in
tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint
in tcp_sacktag_one() now properly handles this things now that the
correct start sequence number is passed in.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc9a672ee5 ]
This commit allows callers of tcp_sacktag_one() to pass in sequence
ranges that do not align with skb boundaries, as tcp_shifted_skb()
needs to do in an upcoming fix in this patch series.
In fact, now tcp_sacktag_one() does not need to depend on an input skb
at all, which makes its semantics and dependencies more clear.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e2446eaab5 ]
Binding RST packet outgoing interface to incoming interface
for tcp v4 when there is no socket associate with it.
when sk is not NULL, using sk->sk_bound_dev_if instead.
(suggested by Eric Dumazet).
This has few benefits:
1. tcp_v6_send_reset already did that.
2. This helps tcp connect with SO_BINDTODEVICE set. When
connection is lost, we still able to sending out RST using
same interface.
3. we are sending reply, it is most likely to be succeed
if iif is used
Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b530b1930b ]
Initially diagnosed on Ubuntu 11.04 with kernel 2.6.38.
velocity_close is not called during a suspend / resume cycle in this
driver and it has no business playing directly with power states.
Signed-off-by: David Lv <DavidLv@viatech.com.cn>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eb10192447 ]
Not now, but it looks you are correct. q->qdisc is NULL until another
additional qdisc is attached (beside tfifo). See 50612537e9.
The following patch should work.
From: Hagen Paul Pfeifer <hagen@jauu.net>
netem: catch NULL pointer by updating the real qdisc statistic
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70620c46ac ]
Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of arp proxy to send arp replies back out on the interface
the request came in even if the private VLAN feature is disabled.
Previously we checked rt->dst.dev != skb->dev for in scenarios, when
proxy arp is enabled on for the netdevice and also when individual proxy
neighbour entries have been added.
This patch adds the check back for the pneigh_lookup() scenario.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e6b45241c5 ]
Eric Dumazet found that commit 813b3b5db8
(ipv4: Use caller's on-stack flowi as-is in output
route lookups.) that comes in 3.0 added a regression.
The problem appears to be that resulting flowi4_oif is
used incorrectly as input parameter to some routing lookups.
The result is that when connecting to local port without
listener if the IP address that is used is not on a loopback
interface we incorrectly assign RTN_UNICAST to the output
route because no route is matched by oif=lo. The RST packet
can not be sent immediately by tcp_v4_send_reset because
it expects RTN_LOCAL.
So, change ip_route_connect and ip_route_newports to
update the flowi4 fields that are input parameters because
we do not want unnecessary binding to oif.
To make it clear what are the input parameters that
can be modified during lookup and to show which fields of
floiw4 are reused add a new function to update the flowi4
structure: flowi4_update_output.
Thanks to Yurij M. Plotnikov for providing a bug report including a
program to reproduce the problem.
Thanks to Eric Dumazet for tracking the problem down to
tcp_v4_send_reset and providing initial fix.
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5dc7883f2a ]
This patch fix a bug which introduced by commit ac8a4810 (ipv4: Save
nexthop address of LSRR/SSRR option to IPCB.).In that patch, we saved
the nexthop of SRR in ip_option->nexthop and update iph->daddr until
we get to ip_forward_options(), but we need to update it before
ip_rt_get_source(), otherwise we may get a wrong src.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ac8a48106b ]
We can not update iph->daddr in ip_options_rcv_srr(), It is too early.
When some exception ocurred later (eg. in ip_forward() when goto
sr_failed) we need the ip header be identical to the original one as
ICMP need it.
Add a field 'nexthop' in struct ip_options to save nexthop of LSRR
or SSRR option.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b12f62efb8 ]
When opt->srr_is_hit is set skb_rtable(skb) has been updated for
'nexthop' and iph->daddr should always equals to skb_rtable->rt_dst
holds, We need update iph->daddr either.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3013dc0cce ]
Jean Delvare reported bonding on top of 3c59x adapters was not detecting
network cable removal fast enough.
3c59x indeed uses a 60 seconds timer to check link status if carrier is
on, and 5 seconds if carrier is off.
This patch reduces timer period to 5 seconds if device is a bonding
slave.
Reported-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 237114384a ]
VETH_INFO_PEER carries struct ifinfomsg plus optional IFLA
attributes. A minimal size of sizeof(struct ifinfomsg) must be
enforced or we may risk accessing that struct beyond the limits
of the netlink message.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ca3b72c5d ]
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.
He provided a patch faking a zeroed header to let GRO aggregates frames.
Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.
__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 936d7de3d7 ]
Commit a0417fa3a1 ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method. Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16bda13d90 ]
Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.
This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8e43a905dd upstream.
Bootup with lockdep enabled has been broken on v7 since b46c0f7465
("ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR").
This is because v7_setup (which is called very early during boot) calls
v7_flush_dcache_all, and the save_and_disable_irqs added by that patch
ends up attempting to call into lockdep C code (trace_hardirqs_off())
when we are in no position to execute it (no stack, MMU off).
Fix this by using a notrace variant of save_and_disable_irqs. The code
already uses the notrace variant of restore_irqs.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b46c0f7465 upstream.
armv7's flush_cache_all() flushes caches via set/way. To
determine the cache attributes (line size, number of sets,
etc.) the assembly first writes the CSSELR register to select a
cache level and then reads the CCSIDR register. The CSSELR register
is banked per-cpu and is used to determine which cache level CCSIDR
reads. If the task is migrated between when the CSSELR is written and
the CCSIDR is read the CCSIDR value may be for an unexpected cache
level (for example L1 instead of L2) and incorrect cache flushing
could occur.
Disable interrupts across the write and read so that the correct
cache attributes are read and used for the cache flushing
routine. We disable interrupts instead of disabling preemption
because the critical section is only 3 instructions and we want
to call v7_dcache_flush_all from __v7_setup which doesn't have a
full kernel stack with a struct thread_info.
This fixes a problem we see in scm_call() when flush_cache_all()
is called from preemptible context and sometimes the L2 cache is
not properly flushed out.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9f9a03150 upstream.
To ensure that we don't just reuse the bad delegation when we attempt to
recover the nfs4_state that received the bad stateid error.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d6144de8b upstream.
If the read or write buffer size associated with the command sent
through the mmc_blk_ioctl is zero, do not prepare data buffer.
This enables a ioctl(2) call to for instance send a MMC_SWITCH to set
a byte in the ext_csd.
Signed-off-by: Johan Rudholm <johan.rudholm@stericsson.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Note that since the patch isn't applicable (and unnecessary) to
3.3-rc, there is no corresponding upstream fix.]
The cx5051 parser calls snd_hda_input_jack_add() in the init callback
to create and initialize the jack detection instances. Since the init
callback is called at each time when the device gets woken up after
suspend or power-saving mode, the duplicated instances are accumulated
at each call. This ends up with the kernel warnings with the too
large array size.
The fix is simply to move the calls of snd_hda_input_jack_add() into
the parser section instead of the init callback.
The fix is needed only up to 3.2 kernel, since the HD-audio jack layer
was redesigned in the 3.3 kernel.
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 545d680938 upstream.
After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.
One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.
https://launchpad.net/bugs/926292
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b57e6b560f upstream.
read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path
ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig
(->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing
it.
the intilization of this read/write lock happens via the path
ieee80211_led_init (->) led_trigger_register, but we are doing
'ieee80211_led_init' after 'ieeee80211_if_add' where we
register netdev_ops.
so we access leddev_list_lock before initializing it and causes the
following bug in chrome laptops with AR928X cards with the following
script
while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done
BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
Call Trace:
[<8137b9df>] rwlock_bug+0x3d/0x47
[<81179830>] do_raw_read_lock+0x19/0x29
[<8137f063>] _raw_read_lock+0xd/0xf
[<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211]
[<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
[<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211]
[<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
[<f9076d97>] ieee80211_open+0x48/0x4c [mac80211]
[<812dbed8>] __dev_open+0x82/0xab
[<812dc0c9>] __dev_change_flags+0x9c/0x113
[<812dc1ae>] dev_change_flags+0x18/0x44
[<8132144f>] devinet_ioctl+0x243/0x51a
[<81321ba9>] inet_ioctl+0x93/0xac
[<812cc951>] sock_ioctl+0x1c6/0x1ea
[<812cc78b>] ? might_fault+0x20/0x20
[<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2
[<810a6ebb>] ? fget_light+0x2f/0x70
[<812ce549>] ? sys_recvmsg+0x3e/0x48
[<810b1f35>] sys_ioctl+0x46/0x69
[<8137fa77>] sysenter_do_call+0x12/0x2
Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71f6bd4a23 upstream.
Fixes PCI device detection on IBM xSeries IBM 3850 M2 / x3950 M2
when using ACPI resources (_CRS).
This is default, a manual workaround (without this patch)
would be pci=nocrs boot param.
V2: Add dev_warn if the workaround is hit. This should reveal
how common such setups are (via google) and point to possible
problems if things are still not working as expected.
-> Suggested by Jan Beulich.
Tested-by: garyhade@us.ibm.com
Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a45a9407c upstream.
perf on POWER stopped working after commit e050e3f0a7 (perf: Fix
broken interrupt rate throttling). That patch exposed a bug in
the POWER perf_events code.
Since the PMCs count upwards and take an exception when the top bit
is set, we want to write 0x80000000 - left in power_pmu_start. We were
instead programming in left which effectively disables the counter
until we eventually hit 0x80000000. This could take seconds or longer.
With the patch applied I get the expected number of samples:
SAMPLE events: 9948
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 363434b5dc upstream.
An error while creating sysfs attribute files in the driver's probe function
results in an error abort, but already created files are not removed. This patch
fixes the problem.
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Dirk Eibach <eibach@gdsys.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f2da1ac0b upstream.
Initialize PPR register for both channels, and set correct PPR register bits.
Also remove unnecessary variable initializations.
Signed-off-by: Chris D Schimp <silverchris@gmail.com>
[guenter.roeck@ericsson.com: Merged two patches into one]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63d97a36e upstream.
RPM calculation from tachometer value does not depend on PPR.
Also, do not report negative RPM values.
Signed-off-by: Chris D Schimp <silverchris@gmail.com>
[guenter.roeck@ericsson.com: do not report negative RPM values]
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2ea0f5f04 upstream.
Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.
The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.
Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73736e0387 upstream.
Zhihua Che reported a possible memleak in slub allocator on
CONFIG_PREEMPT=y builds.
It is possible current thread migrates right before disabling irqs in
__slab_alloc(). We must check again c->freelist, and perform a normal
allocation instead of scratching c->freelist.
Many thanks to Zhihua Che for spotting this bug, introduced in 2.6.39
V2: Its also possible an IRQ freed one (or several) object(s) and
populated c->freelist, so its not a CONFIG_PREEMPT only problem.
Reported-by: Zhihua Che <zhihua.che@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a92d687c8 upstream.
Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice. As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).
This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.
While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.
Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58d7d18b52 upstream.
The previous patch used the modulus operator over a power of 2
unnecessarily which may produce suboptimal binary code. This
patch changes changes them to binary ands instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09e87e5c4f upstream.
In order to enable temperature mode aka automatic mode for the F75373 and
F75375 chips, the two FANx_MODE bits in the fan configuration register
need be set to 01, not 10.
Signed-off-by: Nikolaus Schulz <mail@microschulz.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 977b7e3a52 upstream.
When a SD card is hot removed without umount, del_gendisk() will call
bdi_unregister() without destroying/freeing it. This leaves the bdi in
the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state.
When sync(2) gets the bdi before bdi_unregister() and calls
bdi_queue_work() after the unregister, trace_writeback_queue will be
dereferencing the NULL bdi->dev. Fix it with a simple test for NULL.
LKML-reference: http://lkml.org/lkml/2012/1/18/346
Reported-by: Rabin Vincent <rabin@rab.in>
Tested-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07ae2dfcf4 upstream.
The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely, but the
frame is never timed-out (until the next packet is received)
Signed-off-by: Eliad Peller <eliad@wizery.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3310225dfc upstream.
PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs
in the below lines of bdi_dirty_limit():
bdi_dirty *= numerator;
do_div(bdi_dirty, denominator);
1) divide error: do_div() only uses the lower 32 bit of the denominator,
which may trimmed to be 0 when PROP_MAX_SHIFT > 32.
2) overflow: (bdi_dirty * numerator) could easily overflow if numerator
used up to 48 bits, leaving only 16 bits to bdi_dirty
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4a03fc7ef upstream.
This patch fixes an issue where perf report shows nan% for certain
perf.data files. The below is from a report for a do_fork probe:
-nan% sshd [kernel.kallsyms] [k] do_fork
-nan% packagekitd [kernel.kallsyms] [k] do_fork
-nan% dbus-daemon [kernel.kallsyms] [k] do_fork
-nan% bash [kernel.kallsyms] [k] do_fork
A git bisect shows commit f3bda2c as the cause. However, looking back
through the git history, I saw commit 640c03c which seems to have
removed the required initialization for perf_sample->period. The problem
only started showing after commit f3bda2c. The below patch re-introduces
the initialization and it fixes the problem for me.
With the below patch, for the same perf.data:
73.08% bash [kernel.kallsyms] [k] do_fork
8.97% 11-dhclient [kernel.kallsyms] [k] do_fork
6.41% sshd [kernel.kallsyms] [k] do_fork
3.85% 20-chrony [kernel.kallsyms] [k] do_fork
2.56% sendmail [kernel.kallsyms] [k] do_fork
This patch applies over current linux-tip commit 9949284.
Problem introduced in:
$ git describe 640c03c
v2.6.37-rc3-83-g640c03c
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120203170113.5190.25558.stgit@localhost6.localdomain6
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3aaeb38c4, along
with dependent backports of commits:
69cce1d1409de79c127c218fa90f07580da35a31f7e57044eee049f28883 ]
Gergely Kalman reported crashes in check_peer_redir().
It appears commit f39925dbde (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.
Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.
Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.
As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)
Many thanks to Gergely for providing a pretty report pointing to the
bug.
Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8eb28480e upstream.
The driver uses the pstate number from the status register as index in
its table of ACPI pstates (powernow_table). This is wrong as this is
not a 1-to-1 mapping.
For example we can have _PSS information to just utilize Pstate 0 and
Pstate 4, ie.
powernow-k8: Core Performance Boosting: on.
powernow-k8: 0 : pstate 0 (2200 MHz)
powernow-k8: 1 : pstate 4 (1400 MHz)
In this example the driver's powernow_table has just 2 entries. Using
the pstate number (4) as index into this table is just plain wrong.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 201bf0f129 upstream.
Due to CPB we can't directly map SW Pstates to Pstate MSRs. Get rid of
the paranoia check. (assuming that the ACPI Pstate information is
correct.)
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1608ea5f4b upstream.
As ZTE have and will use more pid for new products this year,
so we need to add some new zte 3g-dongle's pid on option.c ,
and delete one pid 0x0154 because it use for mass-storage port.
Signed-off-by: Rui li <li.rui27@zte.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e4436a7c17 upstream.
The Netlogic XLP SoC's on-chip USB controller appears as a PCI
USB device, but does not need the EHCI/OHCI handoff done in
usb/host/pci-quirks.c.
The pci-quirks.c is enabled for all vendors and devices, and is
enabled if USB and PCI are configured.
If we do not skip the qurik handling on XLP, the readb() call in
ehci_bios_handoff() will cause a crash since byte access is not
supported for EHCI registers in XLP.
Signed-off-by: Jayachandran C <jayachandranc@netlogicmicro.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 683da59d7b upstream.
ab943a2e12 (USB: gadget: gadget zero uses new suspend/resume hooks)
introduced a copy-paste error where f_loopback.c writes to a variable
declared in f_sourcesink.c. This prevents one from creating gadgets
that only have a loopback function.
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 635032cb39 upstream.
Programming an image was broken, because odev->buf_offs was not advanced
for val == 0 in append_values(). This regression was introduced in:
commit 1ff12a4aa3
Author: Kevin A. Granade <kevin.granade@gmail.com>
Date: Sat Sep 5 01:03:39 2009 -0500
Staging: asus_oled: Cleaned up checkpatch issues.
Fix the image processing by special-casing val == 0.
I have tested this change on an Asus G50V laptop only.
Cc: Jakub Schmidtke <sjakub@gmail.com>
Cc: Kevin A. Granade <kevin.granade@gmail.com>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fbc890987 upstream.
According to SPC-4, the sense key for commands that are failed with
INVALID FIELD IN PARAMETER LIST and INVALID FIELD IN CDB should be
ILLEGAL REQUEST (5h) rather than ABORTED COMMAND (Bh). Without this
patch, a tcm_loop LUN incorrectly gives:
# sg_raw -r 1 -v /dev/sda 3 1 0 0 ff 0
Sense Information:
Fixed format, current; Sense key: Aborted Command
Additional sense: Invalid field in cdb
Raw sense data (in hex):
70 00 0b 00 00 00 00 0a 00 00 00 00 24 00 00 00
00 00
While a real SCSI disk gives:
Sense Information:
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in cdb
Raw sense data (in hex):
70 00 05 00 00 00 00 18 00 00 00 00 24 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
with the main point being that the real disk gives a sense key of
ILLEGAL REQUEST (5h).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6816966a84 upstream.
Initiators that aren't the active reservation holder should be able to
do a PERSISTENT RESERVE IN command in all cases, so add it to the list
of allowed CDBs in core_scsi3_pr_seq_non_holder().
Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e08e34e37 upstream.
The comments quote the right parts of the spec:
* d) Establish a unit attention condition for the
* initiator port associated with every I_T nexus
* that lost its registration other than the I_T
* nexus on which the PERSISTENT RESERVE OUT command
* was received, with the additional sense code set
* to REGISTRATIONS PREEMPTED.
and
* e) Establish a unit attention condition for the initiator
* port associated with every I_T nexus that lost its
* persistent reservation and/or registration, with the
* additional sense code set to REGISTRATIONS PREEMPTED;
but the actual code accidentally uses ASCQ_2AH_RESERVATIONS_PREEMPTED
instead of ASCQ_2AH_REGISTRATIONS_PREEMPTED. Fix this.
Signed-off-by: Marco Sanvido <marco@purestorage.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9980cdcf2 upstream.
Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
and so triggers some BUGs in Transparent HugePage codepaths.
asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc9086004b upstream.
When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone. Migration
avoids entering a new zone by never going beyond the free scanned.
Unfortunately, in very rare cases nodes can overlap. When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810f795c>] free_pcppages_bulk+0xcc/0x450
PGD 1dda554067 PUD 1e1cb58067 PMD 0
Oops: 0000 [#1] SMP
CPU 37
Pid: 17088, comm: memcg_process_s Tainted: G X
RIP: free_pcppages_bulk+0xcc/0x450
Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
Call Trace:
free_hot_cold_page+0x17e/0x1f0
__pagevec_free+0x90/0xb0
release_pages+0x22a/0x260
pagevec_lru_move_fn+0xf3/0x110
putback_lru_page+0x66/0xe0
unmap_and_move+0x156/0x180
migrate_pages+0x9e/0x1b0
compact_zone+0x1f3/0x2f0
compact_zone_order+0xa2/0xe0
try_to_compact_pages+0xdf/0x110
__alloc_pages_direct_compact+0xee/0x1c0
__alloc_pages_slowpath+0x370/0x830
__alloc_pages_nodemask+0x1b1/0x1c0
alloc_pages_vma+0x9b/0x160
do_huge_pmd_anonymous_page+0x160/0x270
do_page_fault+0x207/0x4c0
page_fault+0x25/0x30
The "X" in the taint flag means that external modules were loaded but but
is unrelated to the bug triggering. The real problem was because the PFN
layout looks like this
Zone PFN ranges:
DMA 0x00000010 -> 0x00001000
DMA32 0x00001000 -> 0x00100000
Normal 0x00100000 -> 0x01e80000
Movable zone start PFN for each node
early_node_map[14] active PFN ranges
0: 0x00000010 -> 0x0000009b
0: 0x00000100 -> 0x0007a1ec
0: 0x0007a354 -> 0x0007a379
0: 0x0007f7ff -> 0x0007f800
0: 0x00100000 -> 0x00680000
1: 0x00680000 -> 0x00e80000
0: 0x00e80000 -> 0x01080000
1: 0x01080000 -> 0x01280000
0: 0x01280000 -> 0x01480000
1: 0x01480000 -> 0x01680000
0: 0x01680000 -> 0x01880000
1: 0x01880000 -> 0x01a80000
0: 0x01a80000 -> 0x01c80000
1: 0x01c80000 -> 0x01e80000
The fix is straight-forward. isolate_migratepages() has to make a
similar check to isolate_freepage to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.
This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 025e4ab3db upstream.
This fixes a memory-corrupting bug: not only does it cause the warning,
but as a result of dropping the refcount to zero, it causes the
pcmcia_socket0 device structure to be freed while it still has
references, causing slab caches corruption. A fatal oops quickly
follows this warning - often even just a 'dmesg' following the warning
causes the kernel to oops.
While testing suspend/resume on an ARM device with PCMCIA support, and a
CF card inserted, I found that after five suspend and resumes, the
kernel would complain, and shortly die after with slab corruption.
WARNING: at include/linux/kref.h:41 kobject_get+0x28/0x50()
As the message doesn't give a clue about which kobject, and the built-in
debugging in drivers/base/power/main.c happens too late, this was added
right before each get_device():
printk("%s: %p [%s] %u\n", __func__, dev, kobject_name(&dev->kobj), atomic_read(&dev->kobj.kref.refcount));
and on the 3rd s2ram cycle, the following behaviour observed:
On the 3rd suspend/resume cycle:
dpm_prepare: c1a0d998 [pcmcia_socket0] 3
dpm_suspend: c1a0d998 [pcmcia_socket0] 3
dpm_suspend_noirq: c1a0d998 [pcmcia_socket0] 3
dpm_resume_noirq: c1a0d998 [pcmcia_socket0] 3
dpm_resume: c1a0d998 [pcmcia_socket0] 3
dpm_complete: c1a0d998 [pcmcia_socket0] 2
4th:
dpm_prepare: c1a0d998 [pcmcia_socket0] 2
dpm_suspend: c1a0d998 [pcmcia_socket0] 2
dpm_suspend_noirq: c1a0d998 [pcmcia_socket0] 2
dpm_resume_noirq: c1a0d998 [pcmcia_socket0] 2
dpm_resume: c1a0d998 [pcmcia_socket0] 2
dpm_complete: c1a0d998 [pcmcia_socket0] 1
5th:
dpm_prepare: c1a0d998 [pcmcia_socket0] 1
dpm_suspend: c1a0d998 [pcmcia_socket0] 1
dpm_suspend_noirq: c1a0d998 [pcmcia_socket0] 1
dpm_resume_noirq: c1a0d998 [pcmcia_socket0] 1
dpm_resume: c1a0d998 [pcmcia_socket0] 1
dpm_complete: c1a0d998 [pcmcia_socket0] 0
------------[ cut here ]------------
WARNING: at include/linux/kref.h:41 kobject_get+0x28/0x50()
Modules linked in: ucb1x00_core
Backtrace:
[<c0212090>] (dump_backtrace+0x0/0x110) from [<c04799dc>] (dump_stack+0x18/0x1c)
[<c04799c4>] (dump_stack+0x0/0x1c) from [<c021cba0>] (warn_slowpath_common+0x50/0x68)
[<c021cb50>] (warn_slowpath_common+0x0/0x68) from [<c021cbdc>] (warn_slowpath_null+0x24/0x28)
[<c021cbb8>] (warn_slowpath_null+0x0/0x28) from [<c0335374>] (kobject_get+0x28/0x50)
[<c033534c>] (kobject_get+0x0/0x50) from [<c03804f4>] (get_device+0x1c/0x24)
[<c0388c90>] (dpm_complete+0x0/0x1a0) from [<c0389cc0>] (dpm_resume_end+0x1c/0x20)
...
Looking at commit 7b24e79882 ("pcmcia: split up central event handler"),
the following change was made to cs.c:
return 0;
}
#endif
-
- send_event(skt, CS_EVENT_PM_RESUME, CS_EVENT_PRI_LOW);
+ if (!(skt->state & SOCKET_CARDBUS) && (skt->callback))
+ skt->callback->early_resume(skt);
return 0;
}
And the corresponding change in ds.c is from:
-static int ds_event(struct pcmcia_socket *skt, event_t event, int priority)
-{
- struct pcmcia_socket *s = pcmcia_get_socket(skt);
...
- switch (event) {
...
- case CS_EVENT_PM_RESUME:
- if (verify_cis_cache(skt) != 0) {
- dev_dbg(&skt->dev, "cis mismatch - different card\n");
- /* first, remove the card */
- ds_event(skt, CS_EVENT_CARD_REMOVAL, CS_EVENT_PRI_HIGH);
- mutex_lock(&s->ops_mutex);
- destroy_cis_cache(skt);
- kfree(skt->fake_cis);
- skt->fake_cis = NULL;
- s->functions = 0;
- mutex_unlock(&s->ops_mutex);
- /* now, add the new card */
- ds_event(skt, CS_EVENT_CARD_INSERTION,
- CS_EVENT_PRI_LOW);
- }
- break;
...
- }
- pcmcia_put_socket(s);
- return 0;
-} /* ds_event */
to:
+static int pcmcia_bus_early_resume(struct pcmcia_socket *skt)
+{
+ if (!verify_cis_cache(skt)) {
+ pcmcia_put_socket(skt);
+ return 0;
+ }
+ dev_dbg(&skt->dev, "cis mismatch - different card\n");
+ /* first, remove the card */
+ pcmcia_bus_remove(skt);
+ mutex_lock(&skt->ops_mutex);
+ destroy_cis_cache(skt);
+ kfree(skt->fake_cis);
+ skt->fake_cis = NULL;
+ skt->functions = 0;
+ mutex_unlock(&skt->ops_mutex);
+ /* now, add the new card */
+ pcmcia_bus_add(skt);
+ return 0;
+}
As can be seen, the original function called pcmcia_get_socket() and
pcmcia_put_socket() around the guts, whereas the replacement code
calls pcmcia_put_socket() only in one path. This creates an imbalance
in the refcounting.
Testing with pcmcia_put_socket() put removed shows that the bug is gone:
dpm_suspend: c1a10998 [pcmcia_socket0] 5
dpm_suspend_noirq: c1a10998 [pcmcia_socket0] 5
dpm_resume_noirq: c1a10998 [pcmcia_socket0] 5
dpm_resume: c1a10998 [pcmcia_socket0] 5
dpm_complete: c1a10998 [pcmcia_socket0] 5
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 585c0fd821 upstream.
NCT6776F can select fan input pins for fans 3 to 5 with a secondary set of
chip register bits. Check that second set of bits in addition to the first set
to detect if fans 3..5 are monitored.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df754e6af2 upstream.
It's unlikely that TAINT_FIRMWARE_WORKAROUND causes false
lockdep messages, so do not disable lockdep in that case.
We still want to keep lockdep disabled in the
TAINT_OOT_MODULE case:
- bin-only modules can cause various instabilities in
their and in unrelated kernel code
- they are impossible to debug for kernel developers
- they also typically do not have the copyright license
permission to link to the GPL-ed lockdep code.
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-xopopjjens57r0i13qnyh2yo@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 684a3ff7e6 upstream.
ecryptfs_write() can enter an infinite loop when truncating a file to a
size larger than 4G. This only happens on architectures where size_t is
represented by 32 bits.
This was caused by a size_t overflow due to it incorrectly being used to
store the result of a calculation which uses potentially large values of
type loff_t.
[tyhicks@canonical.com: rewrite subject and commit message]
Signed-off-by: Li Wang <liwang@nudt.edu.cn>
Signed-off-by: Yunchuan Wen <wenyunchuan@kylinos.com.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 097354eb14 upstream.
Otherwise hangcheck spuriously fires when running blitter/bsd-only
workloads.
Contrary to a similar patch by Ben Widawsky this does not check
INSTDONE of the other rings. Chris Wilson implied that in a failure to
detect a hang, most likely because INSTDONE was fluctuating. Thus only
check ACTHD, which as far as I know is rather reliable. Also, blitter
and bsd rings can't launch complex tasks from a single instruction
(like 3D_PRIM on the render with complex or even infinite shaders).
This fixes spurious gpu hang detection when running
tests/gem_hangcheck_forcewake on snb/ivb.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 832afda6a7 upstream.
On DP monitor hot remove, clear DP_AUDIO_OUTPUT_ENABLE accordingly,
so that the audio driver will receive hot plug events and take action
to refresh its device state and ELD contents.
Note that the DP_AUDIO_OUTPUT_ENABLE bit may be enabled or disabled
only when the link training is complete and set to "Normal".
Tested OK for both hot plug/remove and DPMS on/off.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2deed76118 upstream.
On HDMI monitor hot remove, clear SDVO_AUDIO_ENABLE accordingly, so that
the audio driver will receive hot plug events and take action to refresh
its device state and ELD contents.
The cleared SDVO_AUDIO_ENABLE bit needs to be restored to prevent losing
HDMI audio after DPMS on.
CC: Wang Zhenyu <zhenyu.z.wang@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 853a0c25ba upstream.
When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0e8ed858e upstream.
Commit 873bd4c (ASoC: Don't set invalid name string to snd_card->driver
field) broke generation of a driver name for all ASoC cards relying on the
automatic generation of one. Fix this by using the old default with spaces
replaced by underscores.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb297a3e43 upstream.
This issue happens under the following conditions:
1. preemption is off
2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
3. RT scheduling class
4. SMP system
Sequence is as follows:
1.suppose current task is A. start schedule()
2.task A is enqueued pushable task at the entry of schedule()
__schedule
prev = rq->curr;
...
put_prev_task
put_prev_task_rt
enqueue_pushable_task
4.pick the task B as next task.
next = pick_next_task(rq);
3.rq->curr set to task B and context_switch is started.
rq->curr = next;
4.At the entry of context_swtich, release this cpu's rq->lock.
context_switch
prepare_task_switch
prepare_lock_switch
raw_spin_unlock_irq(&rq->lock);
5.Shortly after rq->lock is released, interrupt is occurred and start IRQ context
6.try_to_wake_up() which called by ISR acquires rq->lock
try_to_wake_up
ttwu_remote
rq = __task_rq_lock(p)
ttwu_do_wakeup(rq, p, wake_flags);
task_woken_rt
7.push_rt_task picks the task A which is enqueued before.
task_woken_rt
push_rt_tasks(rq)
next_task = pick_next_pushable_task(rq)
8.At find_lock_lowest_rq(), If double_lock_balance() returns 0,
lowest_rq can be the remote rq.
(But,If preemption is on, double_lock_balance always return 1 and it
does't happen.)
push_rt_task
find_lock_lowest_rq
if (double_lock_balance(rq, lowest_rq))..
9.find_lock_lowest_rq return the available rq. task A is migrated to
the remote cpu/rq.
push_rt_task
...
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
10. But, task A is on irq context at this cpu.
So, task A is scheduled by two cpus at the same time until restore from IRQ.
Task A's stack is corrupted.
To fix it, don't migrate an RT task if it's still running.
Signed-off-by: Chanho Min <chanho.min@lge.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b61925061 upstream.
The value of this register is transferred to the V_COUNTER register at the
beginning of vertical blank. V_COUNTER is the reference for VLINE waits and
goes from VIEWPORT_Y_START to VIEWPORT_Y_START+VIEWPORT_HEIGHT during scanout,
so if VIEWPORT_Y_START is not 0, V_COUNTER actually went backwards at the
beginning of vertical blank, and VLINE waits excluding the whole scanout area
could never finish (possibly only if VIEWPORT_Y_START is larger than the length
of vertical blank in scanlines). Setting DESKTOP_HEIGHT to the framebuffer
height should prevent this for any kind of VLINE wait.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=45329 .
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bf380bc70 upstream.
When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned. Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned. This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash. This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.
PID: 9902 TASK: d47aecd0 CPU: 0 COMMAND: "memcg_process_s"
#0 [d72d3ad0] crash_kexec at c028cfdb
#1 [d72d3b24] oops_end at c05c5322
#2 [d72d3b38] __bad_area_nosemaphore at c0227e60
#3 [d72d3bec] bad_area at c0227fb6
#4 [d72d3c00] do_page_fault at c05c72ec
#5 [d72d3c80] error_code (via page_fault) at c05c47a4
EAX: 00000000 EBX: 000c0000 ECX: 00000001 EDX: 00000807 EBP: 000c0000
DS: 007b ESI: 00000001 ES: 007b EDI: f3000a80 GS: 6f50
CS: 0060 EIP: c030b15a ERR: ffffffff EFLAGS: 00010002
#6 [d72d3cb4] isolate_migratepages at c030b15a
#7 [d72d3d14] zone_watermark_ok at c02d26cb
#8 [d72d3d2c] compact_zone at c030b8de#9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
EAX: b71ff000 EBX: 00000001 ECX: 00001600 EDX: 00000431
DS: 007b ESI: 08048950 ES: 007b EDI: bfaa3788
SS: 007b ESP: bfaa36e0 EBP: bfaa3828 GS: 6f50
CS: 0073 EIP: 080487c8 ERR: ffffffff EFLAGS: 00010202
It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.
BUG: unable to handle kernel paging request at 01c00008
IP: [<c0522399>] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000
It is expected that it also affects 3.2.x and current mainline.
The problem is that pfn_valid is only called on the first PFN being
checked and that PFN is not necessarily aligned. Lets say we have a case
like this
H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc->migrate_pfn
f = cc->free_pfn
o = memory hole
H------|------H------|----m-Hoooooo|ooooooH-f----|------H
The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole. When isolate_migratepages started, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages which is now in a memory hole. It checks
pfn_valid() on the first PFN but then scans into the hole where there are
not necessarily valid struct pages.
This patch ensures that isolate_migratepages calls pfn_valid when
necessary.
Reported-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Tested-by: Herbert van den Bergh <herbert.van.den.bergh@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99f02ef1f1 upstream.
Fix a race condition that shows in conjunction with xip_file_fault() when
two threads of the same user process fault on the same memory page.
In this case, the race winner will install the page table entry and the
unlucky loser will cause an oops: xip_file_fault calls vm_insert_pfn (via
vm_insert_mixed) which drops out at this check:
retval = -EBUSY;
if (!pte_none(*pte))
goto out_unlock;
The resulting -EBUSY return value will trigger a BUG_ON() in
xip_file_fault.
This fix simply considers the fault as fixed in this case, because the
race winner has successfully installed the pte.
[akpm@linux-foundation.org: use conventional (and consistent) comment layout]
Reported-by: David Sadler <dsadler@us.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Reported-by: Louis Alex Eisner <leisner@cs.ucsd.edu>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bda3a47c88 upstream.
commit 463894705e deleted redundant
chan_id and chancnt initialization in dma drivers as this is done
in dma_async_device_register().
However, atc_enable_irq() relied on chan_id set before registering
the device, what left only channel 0 functional for this driver.
This patch introduces atc_enable/disable_chan_irq() as a variant
of atc_enable/disable_irq() with the channel as explicit argument.
Signed-off-by: Nikolaus Voss <n.voss@weinmann.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55ca6140e9 upstream.
In function pre_handler_kretprobe(), the allocated kretprobe_instance
object will get leaked if the entry_handler callback returns non-zero.
This may cause all the preallocated kretprobe_instance objects exhausted.
This issue can be reproduced by changing
samples/kprobes/kretprobe_example.c to probe "mutex_unlock". And the fix
is straightforward: just put the allocated kretprobe_instance object back
onto the free_instances list.
[akpm@linux-foundation.org: use raw_spin_lock/unlock]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6f7feae6d upstream.
In the current code, vendor-specific MADs (e.g with the FDR-10
attribute) are silently dropped by the driver, resulting in timeouts
at the sending side and inability to query/configure the relevant
feature. However, the ConnectX firmware is able to handle such MADs.
For unsupported attributes, the firmware returns a GET_RESPONSE MAD
containing an error status.
For example, for a FDR-10 node with LID 11:
# ibstat mlx4_0 1
CA: 'mlx4_0'
Port 1:
State: Active
Physical state: LinkUp
Rate: 40 (FDR10)
Base lid: 11
LMC: 0
SM lid: 24
Capability mask: 0x02514868
Port GUID: 0x0002c903002e65d1
Link layer: InfiniBand
Extended Port Query (EPI) vendor mad timeouts before the patch:
# smpquery MEPI 11 -d
ibwarn: [4196] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
ibwarn: [4196] _do_madrpc: retry 1 (timeout 1000 ms)
ibwarn: [4196] _do_madrpc: retry 2 (timeout 1000 ms)
ibwarn: [4196] _do_madrpc: timeout after 3 retries, 3000 ms
ibwarn: [4196] mad_rpc: _do_madrpc failed; dport (Lid 11)
smpquery: iberror: [pid 4196] main: failed: operation EPI: ext port info query failed
EPI query works OK with the patch:
# smpquery MEPI 11 -d
ibwarn: [6548] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
ibwarn: [6548] mad_rpc: data offs 64 sz 64
mad data
0000 0000 0000 0001 0000 0001 0000 0001
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
# Ext Port info: Lid 11 port 0
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x01
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Ira Weiny <weiny2@llnl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 320cfa6ce0 upstream.
The PCIe device
FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd FireWire Host Controller
[1180:e832] (prog-if 10 [OHCI])
is unable to access attached FireWire devices when MSI is enabled but
works if MSI is disabled.
http://www.mail-archive.com/alsa-user@lists.sourceforge.net/msg28251.html
Hence add the "disable MSI" quirks flag for this device, or in fact for
safety and simplicity for all current (R5U230, R5U231, R5U240) and
future Ricoh PCIe 1394 controllers.
Reported-by: Stefan Thomas <kontrapunktstefan@googlemail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1bb399ad0 upstream.
The Audigy's SB1394 controller is actually from Texas Instruments
and has the same bus reset packet generation bug, so it needs the
same quirk entry.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d08f2c713 upstream.
Once /proc/pid/mem is opened, the memory can't be released until
mem_release() even if its owner exits.
Change mem_open() to do atomic_inc(mm_count) + mmput(), this only
pins mm_struct. Change mem_rw() to do atomic_inc_not_zero(mm_count)
before access_remote_vm(), this verifies that this mm is still alive.
I am not sure what should mem_rw() return if atomic_inc_not_zero()
fails. With this patch it returns zero to match the "mm == NULL" case,
may be it should return -EINVAL like it did before e268337d.
Perhaps it makes sense to add the additional fatal_signal_pending()
check into the main loop, to ensure we do not hold this memory if
the target task was oom-killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 572d34b946 upstream.
No functional changes, cleanup and preparation.
mem_read() and mem_write() are very similar. Move this code into the
new common helper, mem_rw(), which takes the additional "int write"
argument.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbcb834605 upstream.
KDFONTOP(GET) currently fails with EIO when being run in a 32bit userland
with a 64bit kernel if the font width is not 8.
This is because of the setting of the KD_FONT_FLAG_OLD flag, which makes
con_font_get return EIO in such case.
This flag should *not* be set for KDFONTOP, since it's actually the whole
point of this flag (see comment in con_font_set for instance).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Arthur Taylor <art@ified.ca>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ef5d844cc upstream.
following statement can only change device size from 8-bit(0) to 16-bit(1),
but not vice versa:
regval |= GPMC_CONFIG1_DEVICESIZE(wval);
so as this field has 1 reserved bit, that could be used in future,
just clear both bits and then OR with the desired value
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8130b9d7b9 upstream.
If we are context switched whilst copying into a thread's
vfp_hard_struct then the partial copy may be corrupted by the VFP
context switching code (see "ARM: vfp: flush thread hwstate before
restoring context from sigframe").
This patch updates the ptrace VFP set code so that the thread state is
flushed before the copy, therefore disabling VFP and preventing
corruption from occurring.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 247f4993a5 upstream.
In a preemptible kernel, vfp_set() can be preempted, causing the
hardware VFP context to be switched while the thread vfp state is
being read and modified. This leads to a race condition which can
cause the thread vfp state to become corrupted if lazy VFP context
save occurs due to preemption in between the time thread->vfpstate
is read and the time the modified state is written back.
This may occur if preemption occurs during the execution of a
ptrace() call which modifies the VFP register state of a thread.
Such instances should be very rare in most realistic scenarios --
none has been reported, so far as I am aware. Only uniprocessor
systems should be affected, since VFP context save is not currently
lazy in SMP kernels.
The problem was introduced by my earlier patch migrating to use
regsets to implement ptrace.
This patch does a vfp_sync_hwstate() before reading
thread->vfpstate, to make sure that the thread's VFP state is not
live in the hardware registers while the registers are modified.
Thanks to Will Deacon for spotting this.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2af276dfb1 upstream.
Following execution of a signal handler, we currently restore the VFP
context from the ucontext in the signal frame. This involves copying
from the user stack into the current thread's vfp_hard_struct and then
flushing the new data out to the hardware registers.
This is problematic when using a preemptible kernel because we could be
context switched whilst updating the vfp_hard_struct. If the current
thread has made use of VFP since the last context switch, the VFP
notifier will copy from the hardware registers into the vfp_hard_struct,
overwriting any data that had been partially copied by the signal code.
Disabling preemption across copy_from_user calls is a terrible idea, so
instead we move the VFP thread flush *before* we update the
vfp_hard_struct. Since the flushing is performed lazily, this has the
effect of disabling VFP and clearing the CPU's VFP state pointer,
therefore preventing the thread from being updated with stale data on
the next context switch.
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3deaa7190a upstream.
Herbert Poetzl reported a performance regression since 2.6.39. The test
is a simple dd read, but with big block size. The reason is:
T1: ra (A, A+128k), (A+128k, A+256k)
T2: lock_page for page A, submit the 256k
T3: hit page A+128K, ra (A+256k, A+384). the range isn't submitted
because of plug and there isn't any lock_page till we hit page A+256k
because all pages from A to A+256k is in memory
T4: hit page A+256k, ra (A+384, A+ 512). Because of plug, the range isn't
submitted again.
T5: lock_page A+256k, so (A+256k, A+512k) will be submitted. The task is
waitting for (A+256k, A+512k) finish.
There is no request to disk in T3 and T4, so readahead pipeline breaks.
We really don't need block plug for generic_file_aio_read() for buffered
I/O. The readahead already has plug and has fine grained control when I/O
should be submitted. Deleting plug for buffered I/O fixes the regression.
One side effect is plug makes the request size 256k, the size is 128k
without it. This is because default ra size is 128k and not a reason we
need plug here.
Vivek said:
: We submit some readahead IO to device request queue but because of nested
: plug, queue never gets unplugged. When read logic reaches a page which is
: not in page cache, it waits for page to be read from the disk
: (lock_page_killable()) and that time we flush the plug list.
:
: So effectively read ahead logic is kind of broken in parts because of
: nested plugging. Removing top level plug (generic_file_aio_read()) for
: buffered reads, will allow unplugging queue earlier for readahead.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3c076351c4 upstream.
Right now we forcibly clear ASPM state on all devices if the BIOS indicates
that the feature isn't supported. Based on the Microsoft presentation
"PCI Express In Depth for Windows Vista and Beyond", I'm starting to think
that this may be an error. The implication is that unless the platform
grants full control via _OSC, Windows will not touch any PCIe features -
including ASPM. In that case clearing ASPM state would be an error unless
the platform has granted us that control.
This patch reworks the ASPM disabling code such that the actual clearing
of state is triggered by a successful handoff of PCIe control to the OS.
The general ASPM code undergoes some changes in order to ensure that the
ability to clear the bits isn't overridden by ASPM having already been
disabled. Further, this theoretically now allows for situations where
only a subset of PCIe roots hand over control, leaving the others in the
BIOS state.
It's difficult to know for sure that this is the right thing to do -
there's zero public documentation on the interaction between all of these
components. But enough vendors enable ASPM on platforms and then set this
bit that it seems likely that they're expecting the OS to leave them alone.
Measured to save around 5W on an idle Thinkpad X220.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34b76fcaee upstream.
[Based on a patch from Johan, mangled by gregkh to keep things in line]
Fix up the variable usage in the set_termios call.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Cc: Preston Fick <preston.fick@silabs.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f482fc88a upstream.
This fix changes the way baudrates are set on the CP210x devices from
Silicon Labs. The CP2101/2/3 will respond to both a GET/SET_BAUDDIV
command, and GET/SET_BAUDRATE command, while CP2104 and higher devices
only respond to GET/SET_BAUDRATE. The current cp210x.ko driver in
kernel version 3.2.0 only implements the GET/SET_BAUDDIV command.
This patch implements the two new codes for the GET/SET_BAUDRATE
commands. Then there is a change in the way that the baudrate is
assigned or retrieved. This is done according to the CP210x USB
specification in AN571. This document can be found here:
http://www.silabs.com/pages/DownloadDoc.aspx?FILEURL=Support%20Documents/TechnicalDocs/AN571.pdf&src=DocumentationWebPart
Sections 5.3/5.4 describe the USB packets for the old baudrate method.
Sections 5.5/5.6 describe the USB packets for the new method. This
patch also implements the new request scheme, and eliminates the
unnecessary baudrate calculations since it uses the "actual baudrate"
method.
This patch solves the problem reported for the CP2104 in bug 42586,
and also keeps support for all other devices (CP2101/2/3).
This patchfile is also attached to the bug report on
bugzilla.kernel.org. This patch has been developed and test on the
3.2.0 mainline kernel version under Ubuntu 10.11.
Signed-off-by: Preston Fick <preston.fick@silabs.com>
[duplicate patch also sent by Johan - gregkh]
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 791b7d7cf6 upstream.
This device is a Oscilloscope/Logic Analizer/Pattern Generator/TDR,
using a Silabs CP2103 USB to UART Bridge.
Signed-off-by: Renato Caldas <rmsc@fe.up.pt>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a622e71f5 ]
md5 key is added in socket through remote address.
remote address should be used in finding md5 key when
sending out reset packet.
Signed-off-by: shawnlu <shawn.lu@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5b35e1e6e9 ]
This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.
Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().
As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit efc3dbc374 ]
rds_sock_info() triggers locking warnings because we try to perform a
local_bh_enable() (via sock_i_ino()) while hardware interrupts are
disabled (via taking rds_sock_lock).
There is no reason for rds_sock_lock to be a hardware IRQ disabling
lock, none of these access paths run in hardware interrupt context.
Therefore making it a BH disabling lock is safe and sufficient to
fix this bug.
Reported-by: Kumar Sanghvi <kumaras@chelsio.com>
Reported-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d00a9dd21b ]
Several problems fixed in this patch :
1) Target of the conditional jump in case a divide by 0 is performed
by a bpf is wrong.
2) Must 'generate' the full function prologue/epilogue at pass=0,
or else we can stop too early in pass=1 if the proglen doesnt change.
(if the increase of prologue/epilogue equals decrease of all
instructions length because some jumps are converted to near jumps)
3) Change the wrong length detection at the end of code generation to
issue a more explicit message, no need for a full stack trace.
Reported-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 68315801db ]
When a packet is received on an L2TP IP socket (L2TPv3 IP link
encapsulation), the l2tpip socket's backlog_rcv function calls
xfrm4_policy_check(). This is not necessary, since it was called
before the skb was added to the backlog. With CONFIG_NET_NS enabled,
xfrm4_policy_check() will oops if skb->dev is null, so this trivial
patch removes the call.
This bug has always been present, but only when CONFIG_NET_NS is
enabled does it cause problems. Most users are probably using UDP
encapsulation for L2TP, hence the problem has only recently
surfaced.
EIP: 0060:[<c12bb62b>] EFLAGS: 00210246 CPU: 0
EIP is at l2tp_ip_recvmsg+0xd4/0x2a7
EAX: 00000001 EBX: d77b5180 ECX: 00000000 EDX: 00200246
ESI: 00000000 EDI: d63cbd30 EBP: d63cbd18 ESP: d63cbcf4
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Call Trace:
[<c1218568>] sock_common_recvmsg+0x31/0x46
[<c1215c92>] __sock_recvmsg_nosec+0x45/0x4d
[<c12163a1>] __sock_recvmsg+0x31/0x3b
[<c1216828>] sock_recvmsg+0x96/0xab
[<c10b2693>] ? might_fault+0x47/0x81
[<c10b2693>] ? might_fault+0x47/0x81
[<c1167fd0>] ? _copy_from_user+0x31/0x115
[<c121e8c8>] ? copy_from_user+0x8/0xa
[<c121ebd6>] ? verify_iovec+0x3e/0x78
[<c1216604>] __sys_recvmsg+0x10a/0x1aa
[<c1216792>] ? sock_recvmsg+0x0/0xab
[<c105a99b>] ? __lock_acquire+0xbdf/0xbee
[<c12d5a99>] ? do_page_fault+0x193/0x375
[<c10d1200>] ? fcheck_files+0x9b/0xca
[<c10d1259>] ? fget_light+0x2a/0x9c
[<c1216bbb>] sys_recvmsg+0x2b/0x43
[<c1218145>] sys_socketcall+0x16d/0x1a5
[<c11679f0>] ? trace_hardirqs_on_thunk+0xc/0x10
[<c100305f>] sysenter_do_call+0x12/0x38
Code: c6 05 8c ea a8 c1 01 e8 0c d4 d9 ff 85 f6 74 07 3e ff 86 80 00 00 00 b9 17 b6 2b c1 ba 01 00 00 00 b8 78 ed 48 c1 e8 23 f6 d9 ff <ff> 76 0c 68 28 e3 30 c1 68 2d 44 41 c1 e8 89 57 01 00 83 c4 0c
Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b924551bed ]
bond_alb_init_slave() is called from bond_enslave() and sets the slave's MAC
address. This is done differently for TLB and ALB modes.
bond->alb_info.rlb_enabled is used to discriminate between the two modes but
this flag may be uninitialized if the slave is being enslaved prior to calling
bond_open() -> bond_alb_initialize() on the master.
It turns out all the callers of alb_set_slave_mac_addr() pass
bond->alb_info.rlb_enabled as the hw parameter.
This patch cleans up the unnecessary parameter of alb_set_slave_mac_addr() and
makes the function decide based on the bonding mode instead, which fixes the
above problem.
Reported-by: Narendra K <Narendra_K@Dell.com>
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8a8ee9aff6 ]
caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.
Among other problems using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated.
Which has been causing net_generic to fail with either BUG_ON's or by
return NULL pointers.
A more ugly problem that could be caused is packets in flight why the
subsystem is shutting down.
To remove confusion also remove the cruft cause by inappropriately
trying to fix this bug.
With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ee4433efe ]
By definition net_generic should never be called when it can return
NULL. Fail conspicously with a BUG_ON to make it clear when people mess
up that a NULL return should never happen.
Recently there was a bug in the CAIF subsystem where it was registered
with register_pernet_device instead of register_pernet_subsys. It was
erroneously concluded that net_generic could validly return NULL and
that net_assign_generic was buggy (when it was just inefficient).
Hopefully this BUG_ON will prevent people to coming to similar erroneous
conclusions in the futrue.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 073862ba5d ]
When a new net namespace is created, we should attach to it a "struct
net_generic" with enough slots (even empty), or we can hit the following
BUG_ON() :
[ 200.752016] kernel BUG at include/net/netns/generic.h:40!
...
[ 200.752016] [<ffffffff825c3cea>] ? get_cfcnfg+0x3a/0x180
[ 200.752016] [<ffffffff821cf0b0>] ? lockdep_rtnl_is_held+0x10/0x20
[ 200.752016] [<ffffffff825c41be>] caif_device_notify+0x2e/0x530
[ 200.752016] [<ffffffff810d61b7>] notifier_call_chain+0x67/0x110
[ 200.752016] [<ffffffff810d67c1>] raw_notifier_call_chain+0x11/0x20
[ 200.752016] [<ffffffff821bae82>] call_netdevice_notifiers+0x32/0x60
[ 200.752016] [<ffffffff821c2b26>] register_netdevice+0x196/0x300
[ 200.752016] [<ffffffff821c2ca9>] register_netdev+0x19/0x30
[ 200.752016] [<ffffffff81c1c67a>] loopback_net_init+0x4a/0xa0
[ 200.752016] [<ffffffff821b5e62>] ops_init+0x42/0x180
[ 200.752016] [<ffffffff821b600b>] setup_net+0x6b/0x100
[ 200.752016] [<ffffffff821b6466>] copy_net_ns+0x86/0x110
[ 200.752016] [<ffffffff810d5789>] create_new_namespaces+0xd9/0x190
net_alloc_generic() should take into account the maximum index into the
ptr array, as a subsystem might use net_generic() anytime.
This also reduces number of reallocations in net_assign_generic()
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15699e6faf upstream.
The probe does not strictly require the USB_CDC_DMM_TYPE
descriptor, which is a good thing as it makes the driver
usable on non-conforming interfaces. A user could e.g.
bind to it to a CDC ECM interface by using the new_id and
bind sysfs files. But this would fail with a 0 buffer length
due to the missing descriptor.
Fix by defining a reasonable fallback size: The minimum
device receive buffer size required by the CDC WMC standard,
revision 1.1
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 655e247daf upstream.
As it turns out, there was a mismatch between the allocated inbuf size
(desc->bMaxPacketSize0, typically something like 64) and the length we
specified in the URB (desc->wMaxCommand, typically something like 2048)
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62aaf24dc1 upstream.
wdm_disconnect() waits for the mutex held by wdm_read() before
calling wake_up_all(). This causes a deadlock, preventing device removal
to complete. Do the wake_up_all() before we start waiting for the locks.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86b2bbfdbd upstream.
Properly clamp temperature limits set by the user. Without this fix,
attempts to write temperature limits above the maximum supported by
the chip (255 degrees Celsius) would arbitrarily and unexpectedly
result in the limit being set to 0 degree Celsius.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf840551a8 upstream.
When a TD length mismatch is found during isoc TRB enqueue, it directly
returns -EINVAL. However, isoc transfer is partially enqueued at this time,
and the ring should be cleared.
This should be backported to kernels as old as 2.6.36, which contain the
commit 522989a27c "xhci: Fix failed
enqueue in the middle of isoch TD."
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0cd5d482b upstream.
The xHCI hub port code gets passed a zero-based port number by the USB
core. It then adds one to in order to find a device slot by port number
and device speed by calling xhci_find_slot_id_by_port. That function
clearly states it requires a one-based port number. The xHCI port
status change event handler was using a zero-based port number that it
got from find_faked_portnum_from_hw_portnum, not a one-based port
number. This lead to the doorbells never being rung for a device after
a resume, or worse, a different device with the same speed having its
doorbell rung (which could lead to bad power management in the xHCI host
controller).
This patch should be backported to kernels as old as 2.6.39.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2492c6e645 upstream.
Add missing iounmap in error handling code, in a case where the function
already preforms iounmap on some other execution path.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
statement S,S1;
int ret;
@@
e = \(ioremap\|ioremap_nocache\)(...)
... when != iounmap(e)
if (<+...e...+>) S
... when any
when != iounmap(e)
*if (...)
{ ... when != iounmap(e)
return ...; }
... when any
iounmap(e);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1097ccebe6 upstream.
This changes the max length for the usb seven segment delcom device to 8
from 6. Delcom has both 6 and 8 variants and having 8 works fine with
devices which are only 6.
Signed-off-by: Harrison Metzger <harrisonmetz@gmail.com>
Signed-off-by: Stuart Pook <stuart@acm.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf9c05d5b6 upstream.
The assignment of handle in vmw_framebuffer_create_handle doesn't actually do anything useful and is incorrectly assigning an integer value to a pointer argument. It appears that this is a typo and should be dereferencing handle rather than assigning to it directly. This fixes a bug where an undefined handle value is potentially returned to user-space.
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Reviewed-by: Jakob Bornecrantz<jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26aa38cafa upstream.
There was an error on the jsm driver that would cause it to be unable to
recover after a second error is detected.
At the first error, the device recovers properly:
[72521.485691] EEH: Detected PCI bus error on device 0003:02:00.0
[72521.485695] EEH: This PCI device has failed 1 times in the last hour:
...
[72532.035693] ttyn3 at MMIO 0x0 (irq = 49) is a jsm
[72532.105689] jsm: Port 3 added
However, at the second error, it cascades until EEH disables the device:
[72631.229549] Call Trace:
...
[72641.725687] jsm: Port 3 added
[72641.725695] EEH: Detected PCI bus error on device 0003:02:00.0
[72641.725698] EEH: This PCI device has failed 3 times in the last hour:
It was caused because the PCI state was not being saved after the first
restore. Therefore, at the second recovery the PCI state would not be
restored.
Signed-off-by: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Breno Leitao <brenohl@br.ibm.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef605fdb33 upstream.
Protect against pl011_console_write() and the interrupt for
the console UART running concurrently on different CPUs.
Otherwise the console_write could spin for a long time
waiting for the UART to become not busy, while the other
CPU continuously services UART interrupts and keeps the
UART busy.
The checks for sysrq and oops_in_progress are taken
from 8250.c.
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Reviewed-by: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Reviewed-by: Bibek Basu <bibek.basu@stericsson.com>
Reviewed-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0eee50af5b upstream.
Commit 74c2107759 (serial: Use block_til_ready helper) and its fixup
3f582b8c11 (serial: fix termios settings in open) introduced a
regression on UV systems. The serial eventually freezes while being
used. It's completely unpredictable and sometimes needs a heap of
traffic to happen first.
To reproduce this, yast installation was used as it turned out to be
pretty reliable in reproducing. Especially during installation process
where one doesn't have an SSH daemon running. And no monitor as the HW
is completely headless. So this was fun to find. Given the machine
doesn't boot on vanilla before 2.6.36 final. (And the commits above
are older.)
Unless there is some bad race in the code, the hardware seems to be
pretty broken. Otherwise pure MSR read should not cause such a bug,
or?
So to prevent the bug, revert to the old behavior. I.e. read modem
status only if we really have to -- for non-CLOCAL set serials.
Non-CLOCAL works on this hardware OK, I tried. See? I don't.
And document that shit.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
References: https://lkml.org/lkml/2011/12/6/573
References: https://bugzilla.novell.com/show_bug.cgi?id=718518
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d443d8499 upstream.
Calling edge_remove_sysfs_attrs from edge_disconnect is too late
as the device has already been removed from sysfs.
Do the simple and obvious thing and make edge_remove_sysfs_attrs
the port_remove method.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: Wolfgang Frisch <wfpub@roembden.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8537bd2c4 upstream.
using a separate read and write mutex for locking is sufficient to make the
driver accept simultaneous read and write. This improves useability a lot.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c428b70c1e upstream.
wdm_in_callback() will also touch this field, so we cannot change it without locking
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc216ec363 upstream.
I tested this against 2.6.39 in the Ubuntu kernel, however I see the IDs
are not in latest 3.2 git.
This adds IDs for the FTDI controller in the Rainforest Automation
Zigbee dongle.
Signed-off-by: Peter Naulls <peter@chocky.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 108e02b129 upstream.
Fix regression introduced by commit b1ffb4c851 ("USB: Fix Corruption
issue in USB ftdi driver ftdi_sio.c") which caused the termios settings
to no longer be initialised at open. Consequently it was no longer
possible to set the port to the default speed of 9600 baud without first
changing to another baud rate and back again.
Reported-by: Roland Ramthun <mail@roland-ramthun.de>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Tested-by: Roland Ramthun <mail@roland-ramthun.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb833a9e09 upstream.
Return EINVAL if new baud_base does not match the current one.
The baud_base is device specific and can not be changed. This restores
the old (pre-2005) behaviour which was changed due to a
misunderstanding regarding this fact (see
https://lkml.org/lkml/2005/1/20/84).
Reported-by: Torbjörn Lofterud <torbjorn@pi.nxs.se>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 612539e81f upstream.
On v7, we use the same cache maintenance instructions for data lines
as for unified lines. This was not the case for v6, where HARVARD_CACHE
was defined to indicate the L1 cache topology.
This patch removes the erroneous compile-time check for HARVARD_CACHE in
proc-v7.S, ensuring that we perform I-side invalidation at boot.
Reported-and-Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Catalin Marinas <Catalin.Marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2c0d0266c upstream.
syslog-ng versions before 3.3.0beta1 (2011-05-12) assume that
CAP_SYS_ADMIN is sufficient to access syslog, so ever since CAP_SYSLOG
was introduced (2010-11-25) they have triggered a warning.
Commit ee24aebffb ("cap_syslog: accept CAP_SYS_ADMIN for now")
improved matters a little by making syslog-ng work again, just keeping
the WARN_ONCE(). But still, this is a warning that writes a stack trace
we don't care about to syslog, sets a taint flag, and alarms sysadmins
when nothing worse has happened than use of an old userspace with a
recent kernel.
Convert the WARN_ONCE to a printk_once to avoid that while continuing to
give userspace developers a hint that this is an unwanted
backward-compatibility feature and won't be around forever.
Reported-by: Ralf Hildebrandt <ralf.hildebrandt@charite.de>
Reported-by: Niels <zorglub_olsen@hotmail.com>
Reported-by: Paweł Sikora <pluto@agmk.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Liked-by: Gergely Nagy <algernon@madhouse-project.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b25eb690e upstream.
The refactoring of Realtek codec driver in 3.2 kernel caused a
regression for ASUS A6Rp laptop; it doesn't give any output.
The reason was that this machine has a secret master mute (or EAPD)
control via NID 0x0f VREF. Setting VREF50 on this node makes the
sound working again.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42588
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b68edc91c upstream.
We've decided to provide CPU family specific container files
(starting with CPU family 15h). E.g. for family 15h we have to
load microcode_amd_fam15h.bin instead of microcode_amd.bin
Rationale is that starting with family 15h patch size is larger
than 2KB which was hard coded as maximum patch size in various
microcode loaders (not just Linux).
Container files which include patches larger than 2KB cause
different kinds of trouble with such old patch loaders. Thus we
have to ensure that the default container file provides only
patches with size less than 2KB.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20120120164412.GD24508@alberich.amd.com
[ documented the naming convention and tidied the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c770c273 upstream
When finding the longest extent in an AG, we read the value directly
out of the AGF buffer without endian conversion. This will give an
incorrect length, resulting in FITRIM operations potentially not
trimming everything that it should.
Note, for 3.0-stable this has been modified to apply to
fs/xfs/linux-2.6/xfs_discard.c instead of fs/xfs/xfs_discard.c. -bpm
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b90a603a1 upstream.
When the ahash driver returns -EBUSY, AH4/6 input functions return
NET_XMIT_DROP, presumably copied from the output code path. But
returning transmit codes on input doesn't make a lot of sense.
Since NET_XMIT_DROP is a positive int, this gets interpreted as
the next header type (i.e., success). As that can only end badly,
remove the check.
Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30fb6aa740 upstream.
Multiple users of the function tracer can register their functions
with the ftrace_ops structure. The accounting within ftrace will
update the counter on each function record that is being traced.
When the ftrace_ops filtering adds or removes functions, the
function records will be updated accordingly if the ftrace_ops is
still registered.
When a ftrace_ops is removed, the counter of the function records,
that the ftrace_ops traces, are decremented. When they reach zero
the functions that they represent are modified to stop calling the
mcount code.
When changes are made, the code is updated via stop_machine() with
a command passed to the function to tell it what to do. There is an
ENABLE and DISABLE command that tells the called function to enable
or disable the functions. But the ENABLE is really a misnomer as it
should just update the records, as records that have been enabled
and now have a count of zero should be disabled.
The DISABLE command is used to disable all functions regardless of
their counter values. This is the big off switch and is not the
complement of the ENABLE command.
To make matters worse, when a ftrace_ops is unregistered and there
is another ftrace_ops registered, neither the DISABLE nor the
ENABLE command are set when calling into the stop_machine() function
and the records will not be updated to match their counter. A command
is passed to that function that will update the mcount code to call
the registered callback directly if it is the only one left. This
means that the ftrace_ops that is still registered will have its callback
called by all functions that have been set for it as well as the ftrace_ops
that was just unregistered.
Here's a way to trigger this bug. Compile the kernel with
CONFIG_FUNCTION_PROFILER set and with CONFIG_FUNCTION_GRAPH not set:
CONFIG_FUNCTION_PROFILER=y
# CONFIG_FUNCTION_GRAPH is not set
This will force the function profiler to use the function tracer instead
of the function graph tracer.
# cd /sys/kernel/debug/tracing
# echo schedule > set_ftrace_filter
# echo function > current_tracer
# cat set_ftrace_filter
schedule
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 692/68108025 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/0:2-909 [000] .... 531.235574: schedule <-worker_thread
<idle>-0 [001] .N.. 531.235575: schedule <-cpu_idle
kworker/0:2-909 [000] .... 531.235597: schedule <-worker_thread
sshd-2563 [001] .... 531.235647: schedule <-schedule_hrtimeout_range_clock
# echo 1 > function_profile_enabled
# echo 0 > function_porfile_enabled
# cat set_ftrace_filter
schedule
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 159701/118821262 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [002] ...1 604.870655: local_touch_nmi <-cpu_idle
<idle>-0 [002] d..1 604.870655: enter_idle <-cpu_idle
<idle>-0 [002] d..1 604.870656: atomic_notifier_call_chain <-enter_idle
<idle>-0 [002] d..1 604.870656: __atomic_notifier_call_chain <-atomic_notifier_call_chain
The same problem could have happened with the trace_probe_ops,
but they are modified with the set_frace_filter file which does the
update at closure of the file.
The simple solution is to change ENABLE to UPDATE and call it every
time an ftrace_ops is unregistered.
Link: http://lkml.kernel.org/r/1323105776-26961-3-git-send-email-jolsa@redhat.com
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 072126f452 upstream.
Currently, if set_ftrace_filter() is called when the ftrace_ops is
active, the function filters will not be updated. They will only be updated
when tracing is disabled and re-enabled.
Update the functions immediately during set_ftrace_filter().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41fb61c2d0 upstream.
Whenever the hash of the ftrace_ops is updated, the record counts
must be balance. This requires disabling the records that are set
in the original hash, and then enabling the records that are set
in the updated hash.
Moving the update into ftrace_hash_move() removes the bug where the
hash was updated but the records were not, which results in ftrace
triggering a warning and disabling itself because the ftrace_ops filter
is updated while the ftrace_ops was registered, and then the failure
happens when the ftrace_ops is unregistered.
The current code will not trigger this bug, but new code will.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51fc6dc8f9 upstream.
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
Consequently, keeping all W[80] array on stack is unnecessary,
only 16 values are really needed.
Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).
Line by line explanation:
* BLEND_OP
array is "circular" now, all indexes have to be modulo 16.
Round number is positive, so remainder operation should be
without surprises.
* initial full message scheduling is trimmed to first 16 values which
come from data block, the rest is calculated before it's needed.
* original loop body is unrolled version of new SHA512_0_15 and
SHA512_16_79 macros, unrolling was done to not do explicit variable
renaming. Otherwise it's the very same code after preprocessing.
See sha1_transform() code which does the same trick.
Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512).
See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84e31fdb7c upstream.
commit f9e2bca6c2
aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
created global message schedule area.
If sha512_update will ever be entered twice, hash will be silently
calculated incorrectly.
Probably the easiest way to notice incorrect hashes being calculated is
to run 2 ping floods over AH with hmac(sha512):
#!/usr/sbin/setkey -f
flush;
spdflush;
add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
spdadd IP1 IP2 any -P out ipsec ah/transport//require;
spdadd IP2 IP1 any -P in ipsec ah/transport//require;
XfrmInStateProtoError will start ticking with -EBADMSG being returned
from ah_input(). This never happens with, say, hmac(sha1).
With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
with multiple bidirectional ping flood streams like it doesn't tick
with SHA-1.
After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
This is OK for simple loads, for something more heavy, stack reduction will be done
separatedly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 598781d711 upstream.
If the master tries to authenticate a client using drm_authmagic and
that client has already closed its drm file descriptor,
either wilfully or because it was terminated, the
call to drm_authmagic will dereference a stale pointer into kmalloc'ed memory
and corrupt it.
Typically this results in a hard system hang.
This patch fixes that problem by removing any authentication tokens
(struct drm_magic_entry) open for a file descriptor when that file
descriptor is closed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58ded24f0f upstream.
If pages passed to the eCryptfs extent-based crypto functions are not
mapped and the module parameter ecryptfs_verbosity=1 was specified at
loading time, a NULL pointer dereference will occur.
Note that this wouldn't happen on a production system, as you wouldn't
pass ecryptfs_verbosity=1 on a production system. It leaks private
information to the system logs and is for debugging only.
The debugging info printed in these messages is no longer very useful
and rather than doing a kmap() in these debugging paths, it will be
better to simply remove the debugging paths completely.
https://launchpad.net/bugs/913651
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a261a03904 upstream.
Most filesystems call inode_change_ok() very early in ->setattr(), but
eCryptfs didn't call it at all. It allowed the lower filesystem to make
the call in its ->setattr() function. Then, eCryptfs would copy the
appropriate inode attributes from the lower inode to the eCryptfs inode.
This patch changes that and actually calls inode_change_ok() on the
eCryptfs inode, fairly early in ecryptfs_setattr(). Ideally, the call
would happen earlier in ecryptfs_setattr(), but there are some possible
inode initialization steps that must happen first.
Since the call was already being made on the lower inode, the change in
functionality should be minimal, except for the case of a file extending
truncate call. In that case, inode_newsize_ok() was never being
called on the eCryptfs inode. Rather than inode_newsize_ok() catching
maximum file size errors early on, eCryptfs would encrypt zeroed pages
and write them to the lower filesystem until the lower filesystem's
write path caught the error in generic_write_checks(). This patch
introduces a new function, called ecryptfs_inode_newsize_ok(), which
checks if the new lower file size is within the appropriate limits when
the truncate operation will be growing the lower file.
In summary this change prevents eCryptfs truncate operations (and the
resulting page encryptions), which would exceed the lower filesystem
limits or FSIZE rlimits, from ever starting.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Li Wang <liwang@nudt.edu.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e6f0d7690 upstream.
ecryptfs_write() handles the truncation of eCryptfs inodes. It grabs a
page, zeroes out the appropriate portions, and then encrypts the page
before writing it to the lower filesystem. It was unkillable and due to
the lack of sparse file support could result in tying up a large portion
of system resources, while encrypting pages of zeros, with no way for
the truncate operation to be stopped from userspace.
This patch adds the ability for ecryptfs_write() to detect a pending
fatal signal and return as gracefully as possible. The intent is to
leave the lower file in a useable state, while still allowing a user to
break out of the encryption loop. If a pending fatal signal is detected,
the eCryptfs inode size is updated to reflect the modified inode size
and then -EINTR is returned.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30373dc0c8 upstream.
Print inode on metadata read failure. The only real
way of dealing with metadata read failures is to delete
the underlying file system file. Having the inode
allows one to 'find . -inum INODE`.
[tyhicks@canonical.com: Removed some minor not-for-stable parts]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db10e55651 upstream.
A malicious count value specified when writing to /dev/ecryptfs may
result in a a very large kernel memory allocation.
This patch peeks at the specified packet payload size, adds that to the
size of the packet headers and compares the result with the write count
value. The resulting maximum memory allocation size is approximately 532
bytes.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b4ead019af upstream.
The recent change of the power-widget handling for IDT codecs caused
the silent output from the docking-station line-out jack. This was
partially fixed by the commit f2cbba7602
"ALSA: hda - Fix the lost power-setup of seconary pins after PM resume".
But the line-out on the docking-station is still silent when booted
with the jack plugged even by this fix.
The remainig bug is that the power-widget is set off in stac92xx_init()
because the pins in cfg->line_out_pins[] aren't checked there properly
but only hp_pins[] are checked in is_nid_hp_pin().
This patch fixes the problem by checking both HP and line-out pins
and leaving the power-map correctly.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42637
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f5d78dc48 upstream.
We switch to dynamic debugging in commit
56e46742e8 but did not take into account that
now we do not control anymore whether a specific message is enabled or not.
So now we lock the "dbg_lock" and release it in every debugging macro, which
make them not so light-weight.
This commit removes the "dbg_lock" protection from the debugging macros to
fix the issue.
The downside is that now our DBGKEY() stuff is broken, but this is not
critical at all and will be fixed later.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 687875fb7d upstream.
Fix the following NULL ptr dereference caused by
cat /sys/devices/system/memory/memory0/removable
Pid: 13979, comm: sed Not tainted 3.0.13-0.5-default #1 IBM BladeCenter LS21 -[7971PAM]-/Server Blade
RIP: __count_immobile_pages+0x4/0x100
Process sed (pid: 13979, threadinfo ffff880221c36000, task ffff88022e788480)
Call Trace:
is_pageblock_removable_nolock+0x34/0x40
is_mem_section_removable+0x74/0xf0
show_mem_removable+0x41/0x70
sysfs_read_file+0xfe/0x1c0
vfs_read+0xc7/0x130
sys_read+0x53/0xa0
system_call_fastpath+0x16/0x1b
We are crashing because we are trying to dereference NULL zone which
came from pfn=0 (struct page ffffea0000000000). According to the boot
log this page is marked reserved:
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
and early_node_map confirms that:
early_node_map[3] active PFN ranges
1: 0x00000010 -> 0x0000009c
1: 0x00000100 -> 0x000bffa3
1: 0x00100000 -> 0x00240000
The problem is that memory_present works in PAGE_SECTION_MASK aligned
blocks so the reserved range sneaks into the the section as well. This
also means that free_area_init_node will not take care of those reserved
pages and they stay uninitialized.
When we try to read the removable status we walk through all available
sections and hope that the zone is valid for all pages in the section.
But this is not true in this case as the zone and nid are not initialized.
We have only one node in this particular case and it is marked as node=1
(rather than 0) and that made the problem visible because page_to_nid will
return 0 and there are no zones on the node.
Let's check that the zone is valid and that the given pfn falls into its
boundaries and mark the section not removable. This might cause some
false positives, probably, but we do not have any sane way to find out
whether the page is reserved by the platform or it is just not used for
whatever other reasons.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 85e72aa538 upstream.
/proc/pid/clear_refs is used to clear the Referenced and YOUNG bits for
pages and corresponding page table entries of the task with PID pid, which
includes any special mappings inserted into the page tables in order to
provide things like vDSOs and user helper functions.
On ARM this causes a problem because the vectors page is mapped as a
global mapping and since ec706dab ("ARM: add a vma entry for the user
accessible vector page"), a VMA is also inserted into each task for this
page to aid unwinding through signals and syscall restarts. Since the
vectors page is required for handling faults, clearing the YOUNG bit (and
subsequently writing a faulting pte) means that we lose the vectors page
*globally* and cannot fault it back in. This results in a system deadlock
on the next exception.
To see this problem in action, just run:
$ echo 1 > /proc/self/clear_refs
on an ARM platform (as any user) and watch your system hang. I think this
has been the case since 2.6.37
This patch avoids clearing the aforementioned bits for reserved pages,
therefore leaving the vectors page intact on ARM. Since reserved pages
are not candidates for swap, this change should not have any impact on the
usefulness of clear_refs.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Reported-by: Moussa Ba <moussaba@micron.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c25a785d66 upstream.
If the provided system call number is equal to __NR_syscalls, the
current check will pass and a function pointer just after the system
call table may be called, since sys_call_table is an array with total
size __NR_syscalls.
Whether or not this is a security bug depends on what the compiler puts
immediately after the system call table. It's likely that this won't do
anything bad because there is an additional NULL check on the syscall
entry, but if there happens to be a non-NULL value immediately after the
system call table, this may result in local privilege escalation.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f42af6c486 upstream.
Since commit
"7488876... dt/net: Eliminate users of of_platform_{,un}register_driver"
there are two platform drivers named "mdio-gpio" registered.
I renamed the of variant to "mdio-ofgpio".
Signed-off-by: Dirk Eibach <eibach@gdsys.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fe0fe83585 upstream.
As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.
This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
because this code is never called outside of pnfs.c and pnfs IO
paths)
Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5c0b4129c0 upstream.
Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.
Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit dfd00c4c8f upstream.
Same devices can generate interrupt without properly setting bit in
INT_SOURCE_CSR register (spurious interrupt), what will cause IRQ line
will be disabled by interrupts controller driver.
We discovered that clearing INT_MASK_CSR stops such behaviour. We
previously first read that register, and then clear all know interrupt
sources bits and do not touch reserved bits. After this patch, we write
to all register content (I believe writing to reserved bits on that
register will not cause any problems, I tested that on my rt2800pci
device).
This fix very bad performance problem, practically making device
unusable (since worked without interrupts), reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=658451
We previously tried to workaround that issue in commit
4ba7d99978 "rt2800pci: handle spurious
interrupts", but it was reverted in commit
82e5fc2a34
as thing, that will prevent to detect real spurious interrupts.
Reported-and-tested-by: Amir Hedayaty <hedayaty@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d059f9fa84 upstream.
Move the call to enable_timeouts() forward so that
BAU_MISC_CONTROL is initialized before using it in
calculate_destination_timeout().
Fix the calculation of a BAU destination timeout
for UV2 (in calculate_destination_timeout()).
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Link: http://lkml.kernel.org/r/20120116211848.GB5767@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2727b17539 upstream.
Correct OMAP_I2C_SYSC_REG offset in omap4 register map.
Offset 0x20 is reserved and OMAP_I2C_SYSC_REG has 0x10 as offset.
Signed-off-by: Alexander Aring <a.aring@phytec.de>
[khilman@ti.com: minor changelog edits]
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 895f302252 upstream.
The target code was not setting the additional sense length field in the
sense data it returned, which meant that at least the Linux stack
ignored the ASC/ASCQ fields. For example, without this patch, on a
tcm_loop device:
# sg_raw -v /dev/sda 2 0 0 0 0 0
gives
cdb to send: 02 00 00 00 00 00
SCSI Status: Check Condition
Sense Information:
Fixed format, current; Sense key: Illegal Request
Raw sense data (in hex):
70 00 05 00 00 00 00 00
while after the patch we correctly get the following (which matches what
a regular disk returns):
cdb to send: 02 00 00 00 00 00
SCSI Status: Check Condition
Sense Information:
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid command operation code
Raw sense data (in hex):
70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00
00 00
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ce136176fe upstream.
Current SCSI specs say that the "response format" field in the standard
INQUIRY response should be set to 2, and all the real SCSI devices I
have do put 2 here. So let's do that too.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cced5041ed upstream.
sym53c8xx_slave_destroy unconditionally assumes that sym53c8xx_slave_alloc has
succesesfully allocated a sym_lcb. This can lead to a NULL pointer dereference
(exposed by commit 4e6c82b).
Signed-off-by: Stratos Psomadakis <psomas@gentoo.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d640113fe8 upstream.
For UP processor, it is likely that no _MAT method or MADT table defined.
So currently acpi_get_cpuid(...) always return -1 for UP processor.
This is wrong. It should return valid value for CPU0.
In the other hand, BIOS may define multiple CPU handles even for UP
processor, for example
Scope (_PR)
{
Processor (CPU0, 0x00, 0x00000410, 0x06) {}
Processor (CPU1, 0x01, 0x00000410, 0x06) {}
Processor (CPU2, 0x02, 0x00000410, 0x06) {}
Processor (CPU3, 0x03, 0x00000410, 0x06) {}
}
We should only return valid value for CPU0's acpi handle.
And return invalid value for others.
http://marc.info/?t=132329819900003&r=1&w=2
Reported-and-tested-by: wallak@free.fr
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit da4d8b287a upstream.
The call to acpi_os_validate_address in acpi_ds_get_region_arguments was
removed by mistake in commit 9ad19ac(ACPICA: Split large dsopcode and
dsload.c files).
Put it back.
Reported-and-bisected-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9f10f6a520 upstream.
In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.
ia64 did handle the PXM fields almost consistently, but depending on
sgi's sn2 platform. This patch leaves the sn2 logic in, but does also
use 16/32 bits for PXM if the SRAT has rev 2 or higher.
The patch also adds __init to the two pxm accessor functions, as they
access __initdata now and are called from an __init function only anyway.
Note that the code only uses 16 bits for the PXM field in the processor
proximity field; the patch does not address this as 16 bits are more than
enough.
Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cd298f60a2 upstream.
In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.
x86/x86-64 was rather inconsistent prior to this patch; it used 8 bits
for the pxm field in cpu_affinity, but 32 bits in mem_affinity.
This patch makes it consistent: Either use 8 bits consistently (SRAT
rev 1 or lower) or 32 bits (SRAT rev 2 or higher).
cc: x86@kernel.org
Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8df0eb7c9d upstream.
In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.
In order to know whether or not, we must know what version the SRAT
table has.
This patch stores the SRAT table revision for later consumption
by arch specific __init functions.
Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 39a74fdedd upstream.
smp_call_function() only lets all other CPUs execute a specific function,
while we expect all CPUs do in intel_idle. Without the fix, we could have
one cpu which has auto_demotion enabled or has no broadcast timer setup.
Usually we don't see impact because auto demotion just harms power and the
intel_idle init is called in CPU 0, where boradcast timer delivers
interrupt, but this still could be a problem.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5c2a9f06a9 upstream.
kvm -cpu host passes the original cpuid info to the guest.
Latest kvm version seem to return true for mwait_leaf cpuid
function on recent Intel CPUs. But it does not return mwait
C-states (mwait_substates), instead zero is returned.
While real CPUs seem to always return non-zero values, the intel
idle driver should not get active in kvm (mwait_substates == 0)
case and bail out.
Otherwise a Null pointer exception will happen later when the
cpuidle subsystem tries to get active:
[0.984807] BUG: unable to handle kernel NULL pointer dereference at (null)
[0.984807] IP: [<(null)>] (null)
...
[0.984807][<ffffffff8143cf34>] ? cpuidle_idle_call+0xb4/0x340
[0.984807][<ffffffff8159e7bc>] ? __atomic_notifier_call_chain+0x4c/0x70
[0.984807][<ffffffff81001198>] ? cpu_idle+0x78/0xd0
Reference:
https://bugzilla.novell.com/show_bug.cgi?id=726296
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Bruno Friedmann <bruno@ioda-net.ch>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f0e48b6bd4 upstream.
The two DACs for the front output and the surround/center/LFE/back
outputs are wired up out of phase, so when channels are duplicated,
their sound can cancel out each other and result in a weaker bass
response. To fix this, reverse the polarity of the neutron flow to
the front output.
Reported-any-tested-by: Daniel Hill <daniel@enemyplanet.geek.nz>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e268337dfe upstream.
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.
This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open. That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.
That is different from our previous behavior, but much simpler. If
somebody actually finds a load where this matters, we'll need to revert
this commit.
I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve. So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0bfc96cb77 upstream.
[ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl
and -ENOIOCTLCMD from sd_compat_ioctl. ]
Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
will pass the command to the underlying block device. This is
well-known, but it is also a large security problem when (via Unix
permissions, ACLs, SELinux or a combination thereof) a program or user
needs to be granted access only to part of the disk.
This patch lets partitions forward a small set of harmless ioctls;
others are logged with printk so that we can see which ioctls are
actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred.
Of course it was being sent to a (partition on a) hard disk, so it would
have failed with ENOTTY and the patch isn't changing anything in
practice. Still, I'm treating it specially to avoid spamming the logs.
In principle, this restriction should include programs running with
CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and
/dev/sdb, it still should not be able to read/write outside the
boundaries of /dev/sda2 independent of the capabilities. However, for
now programs with CAP_SYS_RAWIO will still be allowed to send the
ioctls. Their actions will still be logged.
This patch does not affect the non-libata IDE driver. That driver
however already tests for bd != bd->bd_contains before issuing some
ioctl; it could be restricted further to forbid these ioctls even for
programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ Make it also print the command name when warning - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c3e0ef9a29 upstream.
For 32-bit architectures using standard jiffies the idletime calculation
in uptime_proc_show will quickly overflow. It takes (2^32 / HZ) seconds
of idle-time, or e.g. 12.45 days with no load on a quad-core with HZ=1000.
Switch to 64-bit calculations.
Cc: Michael Abbott <michael.abbott@diamond.ac.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 66f06127f3 upstream.
Just another eGalax device.
Please note that adding this device to have_special_driver
in hid-core.c is not required anymore.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bb9ff21072 upstream.
This patch adds USB ID for the touchpanel in Acer Iconia W500. The panel
supports up to five fingers, therefore the need for a new addition of panel
types.
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e36f690b37 upstream.
This is just a renaming of USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH{N}
to USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH_{PID} to handle more eGalax
devices.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b7ea81a58a upstream.
The AH4/6 ahash input callbacks read out the nexthdr field from the AH
header *after* they overwrite that header. This is obviously not going
to end well. Fix it up.
Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 069294e813 upstream.
The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume
instead of the error code. This appears to be a copy+paste error from
the input case, where nexthdr is expected. This causes the driver to
continuously add AH headers to the datagram until either an allocation
fails and the packet is dropped or the ahash driver hits a synchronous
fallback and the resulting monstrosity is transmitted.
Correct this issue by simply passing the error code unadulterated.
Signed-off-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit eaf5f90735 upstream.
Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.
Here's what appears to happen:
1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1
2 - CPU1: select_parent(P) locks P->d_lock
3 - CPU0: shrink_dentry_list() locks C->d_lock
dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock
4 - CPU1: select_parent(P) locks C->d_lock,
moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1
5 - CPU0: shrink_dentry_list() finds dispose list empty, returns
6 - Goto 2 with CPU0 and CPU1 switched
Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.
One way to trigger this is to make udev calls stat() on the sysfs file while it
is going away.
Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:
ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"
Then execute the following loop:
while true; do
echo -bond0 > /sys/class/net/bonding_masters
echo +bond0 > /sys/class/net/bonding_masters
echo -bond1 > /sys/class/net/bonding_masters
echo +bond1 > /sys/class/net/bonding_masters
done
One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent(). But I think a better solution is to stop the
stealing behavior.
This patch adds a new dentry flag that is set when the dentry is added to the
dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.
If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it. With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.
Set the flag is also set in prune_dcache_sb() for consistency as suggested by
Linus.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 806e23e95f upstream.
There is a potential integer overflow in uvc_ioctl_ctrl_map(). When a
large xmap->menu_count is passed from the userspace, the subsequent call
to kmalloc() will allocate a buffer smaller than expected.
map->menu_count and map->menu_info would later be used in a loop (e.g.
in uvc_query_v4l2_ctrl), which leads to out-of-bound access.
The patch checks the ioctl argument and returns -EINVAL for zero or too
large values in xmap->menu_count.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
[laurent.pinchart@ideasonboard.com Prevent excessive memory consumption]
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2e885057b7 upstream.
In ELF64, the sh_flags field is 64-bits wide. recordmcount was
erroneously treating it as a 32-bit wide field. For little endian
objects this works because the flags of interest (SHF_EXECINSTR)
reside in the lower 32 bits of the word, and you get the same result
with either a 32-bit or 64-bit read. Big endian objects on the
other hand do not work at all with this error.
The fix: Correctly treat sh_flags as 64-bits wide in elf64 objects.
The symptom I observed was that my
__start_mcount_loc..__stop_mcount_loc was empty even though ftrace
function tracing was enabled.
Link: http://lkml.kernel.org/r/1324345362-12230-1-git-send-email-ddaney.cavm@gmail.com
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit da517a08ac upstream.
SGI UV systems print a message during boot:
UV: Found <num> blades
Due to packaging changes, the blade count is not accurate for
on the next generation of the platform. This patch corrects the
count.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/20120106191900.GA19772@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fed474857e upstream.
Removing the parent of a watched file results in "kernel BUG at
fs/notify/mark.c:139".
To reproduce
add "-w /tmp/audit/dir/watched_file" to audit.rules
rm -rf /tmp/audit/dir
This is caused by fsnotify_destroy_mark() being called without an
extra reference taken by the caller.
Reported by Francesco Cosoleto here:
https://bugzilla.novell.com/show_bug.cgi?id=689860
Fix by removing the BUG_ON and adding a comment about not accessing mark after
the iput.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b4f36f88b3 upstream.
Socket callbacks use svc_xprt_enqueue() to add an xprt to a
pool->sp_sockets list. In normal operation a server thread will later
come along and take the xprt off that list. On shutdown, after all the
threads have exited, we instead manually walk the sv_tempsocks and
sv_permsocks lists to find all the xprt's and delete them.
So the sp_sockets lists don't really matter any more. As a result,
we've mostly just ignored them and hoped they would go away.
Which has gotten us into trouble; witness for example ebc63e531c
"svcrpc: fix list-corrupting race on nfsd shutdown", the result of Ben
Greear noticing that a still-running svc_xprt_enqueue() could re-add an
xprt to an sp_sockets list just before it was deleted. The fix was to
remove it from the list at the end of svc_delete_xprt(). But that only
made corruption less likely--I can see nothing that prevents a
svc_xprt_enqueue() from adding another xprt to the list at the same
moment that we're removing this xprt from the list. In fact, despite
the earlier xpo_detach(), I don't even see what guarantees that
svc_xprt_enqueue() couldn't still be running on this xprt.
So, instead, note that svc_xprt_enqueue() essentially does:
lock sp_lock
if XPT_BUSY unset
add to sp_sockets
unlock sp_lock
So, if we do:
set XPT_BUSY on every xprt.
Empty every sp_sockets list, under the sp_socks locks.
Then we're left knowing that the sp_sockets lists are all empty and will
stay that way, since any svc_xprt_enqueue() will check XPT_BUSY under
the sp_lock and see it set.
And *then* we can continue deleting the xprt's.
(Thanks to Jeff Layton for being correctly suspicious of this code....)
Cc: Ben Greear <greearb@candelatech.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2fefb8a09e upstream.
There's no reason I can see that we need to call sv_shutdown between
closing the two lists of sockets.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 61c8504c42 upstream.
The pool_to and to_pool fields of the global svc_pool_map are freed on
shutdown, but are initialized in nfsd startup only in the
SVC_POOL_PERCPU and SVC_POOL_PERNODE cases.
They *are* initialized to zero on kernel startup. So as long as you use
only SVC_POOL_GLOBAL (the default), this will never be a problem.
You're also OK if you only ever use SVC_POOL_PERCPU or SVC_POOL_PERNODE.
However, the following sequence events leads to a double-free:
1. set SVC_POOL_PERCPU or SVC_POOL_PERNODE
2. start nfsd: both fields are initialized.
3. shutdown nfsd: both fields are freed.
4. set SVC_POOL_GLOBAL
5. start nfsd: the fields are left untouched.
6. shutdown nfsd: now we try to free them again.
Step 4 is actually unnecessary, since (for some bizarre reason), nfsd
automatically resets the pool mode to SVC_POOL_GLOBAL on shutdown.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 364212fdda upstream.
Thomas Lange reported that when he did a 'make localmodconfig', his
config was missing the brcmsmac driver, even though he had the module
loaded.
Looking into this, I found the file:
drivers/net/wireless/brcm80211/brcmsmac/Makefile
had the following in the Makefile:
MODULEPFX := brcmsmac
obj-$(CONFIG_BRCMSMAC) += $(MODULEPFX).o
The way streamline-config.pl works, is parsing all the
obj-$(CONFIG_FOO) += foo.o
lines to find that CONFIG_FOO belongs to the module foo.ko.
But in this case, the brcmsmac.o was not used, but a variable in its place.
By changing streamline-config.pl to remember defined variables in Makefiles
and substituting them when they are used in the obj-X lines, allows
Thomas (and others) to have their brcmsmac module stay configured
when it is loaded and running "make localmodconfig".
Reported-by: Thomas Lange <thomas-lange2@gmx.de>
Tested-by: Thomas Lange <thomas-lange2@gmx.de>
Cc: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d060d963e8 upstream.
Simplify the way lines ending with backslashes (continuation) in Makefiles
is parsed. This is needed to implement a necessary fix.
Tested-by: Thomas Lange <thomas-lange2@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6c06108be5 upstream.
If ctrls->count is too high the multiplication could overflow and
array_size would be lower than expected. Mauro and Hans Verkuil
suggested that we cap it at 1024. That comes from the maximum
number of controls with lots of room for expantion.
$ grep V4L2_CID include/linux/videodev2.h | wc -l
211
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit dd8df17fe8 upstream.
This patch fixes a failure to recognize SD cards reported on a Dell
Vostro with O2 Micro SD card reader. Patch 49c468f ("mmc: sd: add
support for uhs bus speed mode selection") caused the problem, by
setting the SDHCI_CTRL_HISPD flag even for legacy timings.
Signed-off-by: Alexander Elbs <alex@segv.de>
Acked-by: Philip Rakity <prakity@marvell.com>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c6ced0db08 upstream.
When suspending host, the tuning timer shoule be deactivated.
And the HOST_NEEDS_TUNING flag should be set after tuning timer is
deactivated.
Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Aaron Lu <aaron.lu@amd.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 913047e9e5 upstream.
This patch fixes the wrong comparison before setting the interface
voltage in DDR mode.
The assignment to the variable ddr before comaprison is either
ddr = MMC_1_2V_DDR_MODE; or ddr == MMC_1_8V_DDR_MODE. But the comparison
is done with the extended csd value if ddr == EXT_CSD_CARD_TYPE_DDR_1_2V.
Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
Acked-by: Subhash Jadavani <subhashj@codeaurora.org>
Acked-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 7c1f59c9d5 upstream.
When adding checks for ACPI resource conflicts to many bus drivers,
not enough attention was paid to the error paths, and for several
drivers this causes 0 to be returned on error in some cases. Fix this
by properly returning a non-zero value on every error.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d34315da91 upstream.
Patch 56e46742e8 broke UBIFS debugging messages:
before that commit when UBIFS debugging was enabled, users saw few useful
debugging messages after mount. However, that patch turned 'dbg_msg()' into
'pr_debug()', so to enable the debugging messages users have to enable them
first via /sys/kernel/debug/dynamic_debug/control, which is very impractical.
This commit makes 'dbg_msg()' to use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 72f0d453d8 upstream.
Patch ab50ff6847 broke UBI debugging messages:
before that commit when UBI debugging was enabled, users saw few useful
debugging messages after attaching an MTD device. However, that patch turned
'dbg_msg()' into 'pr_debug()', so to enable the debugging messages users have
to enable them first via /sys/kernel/debug/dynamic_debug/control, which is
very impractical.
This commit makes 'dbg_msg()' to use 'printk()' instead of 'pr_debug()', just
as it was before the breakage.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4a59c797a1 upstream.
Currently it's possible to create a volume without a name. E.g:
ubimkvol -n 32 -s 2MiB -t static /dev/ubi0 -N ""
After that vtbl_check() will always fail because it does not permit
empty strings.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ab936cbcd0 upstream.
Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page(). This function replaces a page in the
radix-tree with a new page. WHen doing this, memory cgroup needs to fix
up the accounting information. memcg need to check PCG_USED bit etc.
In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache(). So, memcg's LRU accounting information should be
fixed, too.
This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid
races with LRU handling.
Background:
replace_page_cache_page() is called by FUSE code in its splice() handling.
Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
because rmdir() checks the whole LRU is empty and there is no account leak.
If a page is on the other LRU than it should be, rmdir() will fail.
This bug was added in March 2011, but no bug report yet. I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.
The result of this bug is that admin cannot destroy a memcg because of
account leak. So, no panic, no deadlock. And, even if an active cgroup
exist, umount can succseed. So no problem at shutdown.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit eb31aae8cb upstream.
Some Dell BIOSes have MCFG tables that don't report the entire
MMCONFIG area claimed by the chipset. If we move PCI devices into
that claimed-but-unreported area, they don't work.
This quirk reads the AMD MMCONFIG MSRs and adds PNP0C01 resources as
needed to cover the entire area.
Example problem scenario:
BIOS-e820: 00000000cfec5400 - 00000000d4000000 (reserved)
Fam 10h mmconf [d0000000, dfffffff]
PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xd0000000-0xd3ffffff] (base 0xd0000000)
pnp 00:0c: [mem 0xd0000000-0xd3ffffff]
pci 0000:00:12.0: reg 10: [mem 0xffb00000-0xffb00fff]
pci 0000:00:12.0: no compatible bridge window for [mem 0xffb00000-0xffb00fff]
pci 0000:00:12.0: BAR 0: assigned [mem 0xd4000000-0xd40000ff]
Reported-by: Lisa Salimbas <lisa.salimbas@canonical.com>
Reported-by: <thuban@singularity.fr>
Tested-by: dann frazier <dann.frazier@canonical.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=31602
References: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/647043
References: https://bugzilla.redhat.com/show_bug.cgi?id=770308
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 45fae74939 upstream.
Info about new measurements are cached in the iint for performance. When
the inode is flushed from cache, the associated iint is flushed as well.
Subsequent access to the inode will cause the inode to be re-measured and
will attempt to add a duplicate entry to the measurement list.
This patch frees the duplicate measurement memory, fixing a memory leak.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9e7860cee1 upstream.
Haogang Chen found out that:
There is a potential integer overflow in process_msg() that could result
in cross-domain attack.
body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH);
When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent
call to xb_read() would write to a zero-length buffer.
The other end of this connection is always the xenstore backend daemon
so there is no guest (malicious or otherwise) which can do this. The
xenstore daemon is a trusted component in the system.
However this seem like a reasonable robustness improvement so we should
have it.
And Ian when read the API docs found that:
The payload length (len field of the header) is limited to 4096
(XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the
limit, its xenstored connection will be immediately killed by
xenstored, which is usually catastrophic from the client's point of
view. Clients (particularly domains, which cannot just reconnect)
should avoid this.
so this patch checks against that instead.
This also avoids a potential integer overflow pointed out by Haogang Chen.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit aff132d95f upstream.
The amount of memory required for tracking chain buffers is rather
large, and when the host credit count is big, memory allocation
failure occurs inside __get_free_pages.
The fix is to limit the number of chains to 100,000. In addition,
the number of host credits is limited to 30,000 IOs. However this
limitation can be overridden this using the command line option
max_queue_depth. The algorithm for calculating the
reply_post_queue_depth is changed so that it is equal to
(reply_free_queue_depth + 16), previously it was (reply_free_queue_depth * 2).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 30c43282f3 upstream.
Added code to release the spinlock that is used to protect the
raid device list before calling a function that can block. The
blocking was causing a reschedule, and subsequently it is tried
to acquire the same lock, resulting in a panic (NMI Watchdog
detecting a CPU lockup).
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5cf9a4e69c upstream.
We only need amd_bus.o for AMD systems with PCI. arch/x86/pci/Makefile
already depends on CONFIG_PCI=y, so this patch just adds the dependency
on CONFIG_AMD_NB.
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 24d25dbfa6 upstream.
This factors out the AMD native MMCONFIG discovery so we can use it
outside amd_bus.c.
amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the
PCI resources. We may also need the MMCONFIG information to work
around BIOS defects in the ACPI MCFG table.
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ae5cd86455 upstream.
This assures that a _CRS reserved host bridge window or window region is
not used if it is not addressable by the CPU. The new code either trims
the window to exclude the non-addressable portion or totally ignores the
window if the entire window is non-addressable.
The current code has been shown to be problematic with 32-bit non-PAE
kernels on systems where _CRS reserves resources above 4GB.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Renninger <trenn@novell.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a776c491ca upstream.
I traced a nasty kexec on panic boot failure to the fact that we had
screaming msi interrupts and we were not disabling the msi messages at
kernel startup. The booting kernel had not enabled those interupts so
was not prepared to handle them.
I can see no reason why we would ever want to leave the msi interrupts
enabled at boot if something else has enabled those interrupts. The pci
spec specifies that msi interrupts should be off by default. Drivers
are expected to enable the msi interrupts if they want to use them. Our
interrupt handling code reprograms the interrupt handlers at boot and
will not be be able to do anything useful with an unexpected interrupt.
This patch applies cleanly all of the way back to 2.6.32 where I noticed
the problem.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e57e0d8e81 upstream.
When we fail to erase a PEB, we free the corresponding erase entry object,
but then re-schedule this object if the error code was something like -EAGAIN.
Obviously, it is a bug to use the object after we have freed it.
Reported-by: Emese Revfy <re.emese@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e801e128b2 upstream.
Under some cases, when scrubbing the PEB if we did not get the lock on
the PEB it fails to scrub. Add that PEB again to the scrub list
Artem: minor amendments.
Signed-off-by: Bhavesh Parekh <bparekh@nvidia.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8a0d551a59 upstream.
Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.
Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.
I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 43717c7dae upstream.
Lukas Razik <linux@razik.name> reports that on his SPARC system,
booting with an NFS root file system stopped working after commit
56463e50 "NFS: Use super.c for NFSROOT mount option parsing."
We found that the network switch to which Lukas' client was attached
was delaying access to the LAN after the client's NIC driver reported
that its link was up. The delay was longer than the timeouts used in
the NFS client during mounting.
NFSROOT worked for Lukas before commit 56463e50 because in those
kernels, the client's first operation was an rpcbind request to
determine which port the NFS server was listening on. When that
request failed after a long timeout, the client simply selected the
default NFS port (2049). By that time the switch was allowing access
to the LAN, and the mount succeeded.
Neither of these client behaviors is desirable, so reverting 56463e50
is really not a choice. Instead, introduce a mechanism that retries
the NFSROOT mount request several times. This is the same tactic that
normal user space NFS mounts employ to overcome server and network
delays.
Signed-off-by: Lukas Razik <linux@razik.name>
[ cel: match kernel coding style, add proper patch description ]
[ cel: add exponential back-off ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Lukas Razik <linux@razik.name>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3df96909b7 upstream.
It would previously write basically random bits to PCI configuration space...
Not very surprising that the GPU tended to stop responding completely. The
resulting MCE even froze the whole machine sometimes.
Now resetting the GPU after a lockup has at least a fighting chance of
succeeding.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 28eebb703e upstream.
We often end up missing fences on older asics with
writeback enabled which leads to delays in the userspace
accel code, so just disable it by default on those asics.
Reported-by: Helge Deller <deller@gmx.de>
Reported-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 92db7f6c86 upstream.
This change was verified to fix both issues with no video I've
investigated. I've also checked checksum calculation with fglrx on:
RV620, HD54xx, HD5450, HD6310, HD6320.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3a90274de3 upstream.
When an invalid NID is given, get_wcaps() returns zero as the error,
but get_wcaps_type() takes it as the normal value and returns a bogus
AC_WID_AUD_OUT value. This confuses the parser.
With this patch, get_wcaps_type() returns -1 when value 0 is given,
i.e. an invalid NID is passed to get_wcaps().
Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=740118
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e7848163aa upstream.
Cards with identical PCI ids but no AC97 config in EEPROM do not have
the ac97 field initialized. We must check for this case to avoid kernel oops.
Signed-off-by: Pavel Hofman <pavel.hofman@ivitera.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d50f2ab6f0 upstream.
Commit 503358ae01 ("ext4: avoid divide by
zero when trying to mount a corrupted file system") fixes CVE-2009-4307
by performing a sanity check on s_log_groups_per_flex, since it can be
set to a bogus value by an attacker.
sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex < 2) { ... }
This patch fixes two potential issues in the previous commit.
1) The sanity check might only work on architectures like PowerPC.
On x86, 5 bits are used for the shifting amount. That means, given a
large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
is essentially 1 << 4 = 16, rather than 0. This will bypass the check,
leaving s_log_groups_per_flex and groups_per_flex inconsistent.
2) The sanity check relies on undefined behavior, i.e., oversized shift.
A standard-confirming C compiler could rewrite the check in unexpected
ways. Consider the following equivalent form, assuming groups_per_flex
is unsigned for simplicity.
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex == 0 || groups_per_flex == 1) {
We compile the code snippet using Clang 3.0 and GCC 4.6. Clang will
completely optimize away the check groups_per_flex == 0, leaving the
patched code as vulnerable as the original. GCC keeps the check, but
there is no guarantee that future versions will do the same.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2f4478ccff upstream.
stresstest needs at least two eraseblocks. Bail out gracefully if that
condition is not met. Fixes the following 'division by zero' OOPS:
[ 619.100000] mtd_stresstest: MTD device size 131072, eraseblock size 131072, page size 2048, count of eraseblocks 1, pages per eraseblock 64, OOB size 64
[ 619.120000] mtd_stresstest: scanning for bad eraseblocks
[ 619.120000] mtd_stresstest: scanned 1 eraseblocks, 0 are bad
[ 619.130000] mtd_stresstest: doing operations
[ 619.130000] mtd_stresstest: 0 operations done
[ 619.140000] Division by zero in kernel.
...
caused by
/* Read or write up 2 eraseblocks at a time - hence 'ebcnt - 1' */
eb %= (ebcnt - 1);
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 342ff28f5a upstream.
Some error paths in mtd_blkdevs were fixed in the following commit:
commit 94735ec404
mtd: mtd_blkdevs: fix error path in blktrans_open
But on these error paths, the block device's `dev->open' count is
already incremented before we check for errors. This meant that, while
the error path was handled correctly on the first time through
blktrans_open(), the device is erroneously considered already open on
the second time through.
This problem can be seen, for instance, when a UBI volume is
simultaneously mounted as a UBIFS partition and read through its
corresponding gluebi mtdblockX device. This results in blktrans_open()
passing its error checks (with `dev->open > 0') without actually having
a handle on the device. Here's a summarized log of the actions and
results with nandsim:
# modprobe nandsim
# modprobe mtdblock
# modprobe gluebi
# modprobe ubifs
# ubiattach /dev/ubi_ctrl -m 0
...
# ubimkvol /dev/ubi0 -N test -s 16MiB
...
# mount -t ubifs ubi0:test /mnt
# ls /dev/mtdblock*
/dev/mtdblock0 /dev/mtdblock1
# cat /dev/mtdblock1 > /dev/null
cat: can't open '/dev/mtdblock4': Device or resource busy
# cat /dev/mtdblock1 > /dev/null
CPU 0 Unable to handle kernel paging request at virtual address
fffffff0, epc == 8031536c, ra == 8031f280
Oops[#1]:
...
Call Trace:
[<8031536c>] ubi_leb_read+0x14/0x164
[<8031f280>] gluebi_read+0xf0/0x148
[<802edba8>] mtdblock_readsect+0x64/0x198
[<802ecfe4>] mtd_blktrans_thread+0x330/0x3f4
[<8005be98>] kthread+0x88/0x90
[<8000bc04>] kernel_thread_helper+0x10/0x18
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 556f063580 upstream.
The array of unsigned long pointed by oops_page_used is allocated
by vmalloc which requires the size to be in bytes.
BITS_PER_LONG is equal to 32.
If we want to allocate memory for 32 pages with one bit per page then
32 / BITS_PER_LONG is equal to 1 byte that is 8 bits.
To fix it we need to multiply the result by sizeof(unsigned long) equal to 4.
Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 093019cf1b upstream.
Commit fa8b18ed didn't prevent the integer overflow and possible
memory corruption. "count" can go negative and bypass the check.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[Not upstream as it was fixed differently for 3.3 with a much more
"intrusive" rework of the driver - gregkh]
There is a race condition involving acm_tty_hangup() and acm_tty_close()
where hangup() would attempt to access tty->driver_data without proper
locking and NULL checking after close() has potentially already set it
to NULL. One possibility to (sporadically) trigger this behavior is to
perform a suspend/resume cycle with a running WWAN data connection.
This patch addresses the issue by introducing a NULL check for
tty->driver_data in acm_tty_hangup() protected by open_mutex and exiting
gracefully when hangup() is invoked on a device that has already been
closed.
Signed-off-by: Thilo-Alexander Ginkel <thilo@ginkel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9ae89b0296 upstream.
mpt2sas_base_detach() call was removed from _scsih_remove() while
doing some code shuffling. Mainly when we work on adding code for
scsih_shutdown(). I have added back mpt2sas_base_detach() which will
get callled from _scsih_remove().
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
commit 79cfbdfa87 upstream.
The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
functions depend on the value of the 'tasks_frozen' argument passed to them
(which indicates whether tasks have been frozen or not).
(Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
CPU_DEAD, CPU_DEAD_FROZEN).
Thus, it is essential that while the callbacks for those notifications are
running, the state of the system with respect to the tasks being frozen or
not remains unchanged, *throughout that duration*. Hence there is a need for
synchronizing the CPU hotplug code with the freezer subsystem.
Since the freezer is involved only in the Suspend/Hibernate call paths, this
patch hooks the CPU hotplug code to the suspend/hibernate notifiers
PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
notifications will always be run with the state of the system really being
what the notifications indicate, _throughout_ their execution time.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f7d9821a6a upstream.
If slave device already has a receive handler registered, then the
error unwind of bonding device enslave function is broken.
The following will leave a pointer to freed memory in the slave
device list, causing a later kernel panic.
# modprobe dummy
# ip li add dummy0-1 link dummy0 type macvlan
# modprobe bonding
# echo +dummy0 >/sys/class/net/bond0/bonding/slaves
The fix is to detach the slave (which removes it from the list)
in the unwind path.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6c15d74def upstream.
At this point if skb->len happens to be 2, the subsequant skb_pull(skb, 4)
call won't work and the skb->len won't be decreased and won't ever reach 0,
resulting in an infinite loop.
With an ASIX 88772 under heavy load, without this patch, rx_fixup() reaches
an infinite loop in less than a minute. With this patch applied,
no infinite loop even after hours of heavy load.
Signed-off-by: Aurelien Jacobs <aurel@gnuage.org>
Cc: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit a8c1f65c79 upstream.
Commit 5b7c840667 ('ipv4: correct IGMP
behavior on v3 query during v2-compatibility mode') added yet another
case for query parsing, which can result in max_delay = 0. Substitute
a value of 1, as in the usual v3 case.
Reported-by: Simon McVittie <smcv@debian.org>
References: http://bugs.debian.org/654876
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit c618759774 upstream.
Problems with NVIDIA's OHCI host controllers persist. After looking
carefully through the spec, I finally realized that when a controller
is reset it then automatically goes into a SUSPEND state in which it
is completely quiescent (no DMA and no IRQs) and from which it will
not awaken until the system puts it into the OPERATIONAL state.
Therefore there's no need to worry about controllers being in the
RESET state for extended periods, or remaining in the OPERATIONAL
state during system shutdown. The proper action for device
initialization is to put the controller into the RESET state (if it's
not there already) and then to issue a software reset. Similarly, the
proper action for device shutdown is simply to do a software reset.
This patch (as1499) implements such an approach. It simplifies
initialization and shutdown, and allows the NVIDIA shutdown-quirk code
to be removed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Andre "Osku" Schmidt <andre.osku.schmidt@googlemail.com>
Tested-by: Arno Augustin <Arno.Augustin@web.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 18b7ede5f7 upstream.
[ removed the dwc3 portion of the patch as it didn't apply to
older kernels - gregkh]
According to USB 3.0 Specification Table 9-22, if
bmAttributes [4:0] are set to zero, it means "no
streams supported", but the way this helper was
defined on Linux, we will *always* have one stream
which might cause several problems.
For example on DWC3, we would tell the controller
endpoint has streams enabled and yet start transfers
with Stream ID set to 0, which would goof up the host
side.
While doing that, convert the macro to an inline
function due to the different checks we now need.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3c8c931671 upstream.
Add support for Chinese Noname HSPA USB modem which is apparently
manufactured by a company called ZD Incorporated (based on texts in the
Windows drivers).
This product is available at least from Dealextreme (SKU 80032) and
possibly in India with name Olive V-MW250. It is based on Qualcomm
MSM6280 chip.
I needed to also add "options usb-storage quirks=0685:7000:i" in modprobe
configuration because udevd or the kernel keeps poking the embedded
fake-cd-rom which fails and causes the device to reset. There might be
a better way to accomplish the same. usb_modeswitch is not needed with
this device.
Signed-off-by: Janne Snabb <snabb@epipe.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5b06162335 upstream.
Add VendorID/ProductID for USB 3G dongle Model VT1000 of Viettel.
Signed-off-by: VU Tuan Duc <ducvt@viettel.com.vn>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 71d85724bd upstream.
I encountered a result of COMP_2ND_BW_ERR while improving how the pwc
webcam driver handles not having the full usb1 bandwidth available to
itself.
I created the following test setup, a NEC xhci controller with a
single TT USB 2 hub plugged into it, with a usb keyboard and a pwc webcam
plugged into the usb2 hub. This caused the following to show up in dmesg
when trying to stream from the pwc camera at its highest alt setting:
xhci_hcd 0000:01:00.0: ERROR: unexpected command completion code 0x23.
usb 6-2.1: Not enough bandwidth for altsetting 9
And usb_set_interface returned -EINVAL, which caused my pwc code to not
do the right thing as it expected -ENOSPC.
This patch makes the xhci driver properly handle COMP_2ND_BW_ERR and makes
usb_set_interface return -ENOSPC as expected.
This should be backported to stable kernels as old as 2.6.32.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bc677d5b64 upstream.
Add a new field num_mapped_sgs to struct urb so that we have a place to
store the number of mapped entries and can also retain the original
value of entries in num_sgs. Previously, usb_hcd_map_urb_for_dma()
would overwrite this with the number of mapped entries, which would
break dma_unmap_sg() because it requires the original number of entries.
This fixes warnings like the following when using USB storage devices:
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695()
ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1]
Modules linked in: ohci_hcd ehci_hcd
Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319
Call Trace:
<IRQ> [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98
[<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43
[<ffffffff811fa5ae>] check_unmap+0x4e4/0x695
[<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50
[<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117
[<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188
[<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22
[<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0
[<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd]
[<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd]
[<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd]
...
---[ end trace f29ac88a5a48c580 ]---
Mapped at:
[<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139
[<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478
[<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa
[<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de
[<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 08e87d0d77 upstream.
Hi, below patch adds the USB-ID of the serial adapters sold by
Multiplex RC (www.multiplex-rc.de).
Signed-off-by: Malte Schröder <maltesch@gmx.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 694c6301e5 upstream.
Fix regression introduced by commit 507ca9bc04 ([PATCH] USB: add
ability for usb-serial drivers to determine if their write urb is
currently being used.) which inverted the logic in write_room so that it
returns zero when the write urb is actually free.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 772aed45b6 upstream.
In musb_init_controller() there's a pm_runtime_put(), but there's no
pm_runtime_get(), which creates a mismatch that causes the driver to
sleep when it shouldn't.
This was introduced in 7acc619[1], but it wasn't triggered in my setup
until 18a2689[2] was merged to Linus' branch at point df0914[3]. IOW;
when PM is working as it was supposed to.
However, it seems most of the time this is used in a way that keeps the
counter above 0, so nobody noticed. Also, it seems to depend on the
configuration used in versions before 3.1, but not later (or in it).
I found the problem by loading isp1704_charger before any usb gadgets:
http://article.gmane.org/gmane.linux.kernel/1226122
All versions after 2.6.39 are affected.
[1] usb: musb: Idle path retention and offmode support for OMAP3
[2] OMAP2+: musb: hwmod adaptation for musb registration
[3] Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
Cc: Hema HK <hemahk@ti.com>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
commit 35284b3d2f upstream.
The Guillemot Webcam Hercules Dualpix Exchange camera
has been reported with a second ID.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 59bf5cf94f upstream.
We were sending data on the stack when uploading firmware, which causes
some machines fits, and is not allowed. Fix this by using the buffer we
already had around for this very purpose.
Reported-by: Wouter M. Koolen <wmkoolen@cwi.nl>
Tested-by: Wouter M. Koolen <wmkoolen@cwi.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e7c8e8605d upstream.
On some failures, the country_code field of an acm structure is freed
without freeing the acm structure itself. Elsewhere, operations including
memcpy and kfree are performed on the country_code field. The patch sets
the country_code field to NULL when it is freed, and likewise sets the
country_code_size field to 0.
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d2eb8c3593 upstream.
During BKL removal in 2.6.38, conversion of files from in-ICB format to normal
format got broken. We call ->writepage with i_data_sem held but udf_get_block()
also acquires i_data_sem thus creating A-A deadlock.
We fix the problem by dropping i_data_sem before calling ->writepage() which is
safe since i_mutex still protects us against any changes in the file. Also fix
pagelock - i_data_sem lock inversion in udf_expand_file_adinicb() by dropping
i_data_sem before calling find_or_create_page().
Reported-by: Matthias Matiak <netzpython@mail-on.us>
Tested-by: Matthias Matiak <netzpython@mail-on.us>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0d19ea8665 upstream.
If we mount a hierarchy with a specified name, the name is unique,
and we can use it to mount the hierarchy without specifying its
set of subsystem names. This feature is documented is
Documentation/cgroups/cgroups.txt section 2.3
Here's an example:
# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
# mount -t cgroup -o name=myhier xxx /cgroup2
But it was broken by commit 32a8cf235e
(cgroup: make the mount options parsing more accurate)
This fixes the regression.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d8cae98cdd upstream.
The documentation for usbmon is out of date; the usbfs "devices" file
now exists in /sys/kernel/debug/usb rather than /proc/bus/usb. This
patch (as1505) updates the documentation accordingly, and also
mentions that the necessary information can be found by running lsusb.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 33c104d415 upstream.
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when filesystem hits error and is
remounted read-only. This unnecessarily scares users (well, they should be
scared because of filesystem error, but the stack trace distracts them from the
right source of their fear ;-). We could as well just remove the WARN_ON but
it's not hard to fix it to not trip on filesystem with errors and not use more
cycles in the common case so that's what we do.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a9e36da655 upstream.
This patch fixes a crash in reiserfs_delete_xattrs during umount.
When shrink_dcache_for_umount clears the dcache from
generic_shutdown_super, delayed evictions are forced to disk. If an
evicted inode has extended attributes associated with it, it will
need to walk the xattr tree to locate and remove them.
But since shrink_dcache_for_umount will BUG if it encounters active
dentries, the xattr tree must be released before it's called or it will
crash during every umount.
This patch forces the evictions to occur before generic_shutdown_super
by calling shrink_dcache_sb first. The additional evictions caused
by the removal of each associated xattr file and dir will be automatically
handled as they're added to the LRU list.
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a06d789b42 upstream.
When jqfmt mount option is not specified on remount, we mistakenly clear
s_jquota_fmt value stored in superblock. Fix the problem.
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 49908a1b25 upstream.
A update is made to the sched:sched_switch event that adds some
logic to the first parameter of the __print_flags() that shows the
state of tasks. This change cause perf to fail parsing the flags.
A simple fix is needed to have the parser be able to process ops
within the argument.
Reported-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit eddfb67525 upstream.
Prevent a receive data corruption by ensuring that the write to update
the rcvhdrheadn register to generate an interrupt is at the very end
of the receive processing.
Signed-off-by: Ramkrishna Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e8303a3b21 upstream.
Adds the device id needed for the USB Ethernet Adapter delivered by
ASUS with their Zenbook.
Signed-off-by: Aurelien Jacobs <aurel@gnuage.org>
Acked-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e4f387d8db upstream.
Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen
as following, which could cause incorrect preempt count.
__trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry =>
get_cpu_var => preempt_disable
__trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit =>
put_cpu_var => preempt_enable
where:
A => B and A -> B means A calls B, but
=> means A will call B through function name, and B will definitely be
called.
-> means A will call B through function pointer, so B might not be
called if the function pointer is not set.
So error happens when only one of probe_hcall_entry and probe_hcall_exit
get called during a hcall.
This patch tries to move the preempt count operations from
probe_hcall_entry and probe_hcall_exit to its callers.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 37fb9a0231 upstream.
When re-enabling interrupts we have code to handle edge sensitive
decrementers by resetting the decrementer to 1 whenever it is negative.
If interrupts were disabled long enough that the decrementer wrapped to
positive we do nothing. This means interrupts can be delayed for a long
time until it finally goes negative again.
While we hope interrupts are never be disabled long enough for the
decrementer to go positive, we have a very good test team that can
drive any kernel into the ground. The softlockup data we get back
from these fails could be seconds in the future, completely missing
the cause of the lockup.
We already keep track of the timebase of the next event so use that
to work out if we should trigger a decrementer exception.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f6efe96edd upstream.
An nvs with malformed contents could cause the processing of the
calibration data to read beyond the end of the buffer. Prevent this
from happening by adding bound checking.
Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2131d3c2f9 upstream.
Check for out of bound FEM index to prevent reading beyond ini
memory end.
Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c055fe0797 upstream.
We used to try to request 8 times more vram than needed, which would
fail if the card has a too small BAR (observed with qemu & kvm).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 1bb0b7d215 upstream.
When using a >8bpp framebuffer, offb advertises truecolor, not directcolor,
and doesn't touch the color map even if it has a corresponding access method
for the real hardware.
Thus it needs to set the pseudo-palette with all 3 components of the color,
like other truecolor framebuffers, not with copies of the color index like
a directcolor framebuffer would do.
This went unnoticed for a long time because it's pretty hard to get offb
to kick in with anything but 8bpp (old BootX under MacOS will do that and
qemu does it).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3f81f8f152 upstream.
Testing on the openSUSE wireless forum has shown that a Linksys
WUSB54GC v3 with USB ID 1737:0077 works with rt2800usb when the ID is
written to /sys/.../new_id. This ID can therefore be moved out of UNKNOWN.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit eea915bb0d upstream.
This oops was reported recently:
firmware_loading_store+0xf9/0x17b
dev_attr_store+0x20/0x22
sysfs_write_file+0x101/0x134
vfs_write+0xac/0xf3
sys_write+0x4a/0x6e
system_call_fastpath+0x16/0x1b
The complete backtrace was unfortunately not captured, but details can be found
here:
https://bugzilla.redhat.com/show_bug.cgi?id=769920
The cause is fairly clear.
Its caused by the fact that firmware_loading_store has a case 0 in its
switch statement that reads and writes the fw_priv->fw poniter without the
protection of the fw_lock mutex. since there is a window between the time that
_request_firmware sets fw_priv->fw to NULL and the time the corresponding sysfs
file is unregistered, its possible for a user space application to race in, and
write a zero to the loading file, causing a NULL dereference in
firmware_loading_store. Fix it by extending the protection of the fw_lock mutex
to cover all of the firware_loading_store function.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2eb7f204db upstream.
The Japanese/Korean/Chinese versions still need updating.
Also, the stable kernel 2.6.x.y descriptions are out of date
and should be updated as well.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bc7a2f3abc upstream.
The old address hasn't worked since the great intrusion of August 2011.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 50b8d25748 upstream.
Test-case:
int main(void)
{
int pid, status;
pid = fork();
if (!pid) {
for (;;) {
if (!fork())
return 0;
if (waitpid(-1, &status, 0) < 0) {
printf("ERR!! wait: %m\n");
return 0;
}
}
}
assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
assert(waitpid(-1, NULL, 0) == pid);
assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
PTRACE_O_TRACEFORK) == 0);
do {
ptrace(PTRACE_CONT, pid, 0, 0);
pid = waitpid(-1, NULL, 0);
} while (pid > 0);
return 1;
}
It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().
The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.
This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.
I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 157e8bf8b4 upstream.
This reverts commit c0afabd3d5.
It causes failures on Toshiba laptops - instead of disabling the alarm,
it actually seems to enable it on the affected laptops, resulting in
(for example) the laptop powering on automatically five minutes after
shutdown.
There's a patch for it that appears to work for at least some people,
but it's too late to play around with this, so revert for now and try
again in the next merge window.
See for example
http://bugs.debian.org/652869
Reported-and-bisected-by: Andreas Friedrich <afrie@gmx.net> (Toshiba Tecra)
Reported-by: Antonio-M. Corbi Bellot <antonio.corbi@ua.es> (Toshiba Portege R500)
Reported-by: Marco Santos <marco.santos@waynext.com> (Toshiba Portege Z830)
Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> (Toshiba Portege R830)
Cc: Jonathan Nieder <jrnieder@gmail.com>
Requested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit be4f1ac828 upstream.
Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync(). Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.
The older_than_this checks that are now more strictly enforced since
writeback: avoid livelocking WB_SYNC_ALL writeback
by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once. But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened. I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.
Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty. This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 0b8fd3033c upstream.
If the writeback code writes back an inode because it has expired we currently
use the non-blockin ->write_inode path. This means any inode that is pinned
is skipped. With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it. The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds. This means under certain scenarious time based
metadata writeback never happens.
Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 63a741757d upstream.
This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:
ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO
The issue was the Xen-SWIOTLB was allocated such as that the end of
buffer was stradling a page (and also above 4GB). The fix was
spotted by Kalev Leonid which was to piggyback on git commit
e79f86b2ef "swiotlb: Use page alignment
for early buffer allocation" which:
We could call free_bootmem_late() if swiotlb is not used, and
it will shrink to page alignment.
So alloc them with page alignment at first, to avoid lose two pages
And doing that fixes the outstanding issue.
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d0e84caeb4 upstream.
If the twl4030-madc device wasn't registered, and another device, such
as twl4030-madc-hwmon, calls twl4030_madc_conversion() a NULL pointer is
dereferenced.
Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 66cc5b8e50 upstream.
Worst case this fixes the following error:
[ 72.086212] (NULL device *): conversion timeout!
Best case it prevents a crash
Signed-off-by: Kyle Manna <kyle@kylemanna.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
commit e178ccb335 upstream.
A mutex is locked on entry into twl4030_madc_conversion().
Immediate return on some error conditions leaves the
mutex locked.
This patch ensures that mutex is always unlocked before
leaving the function.
Signed-off-by: Sanjeev Premi <premi@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 96f1f05af7 upstream.
Since we configure all the queues as CHAINABLE, we need to update the
byte count for all the queues, not only the AGGREGATABLE ones.
Not doing so can confuse the SCD and make the fw assert.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[ Upstream commit 9f28a2fc0b ]
Commit 2c8cec5c10 (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.
As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.
Reintroduce the garbage collection, since we'll have to wait our
neighbour lookups become refcount-less to not depend on this stuff.
Reported-by: Robert Gladewitz <gladewitz@gmx.de>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit d01ff0a049 ]
After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0,
we should flush route cache, or it will continue receive packets with local
source address, which should be dropped.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit a76c0adf60 ]
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.
The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.
When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.
Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.
The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.
Chunk
Size Unpatched No Overhead
-------------------------------------
4 15.2 Kbit [!] 12.2 Mbit [!]
8 35.8 Kbit [!] 26.0 Mbit [!]
16 95.5 Kbit [!] 54.4 Mbit [!]
32 106.7 Mbit 102.3 Mbit
64 189.2 Mbit 188.3 Mbit
128 331.2 Mbit 334.8 Mbit
256 537.7 Mbit 536.0 Mbit
512 766.9 Mbit 766.6 Mbit
1024 810.1 Mbit 808.6 Mbit
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 2692ba61a8 ]
Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value. If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.
This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.
1) Avoid overflowing autoclose * HZ.
2) Keep the default autoclose bound consistent across 32- and 64-bit
platforms (INT_MAX / HZ in this patch).
3) Keep the autoclose value consistent between setsockopt() and
getsockopt() calls.
Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 3f1e6d3fd3 ]
gred_change_vq() is called under sch_tree_lock(sch).
This means a spinlock is held, and we are not allowed to sleep in this
context.
We might pre-allocate memory using GFP_KERNEL before taking spinlock,
but this is not suitable for stable material.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit cd7816d149 ]
previous commit 3fb72f1e6e
makes IP-Config wait for carrier on at least one network device.
Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.
Cc: Micha Nelissen <micha@neli.hopto.org>
Cc: Holger Brunck <holger.brunck@keymile.com>
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 7838f2ce36 ]
Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 9cef310fcd ]
Received non stream protocol packets were calling llc_cmsg_rcv that used a
skb after that skb was released by sk_eat_skb. This caused received STP
packets to generate kernel panics.
Signed-off-by: Alexandru Juncu <ajuncu@ixiacom.com>
Signed-off-by: Kunjan Naik <knaik@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit a03ffcf873 ]
x86 jump instruction size is 2 or 5 bytes (near/long jump), not 2 or 6
bytes.
In case a conditional jump is followed by a long jump, conditional jump
target is one byte past the start of target instruction.
Signed-off-by: Markus Kötter <nepenthesdev@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ A combination of upstream commits 1d299bc773 and
e88d246871 ]
Although we provide a proper way for a debugger to control whether
syscall restart occurs, we run into problems because orig_i0 is not
saved and restored properly.
Luckily we can solve this problem without having to make debuggers
aware of the issue. Across system calls, several registers are
considered volatile and can be safely clobbered.
Therefore we use the pt_regs save area of one of those registers, %g6,
as a place to save and restore orig_i0.
Debuggers transparently will do the right thing because they save and
restore this register already.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 21f74d361d ]
This is setting things up so that we can correct the return
value, so that it properly returns the original destination
buffer pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 3e37fd3153 ]
To handle the large physical addresses, just make a simple wrapper
around remap_pfn_range() like MIPS does.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 0b64120cce ]
Some of the sun4v code patching occurs in inline functions visible
to, and usable by, modules.
Therefore we have to patch them up during module load.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit b1f44e13a5 ]
The "(insn & 0x01800000) != 0x01800000" test matches 'restore'
but that is a legitimate place to see the %lo() part of a 32-bit
symbol relocation, particularly in tail calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[ Upstream commit 7cc8583372 ]
This silently was working for many years and stopped working on
Niagara-T3 machines.
We need to set the MSIQ to VALID before we can set it's state to IDLE.
On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
errors. The hypervisor documentation says, rather ambiguously, that
the MSIQ must be "initialized" before one can set the state.
I previously understood this to mean merely that a successful setconf()
operation has been performed on the MSIQ, which we have done at this
point. But it seems to also mean that it has been set VALID too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Upstrem commit: 911ae9434f
There's a bug in the MSIX backup and restore routines that cause a crash on
non-x86 (direct access to PCI space not via read/write). These routines are
unnecessary and were removed by the above commit, so also remove them from
stable to fix the crash.
Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 77e00f2ea9 upstream.
We already do this for cayman, need to also do it for
BTC parts. The default memory and voltage setup is not
adequate for advanced operation. Continuing will
result in an unusable display.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e67d668e14 upstream.
This patch makes use of the set_memory_x() kernel API in order
to make necessary BIOS calls to source NMIs.
This is needed for SLES11 SP2 and the latest upstream kernel as it appears
the NX Execute Disable has grown in its control.
Signed-off by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Signed-off by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e6780f7243 upstream.
It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.
While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?
In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.
But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 55205c916e upstream.
This change fixes a linking problem, which happens if oprofile
is selected to be compiled as built-in:
`oprofile_arch_exit' referenced in section `.init.text' of
arch/arm/oprofile/built-in.o: defined in discarded section
`.exit.text' of arch/arm/oprofile/built-in.o
The problem is appeared after commit 87121ca504, which
introduced oprofile_arch_exit() calls from __init function. Note
that the aforementioned commit has been backported to stable
branches, and the problem is known to be reproduced at least
with 3.0.13 and 3.1.5 kernels.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@nokia.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Link: http://lkml.kernel.org/r/20111222151540.GB16765@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5776ac2eb3 upstream.
According to imx pwm RM, the real period value should be
PERIOD value in PWMPR plus 2.
PWMO (Hz) = PCLK(Hz) / (period +2)
Signed-off-by: Jason Chen <jason.chen@linaro.org>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e30e2fdfe5 upstream.
Currently, the *_global_[un]lock_online() routines are not at all synchronized
with CPU hotplug. Soft-lockups detected as a consequence of this race was
reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
for finding out that the root-cause of this issue is the race condition
between br_write_[un]lock() and CPU hotplug, which results in the lock states
getting messed up).
Fixing this race by just adding {get,put}_online_cpus() at appropriate places
in *_global_[un]lock_online() is not a good option, because, then suddenly
br_write_[un]lock() would become blocking, whereas they have been kept as
non-blocking all this time, and we would want to keep them that way.
So, overall, we want to ensure 3 things:
1. br_write_lock() and br_write_unlock() must remain as non-blocking.
2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
for different sets of CPUs.
3. Either prevent any new CPU online operation in between this lock-unlock, or
ensure that the newly onlined CPU does not proceed with its corresponding
per-cpu spinlock unlocked.
To achieve all this:
(a) We introduce a new spinlock that is taken by the *_global_lock_online()
routine and released by the *_global_unlock_online() routine.
(b) We register a callback for CPU hotplug notifications, and this callback
takes the same spinlock as above.
(c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
initialized in the lock_init() code, all future updates to it are done in
the callback, under the above spinlock.
(d) The above bitmap is used (instead of cpu_online_mask) while locking and
unlocking the per-cpu locks.
The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
br_write_lock-unlock sequence is in progress, the callback keeps spinning,
thus preventing the CPU online operation till the lock-unlock sequence is
complete. This takes care of requirement (3).
The bitmap that we maintain remains unmodified throughout the lock-unlock
sequence, since all updates to it are managed by the callback, which takes
the same spinlock as the one taken by the lock code and released only by the
unlock routine. Combining this with (d) above, satisfies requirement (2).
Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
operations from racing with br_write_lock-unlock, requirement (1) is also
taken care of.
By the way, it is to be noted that a CPU offline operation can actually run
in parallel with our lock-unlock sequence, because our callback doesn't react
to notifications earlier than CPU_DEAD (in order to maintain our bitmap
properly). And this means, since we use our own bitmap (which is stale, on
purpose) during the lock-unlock sequence, we could end up unlocking the
per-cpu lock of an offline CPU (because we had locked it earlier, when the
CPU was online), in order to satisfy requirement (2). But this is harmless,
though it looks a bit awkward.
Debugged-by: Cong Meng <mc@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 78feb35b81 upstream.
My previous patch
34a5b4b6af iwlwifi: do not re-configure
HT40 after associated
Fix the case of HT40 after association on specified AP, but it break the
association for some APs and cause not able to establish connection.
We need to address HT40 before and after addociation.
Reported-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Tested-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 123877b80e upstream.
Check the IEEE80211_TX_CTL_ASSIGN_SEQ flag from mac80211, then decide how to
set the TX_CMD_FLG_SEQ_CTL_MSK bit. Setting the wrong bit in BAR frame whill
make the firmware to increment the sequence number which is incorrect and
cause unknown behavior.
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 10636bc2d6 upstream.
The stations always chooses 1Mbps for all trasmitting frames,
whenever the AP is configured to lock the supported rates.
As the max phy rate is always set with the 4th from highest phy rate,
this assumption might be wrong if we have less than that. Fix that.
Cc: Paul Stewart <pstew@google.com>
Reported-by: Ajay Gummalla <agummalla@google.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f83f71fda2 upstream.
With 16-bit RGB565 colour format pixels are stored by the device in memory
in the following order:
| b3 | b2 | b1 | b0 |
~+-----+-----+-----+-----+
| R5 G6 B5 | R5 G6 B5 |
This corresponds to V4L2_PIX_FMT_RGB565 fourcc, not V4L2_PIX_FMT_RGB565X.
This change is required to avoid trouble when setting up video pipeline
with the s5p-tv devices, so the colour formats at both devices can be
properly matched.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e6f67b8c05 upstream.
lockdep reports a deadlock in jfs because a special inode's rw semaphore
is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not
used when __read_cache_page() calls add_to_page_cache_lru().
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e0197aae59 upstream.
There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.
This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.
$ cat zombie.c
\#include <unistd.h>
int main()
{
if (fork())
pause();
return 0;
}
$
We are hitting this bug pretty regularly on ChromeOS.
This bug is already fixed by Tejun Heo's cgroup patchset which is
targetted for the next merge window:
https://lkml.org/lkml/2011/11/1/356
I've create a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Downstream-Bug-Report: http://crosbug.com/23953
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Olof Johansson <olofj@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 111d489f0f upstream.
Currently, the code assumes that the SEQUENCE status bits are mutually
exclusive. They are not...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 913050b91e upstream.
If oprofilefs_ulong_from_user() is called with count equals
zero, *val remains unchanged. Depending on the implementation it
might be uninitialized.
Change oprofilefs_ulong_from_user()'s interface to return count
on success. Thus, we are able to return early if count equals
zero which avoids using *val uninitialized. Fixing all users of
oprofilefs_ulong_ from_user().
This follows write syscall implementation when count is zero:
"If count is zero ... [and if] no errors are detected, 0 will be
returned without causing any other effect." (man 2 write)
Reported-By: Mike Waychison <mikew@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: oprofile-list <oprofile-list@lists.sourceforge.net>
Link: http://lkml.kernel.org/r/20111219153830.GH16765@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ff05b6f7ae upstream.
An integer overflow will happen on 64bit archs if task's sum of rss,
swapents and nr_ptes exceeds (2^31)/1000 value. This was introduced by
commit
f755a04 oom: use pte pages in OOM score
where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000). So there could be an int overflow while computing
176 points *= 1000;
and points may have negative value. Meaning the oom score for a mem hog task
will be one.
196 if (points <= 0)
197 return 1;
For example:
[ 3366] 0 3366 35390480 24303939 5 0 0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.
In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.
The points variable should be of type long instead of int to prevent the
int overflow.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3d3c8f93a2 upstream.
binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab usin __getname()
The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.
This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname(). Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e. if the name was
allocated with getname() rather than the naked __getname(). So,
__getname() followed by putname() ends up leaking memory.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9f57bd4d6d upstream.
per_cpu_ptr_to_phys() incorrectly rounds up its result for non-kmalloc
case to the page boundary, which is bogus for any non-page-aligned
address.
This affects the only in-tree user of this function - sysfs handler
for per-cpu 'crash_notes' physical address. The trouble is that the
crash_notes per-cpu variable is not page-aligned:
crash_notes = 0xc08e8ed4
PER-CPU OFFSET VALUES:
CPU 0: 3711f000
CPU 1: 37129000
CPU 2: 37133000
CPU 3: 3713d000
So, the per-cpu addresses are:
crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4
However, /sys/devices/system/cpu/cpu*/crash_notes says:
/sys/devices/system/cpu/cpu0/crash_notes: 36b57000
/sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
/sys/devices/system/cpu/cpu2/crash_notes: 36b43000
/sys/devices/system/cpu/cpu3/crash_notes: 36b39000
As you can see, all values are rounded down to a page
boundary. Consequently, this is where kexec sets up the NOTE segments,
and thus where the secondary kernel is looking for them. However, when
the first kernel crashes, it saves the notes to the unaligned
addresses, where they are not found.
Fix it by adding offset_in_page() to the translated page address.
-tj: Combined Eugene's and Petr's commit messages.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8521478f67 upstream.
Synaptics touchpads on several Dell laptops, particularly Vostro V13
systems, may not respond properly to PS/2 commands and queries immediately
after resuming from suspend to RAM. This leads to unresponsive touchpad
after suspend/resume cycle.
Adding a 1-second delay after resetting the device allows touchpad to
finish initializing (calibrating?) and start reacting properly.
Reported-by: Daniel Manrique <daniel.manrique@canonical.com>
Tested-by: Daniel Manrique <daniel.manrique@canonical.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 329456d1ff upstream.
This fixes a Data bus error on some SoCs. The first fix for this
problem did not solve it on all devices.
commit 6ae8ec2786
Author: Rafał Miłecki <zajec5@gmail.com>
Date: Tue Jul 5 17:25:32 2011 +0200
ssb: fix init regression of hostmode PCI core
In ssb_pcicore_fix_sprom_core_index() the sprom on the PCI core is
accessed, but the sprom only exists when the ssb bus is connected over
a PCI bus to the rest of the system and not when the SSB Bus is the
main system bus. SoCs sometimes have a PCI host controller and there
this code will not be executed, but there are some old SoCs with an PCI
controller in client mode around and ssb_pcicore_fix_sprom_core_index()
should not be called on these devices too. The PCI controller on these
devices are unused, but without this fix it results in an Data bus
error when it gets initialized.
Cc: Michael Buesch <m@bues.ch>
Cc: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5151412dd4 upstream.
struct request_queue is allocated with __GFP_ZERO so its "node" field is
zero before initialization. This causes an oops if node 0 is offline in
the page allocator because its zonelists are not initialized. From Dave
Young's dmesg:
SRAT: Node 1 PXM 2 0-d0000000
SRAT: Node 1 PXM 2 100000000-330000000
SRAT: Node 0 PXM 1 330000000-630000000
Initmem setup node 1 0000000000000000-000000000affb000
...
Built 1 zonelists in Node order, mobility grouping on.
...
BUG: unable to handle kernel paging request at 0000000000001c08
IP: [<ffffffff8111c355>] __alloc_pages_nodemask+0xb5/0x870
and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on
zonelist->_zonerefs.
The fix is to initialize q->node at the time of allocation so the correct
node is passed to the slab allocator later.
Since blk_init_allocated_queue_node() is no longer needed, merge it with
blk_init_allocated_queue().
[rientjes@google.com: changelog, initializing q->node]
Reported-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 15062e6a85 upstream.
Emmanuel noticed that when mac80211 stops the queues
for aggregation that can leave a packet pending. This
packet will be given to the driver after the AMPDU
callback, but as a non-aggregated packet which messes
up the sequence number etc.
I also noticed by looking at the code that if packets
are being processed while we clear the WANT_START bit,
they might see it cleared already and queue up on
tid_tx->pending. If the driver then rejects the new
aggregation session we leak the packet.
Fix both of these issues by changing this code to not
stop the queues at all. Instead, let packets queue up
on the tid_tx->pending queue instead of letting them
get to the driver, and add code to recover properly
in case the driver rejects the session.
(The patch looks large because it has to move two
functions to before their new use.)
Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f6a290b419 upstream.
_scsih_smart_predicted_fault is called in an interrupt and therefore
must allocate memory using GFP_ATOMIC.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 44f747fff6 upstream.
zfcp_scsi_slave_destroy erroneously always tried to finish its task
even if the corresponding previous zfcp_scsi_slave_alloc returned
early. This can lead to kernel page faults on accessing uninitialized
fields of struct zfcp_scsi_dev in zfcp_erp_lun_shutdown_wait. Take the
port field of the struct to determine if slave_alloc returned early.
This zfcp bug is exposed by 4e6c82b (in turn fixing f7c9c6b to be
compatible with 21208ae) which can call slave_destroy for a
corresponding previous slave_alloc that did not finish.
This patch is based on James Bottomley's fix suggestion in
http://www.spinics.net/lists/linux-scsi/msg55449.html.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5eb46851de upstream.
cfq_cic_link() has race condition. When some processes which shared ioc
issue I/O to same block device simultaneously, cfq_cic_link() returns -EEXIST
sometimes. The race condition might stop I/O by following steps:
step 1: Process A: Issue an I/O to /dev/sda
step 2: Process A: Get an ioc (iocA here) in get_io_context() which does not
linked with a cic for the device
step 3: Process A: Get a new cic for the device (cicA here) in
cfq_alloc_io_context()
step 4: Process B: Issue an I/O to /dev/sda
step 5: Process B: Get iocA in get_io_context() since process A and B share the
same ioc
step 6: Process B: Get a new cic for the device (cicB here) in
cfq_alloc_io_context() since iocA has not been linked with a
cic for the device yet
step 7: Process A: Link cicA to iocA in cfq_cic_link()
step 8: Process A: Dispatch I/O to driver and finish it
step 9: Process B: Try to link cicB to iocA in cfq_cic_link()
But it fails with showing "cfq: cic link failed!" kernel
message, since iocA has already linked with cicA at step 7.
step 10: Process B: Wait for finishig I/O in get_request_wait()
The function does not wake up, when there is no I/O to the
device.
When cfq_cic_link() returns -EEXIST, it means ioc has already linked with cic.
So when cfq_cic_link() return -EEXIST, retry cfq_cic_lookup().
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2984ff38cc upstream.
If we fail allocating the blkpg stats, we free cfqd and cfgq.
But we need to free the IDA cfqd->cic_index as well.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4ed0b57745 upstream.
This prevents an in-kernel division by zero which happens when we are
asking for i915_chipset_val too quickly, or within a race condition
between the power monitoring thread and userspace accesses via debugfs.
The issue can be reproduced easily via the following command:
while ``; do cat /sys/kernel/debug/dri/0/i915_emon_status; done
This is particularly dangerous because it can be triggered by
a non-privileged user by just reading the debugfs entry.
This issue was also found independently by Konstantin Belousov
<kostikbel@gmail.com>, who proposed a similar patch.
Reported-by: Konstantin Belousov <kostikbel@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c3b79770e5 upstream.
The m41t80 driver can read and set the alarm, but it doesn't
seem to have a functional alarm irq.
This causes failures when the generic core sees alarm functions,
but then cannot use them properly for things like UIE mode.
Disabling the alarm functions allows proper error reporting,
and possible fallback to emulated modes. Once someone fixes
the alarm irq functionality, this can be restored.
CC: Matt Turner <mattst88@gmail.com>
CC: Nico Macrionitis <acrux@cruxppc.org>
CC: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Reported-by: Matt Turner <mattst88@gmail.com>
Reported-by: Nico Macrionitis <acrux@cruxppc.org>
Tested-by: Nico Macrionitis <acrux@cruxppc.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 72b36015ba upstream.
Same fix as 731abb9cb2 for ipip and sit tunnel.
Commit 1c5cae815d removed an explicit call to dev_alloc_name in
ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
will now create a valid name, however the tunnel keeps a copy of the
name in the private parms structure. Fix this by copying the name back
after register_netdevice has successfully returned.
This shows up if you do a simple tunnel add, followed by a tunnel show:
$ sudo ip tunnel add mode ipip remote 10.2.20.211
$ ip tunnel
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit
$ sudo ip tunnel add mode sit remote 10.2.20.212
$ ip tunnel
sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16
sit%d: ioctl 89f8 failed: No such device
sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit
Signed-off-by: Ted Feng <artisdom@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e5fe29c719 upstream.
Commit 10299e2e4e (ARM: RX-51:
Enable isp1704 power on/off) added power management for isp1704.
However, the transceiver should be powered on by default,
otherwise USB doesn't work at all for networking during
boot.
All kernels after v3.0 are affected.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Reviewed-by: Sebastian Reichel <sre@debian.org>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 82e14e8bdd upstream.
For cards that have two or more DAIs, snd_soc_resume's loop over all
DAIs ends up calling schedule_work(deferred_resume_work) once per DAI.
Since this is the same work item each time, the 2nd and subsequent
calls return 0 (work item already queued), and trigger the dev_err
message below stating that a work item may have been lost.
Solve this by adjusting the loop to simply calculate whether to run the
resume work immediately or defer it, and then call schedule work (or not)
one time based on that.
Note: This has not been tested in mainline, but only in chromeos-2.6.38;
mainline doesn't support suspend/resume on Tegra, nor does the mainline
Tegra ASoC driver contain multiple DAIs. It has been compile-checked in
mainline.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 02a551c975 upstream.
Huawei use the product code HUAWEI_PRODUCT_E353 (0x1506) for a
number of different devices, which each can appear with a number
of different descriptor sets. Different types of interfaces
can be identified by looking at the subclass and protocol fields
Subclass 1 protocol 8 is actually the data interface of a CDC
ECM set, with subclass 1 protocol 9 as the control interface.
Neither support serial data communcation, and cannot therefore
be supported by this driver.
At the same time, add a few other sets which appear if the
device is configured in "Windows mode" using this modeswitch
message:
55534243000000000000000000000011060000000100000000000000000000
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 414b591fd1 upstream.
This patch adds the controlling interfaces for the Huawei E398.
Thanks to Bjørn Mork <bjorn@mork.no> for extracting the interface
numbers from the windows driver.
Signed-off-by: Alex Hermann <alex@wenlex.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6abff5dc4d upstream.
Add USB IDs for Motorola H24 HSPA USB module.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 935a9fee51 upstream.
Found one system with UEFI/iBFT, kernel does not detect the iBFT during
iscsi_ibft module loading.
Root cause: on x86 (UEFI), we are calling of find_ibft_region() much earlier
- specifically in setup_arch() before ACPI is enabled.
Try to split acpi checking code out and call that later
At that time ACPI iBFT already get permanent mapped with ioremap.
So isa_virt_to_bus() will get wrong phys from right virt address.
We could just skip that phys address printing.
For legacy one, print the found address early.
-v2: update comments and description according to Konrad.
-v3: fix problem about module use case that is found by Konrad.
-v4: use acpi_get_table() instead of acpi_table_parse() to handle module use case that is found by Konrad again..
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 48706d0a91 upstream.
Fix two bugs in fuse_retrieve():
- retrieving more than one page would yield repeated instances of the
first page
- if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the
request page array would overflow
fuse_retrieve() was added in 2.6.36 and these bugs had been there since the
beginning.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5a0dc7365c upstream.
We need to zero out part of a page which beyond EOF before setting uptodate,
otherwise, mapread or write will see non-zero data beyond EOF.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 13a79a4741 upstream.
If there is an unwritten but clean buffer in a page and there is a
dirty buffer after the buffer, then mpage_submit_io does not write the
dirty buffer out. As a result, da_writepages loops forever.
This patch fixes the problem by checking dirty flag.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit ea51d132db upstream.
If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appeneded to
the inode.
gdb> bt
#0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
#2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
#3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
#4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
xffff88001e26be40) at mm/filemap.c:2600
#5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
zed out>, pos=<value optimized out>) at mm/filemap.c:2632
#6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
t fs/ext4/file.c:136
#7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
ppos=0xffff88001e26bf48) at fs/read_write.c:406
#8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
000, pos=0xffff88001e26bf48) at fs/read_write.c:435
#9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
4000) at fs/read_write.c:487
#10 <signal handler called>
#11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
#12 0x0000000000000000 in ?? ()
gdb> print offset
$22 = 0xffffffffffffffff
gdb> print idx
$23 = 0xffffffff
gdb> print inode->i_blkbits
$24 = 0xc
gdb> up
#1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
2512 if (ext4_da_should_update_i_disksize(page, end)) {
gdb> print start
$25 = 0x0
gdb> print end
$26 = 0xffffffffffffffff
gdb> print pos
$27 = 0x108000
gdb> print new_i_size
$28 = 0x108000
gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
$29 = 0xd9000
gdb> down
2467 for (i = 0; i < idx; i++)
gdb> print i
$30 = 0xd44acbee
This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fc6cb1cda5 upstream.
/proc/mounts was showing the mount option [no]init_inode_table when
the correct mount option that will be accepted by parse_options() is
[no]init_itable.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d3db728125 upstream.
d312ae878b "xen: use maximum reservation to limit amount of usable RAM"
clamped the total amount of RAM to the current maximum reservation. This is
correct for dom0 but is not correct for guest domains. In order to boot a guest
"pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
future memory expansion the guest must derive max_pfn from the e820 provided by
the toolstack and not the current maximum reservation (which can reflect only
the current maximum, not the guest lifetime max). The existing algorithm
already behaves this correctly if we do not artificially limit the maximum
number of pages for the guest case.
For a guest booted with maxmem=512, memory=128 this results in:
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
-[ 0.000000] Xen: 0000000000100000 - 0000000008100000 (usable)
-[ 0.000000] Xen: 0000000008100000 - 0000000020800000 (unusable)
+[ 0.000000] Xen: 0000000000100000 - 0000000020800000 (usable)
...
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
-[ 0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
+[ 0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
[ 0.000000] initial memory mapped : 0 - 027ff000
[ 0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
-[ 0.000000] init_memory_mapping: 0000000000000000-0000000008100000
-[ 0.000000] 0000000000 - 0008100000 page 4k
-[ 0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
+[ 0.000000] init_memory_mapping: 0000000000000000-0000000020800000
+[ 0.000000] 0000000000 - 0020800000 page 4k
+[ 0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
[ 0.000000] xen: setting RW the range 27e8000 - 27ff000
[ 0.000000] 0MB HIGHMEM available.
-[ 0.000000] 129MB LOWMEM available.
-[ 0.000000] mapped low ram: 0 - 08100000
-[ 0.000000] low ram: 0 - 08100000
+[ 0.000000] 520MB LOWMEM available.
+[ 0.000000] mapped low ram: 0 - 20800000
+[ 0.000000] low ram: 0 - 20800000
With this change "xl mem-set <domain> 512M" will successfully increase the
guest RAM (by reducing the balloon).
There is no change for dom0.
Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 355840e7a7 upstream.
commit a847627709 in linux-3.0.9
attempted to backport this to 3.0 but only made one change were two
were necessary. This add the second change.
This bug was introduced in 415e72d034
which was in 2.6.36.
There is a small window of time between when a device fails and when
it is removed from the array. During this time we might still read
from it, but we won't write to it - so it is possible that we could
read stale data.
We didn't need the test of 'Faulty' before because the test on
In_sync is sufficient. Since we started allowing reads from the early
part of non-In_sync devices we need a test on Faulty too.
This is suitable for any kernel from 2.6.36 onwards, though the patch
might need a bit of tweaking in 3.0 and earlier.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 859f57ca00 upstream.
[slightly different from the upstream version because of a previous cleanup]
Currently xfs_attr_inactive causes a synchronous transactions if we are
removing a file that has any extents allocated to the attribute fork, and
thus makes XFS extremely slow at removing files with out of line extended
attributes. The code looks a like a relict from the days before the busy
extent list, but with the busy extent list we avoid reusing data and attr
extents that have been freed but not commited yet, so this code is just
as superflous as the synchronous transactions for data blocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c29f7d457a upstream.
The i_ino field in the VFS inode is of type unsigned long and thus can't
hold the full 64-bit inode number on 32-bit kernels. We have the full
inode number in the XFS inode, so use that one for nfs exports. Note
that I've also switched the 32-bit file handles types to it, just to make
the code more consistent and copy & paste errors less likely to happen.
Reported-by: Guoquan Yang <ygq51@hotmail.com>
Reported-by: Hank Peng <pengxihan@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is for stable kernel branch 3.0 only. Previous and later versions
have different code paths and are not affected by this bug.
This is the same fix as "hwmon: (coretemp) Fix oops on driver load"
but for the CPU offlining case. Sorry for missing it at first.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Durgadoss R <durgadoss.r@intel.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 434a964daa upstream.
Clement Lecigne reports a filesystem which causes a kernel oops in
hfs_find_init() trying to dereference sb->ext_tree which is NULL.
This proves to be because the filesystem has a corrupted MDB extent
record, where the extents file does not fit into the first three extents
in the file record (the first blocks).
In hfs_get_block() when looking up the blocks for the extent file
(HFS_EXT_CNID), it fails the first blocks special case, and falls
through to the extent code (which ultimately calls hfs_find_init())
which is in the process of being initialised.
Hfs avoids this scenario by always having the extents b-tree fitting
into the first blocks (the extents B-tree can't have overflow extents).
The fix is to check at mount time that the B-tree fits into first
blocks, i.e. fail if HFS_I(inode)->alloc_blocks >=
HFS_I(inode)->first_blocks
Note, the existing commit 47f365eb57 ("hfs: fix oops on mount with
corrupted btree extent records") becomes subsumed into this as a special
case, but only for the extents B-tree (HFS_EXT_CNID), it is perfectly
acceptable for the catalog B-Tree file to grow beyond three extents,
with the remaining extent descriptors in the extents overfow.
This fixes CVE-2011-2203
Reported-by: Clement LECIGNE <clement.lecigne@netasq.com>
Signed-off-by: Phillip Lougher <plougher@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Moritz Mühlenhoff <jmm@inutil.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 1a51410abe upstream.
Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more. But at the
same time it is very much not acceptable to let anybody just read
anybody elses IO statistics quite at this level.
Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.
Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Moritz Mühlenhoff <jmm@inutil.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8762202dd0 upstream.
I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in journal superblock, and the 0 is passed to
journal->j_head in journal_reset(), then to blocknr in
cleanup_journal_tail(), in the end the J_ASSERT failed.
So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.
The following script could reproduce it:
fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"
dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
echo "Found journal: $offset"
found=1
break
fi
offset=`echo "$offset+$blocksize" | bc`
done
if [ $found -ne 1 ];then
echo "Magic \"$magic\" not found"
exit 1
fi
dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
mkdir -p ./mnt
mount -o loop $img ./mnt
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Moritz Mühlenhoff <jmm@inutil.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2ded6e6a94 upstream.
When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.
On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of the irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.
This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e58f516ff4 upstream.
When we can't configure the dma channel we want to fall
back to PIO. We do this by setting host->do_dma to zero.
This does not work as do_dma is used to see whether dma
can be used for the current transfer. Instead, we have
to set host->dma to NULL.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0b57d7602b upstream.
wait_for_completion_interruptible_timeout() may return negative value.
In this case, checking if (t > 0) will return true if t is unsigned.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 13c07b0286 upstream.
Exactly like roundup_pow_of_two(1), the rounddown version was buggy for
the case of a compile-time constant '1' argument. Probably because it
originated from the same code, sharing history with the roundup version
from before the bugfix (for that one, see commit 1a06a52ee1: "Fix
roundup_pow_of_two(1)").
However, unlike the roundup version, the fix for rounddown is to just
remove the broken special case entirely. It's simply not needed - the
generic code
1UL << ilog2(n)
does the right thing for the constant '1' argment too. The only reason
roundup needed that special case was because rounding up does so by
subtracting one from the argument (and then adding one to the result)
causing the obvious problems with "ilog2(0)".
But rounddown doesn't do any of that, since ilog2() naturally truncates
(ie "rounds down") to the right rounded down value. And without the
ilog2(0) case, there's no reason for the special case that had the wrong
value.
tl;dr: rounddown_pow_of_two(1) should be 1, not 0.
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Upstream commit d305a6557b.
If addBA responses comes in just after addba_resp_timer has
expired mac80211 will still accept it and try to open the
aggregation session. This causes drivers to be confused and
in some cases even crash.
This patch fixes the race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer
accepted and we do not try to open half-closed session.
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
[some adjustments]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a855b84c3d upstream.
Percpu allocator recorded the cpus which map to the first and last
units in pcpu_first/last_unit_cpu respectively and used them to
determine the address range of a chunk - e.g. it assumed that the
first unit has the lowest address in a chunk while the last unit has
the highest address.
This simply isn't true. Groups in a chunk can have arbitrary positive
or negative offsets from the previous one and there is no guarantee
that the first unit occupies the lowest offset while the last one the
highest.
Fix it by actually comparing unit offsets to determine cpus occupying
the lowest and highest offsets. Also, rename pcu_first/last_unit_cpu
to pcpu_low/high_unit_cpu to avoid confusion.
The chunk address range is used to flush cache on vmalloc area
map/unmap and decide whether a given address is in the first chunk by
per_cpu_ptr_to_phys() and the bug was discovered by invalid
per_cpu_ptr_to_phys() translation for crash_note.
Kudos to Dave Young for tracking down the problem.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: WANG Cong <xiyou.wangcong@gmail.com>
Reported-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
LKML-Reference: <4EC21F67.10905@redhat.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4399c8bf2b upstream.
If target_level == 0, current code breaks out of the while-loop if
SUPERPAGE bit is set. We should also break out if PTE is not present.
If we don't do this, KVM calls to iommu_iova_to_phys() will cause
pfn_to_dma_pte() to create mapping for 4KiB pages.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8140a95d22 upstream.
set dmar->iommu_superpage field to the smallest common denominator
of super page sizes supported by all active VT-d engines. Initialize
this field in intel_iommu_domain_init() API so intel_iommu_map() API
will be able to use iommu_superpage field to determine the appropriate
super page size to use.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 292827cb16 upstream.
iommu_unmap() API expects IOMMU drivers to return the actual page order
of the address being unmapped. Previous code was just returning page
order passed in from the caller. This patch fixes this problem.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9b5cd7f37e upstream.
SBC-3 says:
A TRANSFER LENGTH field set to zero specifies that 256 logical
blocks shall be written. Any other value specifies the number
of logical blocks that shall be written.
The old code was always just returning the value in the TRANSFER LENGTH
byte. Fix this to return 256 if the byte is 0.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 02125a8264 upstream.
__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that. The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root. Without grabbing references. Sure, at the moment of call it had
been pinned down by what we have in *path. And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.
It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?". Dereferencing is not safe and apparmor ended up stepping into
that. d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace. As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.
The fix is fairly straightforward, even though it's bigger than I'd like:
* prepend_path() root argument becomes const.
* __d_path() is never called with NULL/NULL root. It was a kludge
to start with. Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops. apparmor and tomoyo are using it.
* __d_path() returns NULL on path outside of root. The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail. Those who don't want that can (and do)
use d_path().
* __d_path() root argument becomes const. Everyone agrees, I hope.
* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount. In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there. Handling of sysctl()-triggered weirdness is moved to that place.
* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead. That's the other remaining caller of __d_path(),
BTW.
* seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine). However, if it gets path not reachable
from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).
Reviewed-by: John Johansen <john.johansen@canonical.com>
ACKed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 1368edf064 upstream.
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised. Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area. In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().
This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range. It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.
Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d021563888 upstream.
setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:
IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180
*pdpt = 0000000000000000 *pde = f000ff53f000ff53
Oops: 0000 [#1] SMP
Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0
EIP is at setup_zone_migrate_reserve+0xcd/0x180
EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
Call Trace:
[<c02d389c>] __setup_per_zone_wmarks+0xec/0x160
[<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20
[<c08a771c>] init_per_zone_wmark_min+0x27/0x86
[<c020111b>] do_one_initcall+0x2b/0x160
[<c086639d>] kernel_init+0xbe/0x157
[<c05cae26>] kernel_thread_helper+0x6/0xd
Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
CR2: 00000000000012b4
We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.
The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").
Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Dang Bo <bdang@vmware.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arve Hjnnevg <arve@android.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 58a84aa927 upstream.
Commit 70b50f94f1 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times. But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized. So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.
kernel BUG at include/linux/mm.h:386!
invalid opcode: 0000 [#1] SMP
Call Trace:
gup_pud_range+0xb8/0x19d
get_user_pages_fast+0xcb/0x192
? trace_hardirqs_off+0xd/0xf
hva_to_pfn+0x119/0x2f2
gfn_to_pfn_memslot+0x2c/0x2e
kvm_iommu_map_pages+0xfd/0x1c1
kvm_iommu_map_memslots+0x7c/0xbd
kvm_iommu_map_guest+0xaa/0xbf
kvm_vm_ioctl_assigned_device+0x2ef/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP gup_huge_pud+0xf2/0x159
Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b6999b1912 upstream.
With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.
The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd. If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.
So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.
This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.
Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1
kernel BUG at mm/swap.c:114!
invalid opcode: 0000 [#1] SMP
Call Trace:
put_page+0x15/0x37
kvm_release_pfn_clean+0x31/0x36
kvm_iommu_put_pages+0x94/0xb1
kvm_iommu_unmap_memslots+0x80/0xb6
kvm_assign_device+0xba/0x117
kvm_vm_ioctl_assigned_device+0x301/0xa47
kvm_vm_ioctl+0x36c/0x3a2
do_vfs_ioctl+0x49e/0x4e4
sys_ioctl+0x5a/0x7c
system_call_fastpath+0x16/0x1b
RIP put_compound_page+0xd4/0x168
Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cefcc03ffc upstream.
Allow userspace applications to do more parameter setting by providing a
more complete stub DMA driver specifying a wildcard set of formats and
channels and essentially random values for the DMA parameters. This is
required for useful runtime operation of the dummy DMA driver until we
are able to figure out how to power up links and do hw_params() from DAPM.
Sending to stable as without this the dummy driver is not terribly
useful.
Reported-by: Kyung-Kwee Ryu <Kyung-Kwee.Ryu@wolfsonmicro.com>
Tested-by: Kyung-Kwee Ryu <Kyung-Kwee.Ryu@wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 83713fc937 upstream.
The function setup_vpif_input_channel_mode() used the VSCLKDIS register
instead of VIDCLKCTL. This meant that when in HD mode videoport channel 0
used a different clock from channel 1.
Clearly a copy-and-paste error.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Manjunath Hadli <manjunath.hadli@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 11357be924 upstream.
Adding the machine_is_* line was forgotten when converting mach-stmp378x to
mach-mxs.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit f1b21c5256 upstream.
On OMAP-L138 platform, EDMA event queue 0 should be used for audio
transfers so that they are not starved by video data moving on event queue 1.
Commit 48519f0ae0 (ASoC: davinci: let platform
data define edma queue numbers) had a side-effect of changing this behavior
by making the driver actually honor the platform data passed.
Fix this now by passing event queue 0 as the queue to be used for audio
transfers.
Signed-off-by: Manjunathappa, Prakash <prakash.pm@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c9c024b3f3 upstream.
The expiry function compares the timer against current time and does
not expire the timer when the expiry time is >= now. That's wrong. If
the timer is set for now, then it must expire.
Make the condition expiry > now for breaking out the loop.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cce4aa378a upstream.
When no imux is available (e.g. a single capture source),
alc_auto_init_input_src() may trigger an Oops due to the access to -1.
Add a proper zero-check to avoid it.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fc084e0b93 upstream.
There are some AC97 codec and board combinations that have been observed
to take a very long time to respond after the cold reset has completed.
In one case, more than 350 ms was required. To allow users to have sound
on those platforms, we'll wait up to 500ms for the codec to become
ready.
As a board may have multiple codecs, with some faster than others to
reset, we add a module parameter to inform the driver which codecs
should be present.
Reported-by: KotCzarny <tjosko@yahoo.com>
Signed-off-by: David Dillow <dave@thedillows.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit de28f25e82 upstream.
If a device is shutdown, then there might be a pending interrupt,
which will be processed after we reenable interrupts, which causes the
original handler to be run. If the old handler is the (broadcast)
periodic handler the shutdown state might hang the kernel completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b1f919664d upstream.
In order to leave a margin of 12.5% we should >> 3 not >> 5.
Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com>
[jstultz: Modified commit subject]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d06c27b22a upstream.
A update is made to the sched:sched_switch event that adds some
logic to the first parameter of the __print_flags() that shows the
state of tasks. This change cause perf to fail parsing the flags.
A simple fix is needed to have the parser be able to process ops
within the argument.
Reported-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c1be84309c upstream.
When a better rated broadcast device is installed, then the current
active device is not disabled, which results in two running broadcast
devices.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cb59974742 upstream.
Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
ever being released.
Ref_count was added to keep track of subsystem users, not for counting
events. Subsystem is created with ref_count = 1, so there is no need to
increment it for every event, we have nr_events for that. Fix this by
touching ref_count only when we actually have a new user -
subsystem_open().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c0afabd3d5 upstream.
Currently, the RTC code does not disable the alarm in the hardware.
This means that after a sequence such as the one below (the files are in the
RTC sysfs), the box will boot up after 2 minutes even though we've
asked for the alarm to be turned off.
# echo $((`cat since_epoch`)+120) > wakealarm
# echo 0 > wakealarm
# poweroff
Fix this by disabling the alarm when there are no timers to run.
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4c393a6059 upstream.
With Dmitry fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute. It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes values for the data fork that
is larger than biggest possible on-disk storage for it which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.
Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
for moving either the data or attribute area out of line if it doesn't
fit at that point - or making the point moot by merging extents at this
point.
Also document the function better, and clean up some loose bits.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4dd2cb4a28 upstream.
If we are doing synchronous inode reclaim we block the VM from making
progress in memory reclaim. So if we encouter a flush locked inode
promote it in the delwri list and wake up xfsbufd to write it out now.
Without this we can get hangs of up to 30 seconds during workloads hitting
synchronous inode reclaim.
The scheme is copied from what we do for dquot reclaims.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Ben Myers <bpm@sgi.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit fa8b18edd7 upstream.
This prevents in-memory corruption and possible panics if the on-disk
ACL is badly corrupted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a backport of critical parts of
commit 7c24d9489f "NFSv4.1: File layout only supports whole file layouts"
It prevents the file layout driver from (incorrectly) using
partial layouts, but ignores the part of the referenced commmit that
relies on additional machinery to change the LAYOUTGET request
based on layout driver.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 550acb1926 upstream.
In irq_wait_for_interrupt(), the should_stop member is verified before
setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
In case kthread_stop sets should_stop and wakes up the process after
should_stop is checked by the irq thread but before the task's state
is changed, the irq thread might never exit:
kthread_stop irq_wait_for_interrupt
------------ ----------------------
...
... while (!kthread_should_stop()) {
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
...
set_current_state(TASK_INTERRUPTIBLE);
...
schedule();
}
Fix this by checking if the thread should stop after modifying the
task's state.
[ tglx: Simplified it a bit ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0bac71af6e upstream.
Johannes' patch for "cfg80211: fix regulatory NULL dereference"
broke user regulaotry hints and it did not address the fact that
last_request was left populated even if the previous regulatory
hint was stale due to the wiphy disappearing.
Fix user reguluatory hints by only bailing out if for those
regulatory hints where a request_wiphy is expected. The stale last_request
considerations are addressed through the previous fixes on last_request
where we reset the last_request to a static world regdom request upon
reset_regdomains(). In this case though we further enhance the effect
by simply restoring reguluatory settings completely.
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a042994dd3 upstream.
There is a theoretical race that if hit will trigger
a crash. The race is between when we issue the first
regulatory hint, regulatory_hint_core(), gets processed
by the workqueue and between when the first device
gets registered to the wireless core. This is not easy
to reproduce but it was easy to do so through the
regulatory simulator I have been working on. This
is a port of the fix I implemented there [1].
[1] a246ccf81f
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b934069c99 upstream.
The last breaking event address is a read-only value, the regset misses the
.set function. If a PTRACE_SETREGSET is done for NT_S390_LAST_BREAK we
get an oops due to a branch to zero:
Kernel BUG at 0000000000000002 verbose debug info unavailable
illegal operation: 0001 #1 SMP
...
Call Trace:
(<0000000000158294> ptrace_regset+0x184/0x188)
<00000000001595b6> ptrace_request+0x37a/0x4fc
<0000000000109a78> arch_ptrace+0x108/0x1fc
<00000000001590d6> SyS_ptrace+0xaa/0x12c
<00000000005c7a42> sysc_noemu+0x16/0x1c
<000003fffd5ec10c> 0x3fffd5ec10c
Last Breaking-Event-Address:
<0000000000158242> ptrace_regset+0x132/0x188
Add a nop .set function to prevent the branch to zero.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 57d1c0c03c upstream.
Masami spotted that we always try to decode the instruction stream as
64bit instructions when running a 64bit kernel, this doesn't work for
ia32-compat proglets.
Use TIF_IA32 to detect if we need to use the 32bit instruction
decoder.
Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2cd1c8d4dc upstream.
Fix an outstanding issue that has been reported since 2.6.37.
Under a heavy loaded machine processing "fork()" calls could
crash with:
BUG: unable to handle kernel paging request at f573fc8c
IP: [<c01abc54>] swap_count_continued+0x104/0x180
*pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000 Oops: 0000 [#1] SMP
Modules linked in:
Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
EIP is at swap_count_continued+0x104/0x180
.. snip..
Call Trace:
[<c01ac222>] ? __swap_duplicate+0xc2/0x160
[<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
[<c01ac2e4>] ? swap_duplicate+0x14/0x40
[<c01a0a6b>] ? copy_pte_range+0x45b/0x500
[<c01a0ca5>] ? copy_page_range+0x195/0x200
[<c01328c6>] ? dup_mmap+0x1c6/0x2c0
[<c0132cf8>] ? dup_mm+0xa8/0x130
[<c013376a>] ? copy_process+0x98a/0xb30
[<c013395f>] ? do_fork+0x4f/0x280
[<c01573b3>] ? getnstimeofday+0x43/0x100
[<c010f770>] ? sys_clone+0x30/0x40
[<c06c048d>] ? ptregs_clone+0x15/0x48
[<c06bfb71>] ? syscall_call+0x7/0xb
The problem is that in copy_page_range() we turn lazy mode on,
and then in swap_entry_free() we call swap_count_continued()
which ends up in:
map = kmap_atomic(page, KM_USER0) + offset;
and then later we touch *map.
Since we are running in batched mode (lazy) we don't actually
set up the PTE mappings and the kmap_atomic is not done
synchronously and ends up trying to dereference a page that has
not been set.
Looking at kmap_atomic_prot_pfn(), it uses
'arch_flush_lazy_mmu_mode' and doing the same in
kmap_atomic_prot() and __kunmap_atomic() makes the problem go
away.
Interestingly, commit b8bcfe997e ("x86/paravirt: remove lazy
mode in interrupts") removed part of this to fix an interrupt
issue - but it went to far and did not consider this scenario.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9e6866686b upstream.
In commit f8924e770e ("x86: unify mp_bus_info"), the 32-bit
and 64-bit versions of MP_bus_info were rearranged to match each
other better. Unfortunately it introduced a regression: prior
to that change we used to always set the mp_bus_not_pci bit,
then clear it if we found a PCI bus. After it, we set
mp_bus_not_pci for ISA buses, clear it for PCI buses, and leave
it alone otherwise.
In the cases of ISA and PCI, there's not much difference. But
ISA is not the only non-PCI bus, so it's better to always set
mp_bus_not_pci and clear it only for PCI.
Without this change, Dan's Dell PowerEdge 4200 panics on boot
with a log indicating interrupt routing trouble unless the
"noapic" option is supplied. With this change, the machine
boots reliably without "noapic".
Fixes http://bugs.debian.org/586494
Reported-bisected-and-tested-by: Dan McGrath <troubledaemon@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dan McGrath <troubledaemon@gmail.com>
Cc: Alexey Starikovskiy <aystarik@gmail.com>
[jrnieder@gmail.com: clarified commit message]
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Link: http://lkml.kernel.org/r/20111122215000.GA9151@elie.hsd1.il.comcast.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4cecf6d401 upstream.
(Added the missing signed-off-by line)
In hundreds of days, the __cycles_2_ns calculation in sched_clock
has an overflow. cyc * per_cpu(cyc2ns, cpu) exceeds 64 bits, causing
the final value to become zero. We can solve this without losing
any precision.
We can decompose TSC into quotient and remainder of division by the
scale factor, and then use this to convert TSC into nanoseconds.
Signed-off-by: Salman Qazi <sqazi@google.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111115221121.7262.88871.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 158886cd2c upstream.
When system enters suspend, xHCI driver clears command ring by writing zero
to all the TRBs. However, this also writes zero to the Link TRB, and the ring
is mangled. This may cause driver accesses wrong memory address and the
result is unpredicted.
When clear the command ring, keep the last Link TRB intact, only clear its
cycle bit. This should fix the "command ring full" issue reported by Oliver
Neukum.
This should be backported to stable kernels as old as 2.6.37, since the
commit 89821320 "xhci: Fix command ring replay after resume" is merged.
Signed-off-by: Andiry Xu <andiry.xu@amd.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e3420901eb upstream.
Fix a regression that was introduced by commit
811c926c53 (USB: EHCI: fix HUB TT scheduling
issue with iso transfer).
We detect an error if next == start, but this means uframe 0 can't be allocated
anymore for iso transfer...
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 811c926c53 upstream.
The current TT scheduling doesn't allow to play and then record on a
full-speed device connected to a high speed hub.
The IN iso stream can only start on the first uframe (0-2 for a 165 us)
because of CSPLIT transactions.
For the OUT iso stream there no such restriction. uframe 0-5 are possible.
The idea of this patch is that the first uframe are precious (for IN TT iso
stream) and we should allocate the last uframes first if possible.
For that we reverse the order of uframe allocation (last uframe first).
Here an example :
hid interrupt stream
----------------------------------------------------------------------
uframe | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
----------------------------------------------------------------------
max_tt_usecs | 125 | 125 | 125 | 125 | 125 | 125 | 30 | 0 |
----------------------------------------------------------------------
used usecs on a frame | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
----------------------------------------------------------------------
iso OUT stream
----------------------------------------------------------------------
uframe | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
----------------------------------------------------------------------
max_tt_usecs | 125 | 125 | 125 | 125 | 125 | 125 | 30 | 0 |
----------------------------------------------------------------------
used usecs on a frame | 13 | 125 | 39 | 0 | 0 | 0 | 0 | 0 |
----------------------------------------------------------------------
There no place for iso IN stream (uframe 0-2 are used) and we got "cannot
submit datapipe for urb 0, error -28: not enough bandwidth" error.
With the patch this become.
iso OUT stream
----------------------------------------------------------------------
uframe | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
----------------------------------------------------------------------
max_tt_usecs | 125 | 125 | 125 | 125 | 125 | 125 | 30 | 0 |
----------------------------------------------------------------------
used usecs on a frame | 13 | 0 | 0 | 0 | 125 | 39 | 0 | 0 |
----------------------------------------------------------------------
iso IN stream
----------------------------------------------------------------------
uframe | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
----------------------------------------------------------------------
max_tt_usecs | 125 | 125 | 125 | 125 | 125 | 125 | 30 | 0 |
----------------------------------------------------------------------
used usecs on a frame | 13 | 0 | 125 | 40 | 125 | 39 | 0 | 0 |
----------------------------------------------------------------------
Signed-off-by: Matthieu Castet <matthieu.castet@parrot.com>
Signed-off-by: Thomas Poussevin <thomas.poussevin@parrot.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cec28a5428 upstream.
Kingston DT 101 G2 replies a wrong tag while transporting, add an
unusal_devs entry to ignore the tag validation.
Signed-off-by: Qinglin Ye <yestyle@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit b1807719f6 upstream.
Genera Touch told us that 0001 is their single point device
and 0003 is the multitouch one. Apparently, we made the tests
someone having a prototype, and not the final product.
They said it should be safe to do the switch.
This partially reverts 5572da0 ("HID: hid-mulitouch: add support
for the 'Sensing Win7-TwoFinger'").
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 8746c83d53 upstream.
qset->qh.link is an __le64 field and we should be using cpu_to_le64()
to fill it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6a9ce6b654 upstream.
After sleeping on a wait queue, signal_pending(current) should be
checked (not before sleeping).
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit df30b21cb0 upstream.
In comedi_fops, mmap_count is decremented at comedi_vm_ops->close but
it is not incremented at comedi_vm_ops->open. This may result in a negative
counter. The patch introduces the open method to keep the counter
consistent.
The bug was triggerd by this sample code:
mmap(0, ...., comedi_fd);
fork();
exit(0);
Acked-by: Alessandro Rubini <rubini@gnudd.com>
Signed-off-by: Federico Vaga <federico.vaga@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 3ffab428f4 upstream.
This fixes kernel oops when an USB DAQ device is plugged out while it's
communicating with the userspace software.
Signed-off-by: Bernd Porr <berndporr@f2s.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 438957f8d4 upstream.
Interrupts must be disabled prior to calling usb_hcd_unlink_urb_from_ep.
If interrupts are not disabled, it can potentially lead to a deadlock.
The deadlock is readily reproduceable on a slower (ARM based) device
such as the TI Pandaboard.
Signed-off-by: Bart Westgeest <bart@elbrys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit bda63586bc upstream.
Currently the SigmaDSP firmware loader only works correctly on little-endian
systems. Fix this by using the proper endianess conversion functions.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4f718a29fe upstream.
The SigmaDSP firmware loader currently does not perform enough boundary size
checks when processing the firmware. As a result it is possible that a
malformed firmware can cause an out of bounds memory access.
This patch adds checks which ensure that both the action header and the payload
are completely inside the firmware data boundaries before processing them.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 745718132c upstream.
When we tear down a device we try to flush all outstanding
commands in scsi_free_queue(). However the check in
scsi_request_fn() is imperfect as it only signals that
we _might start_ aborting commands, not that we've actually
aborted some.
So move the printk inside the scsi_kill_request function,
this will also give us a hint about which commands are aborted.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is for stable kernel branch 3.0 only. Previous and later versions
have different code paths and are not affected by this bug.
If the CPU microcode is too old, the coretemp driver won't work. But
instead of failing gracefully, it currently oops. Check for NULL
platform device data to avoid this.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Durgadoss R <durgadoss.r@intel.com>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2a1e0fd175 upstream.
When a packet is supposed to sent be as an a-MPDU, mac80211 sets
IEEE80211_TX_CTL_AMPDU to let the driver know. On the other
hand, mac80211 configures the driver for aggregration with the
ampdu_action callback.
There is race between these two mechanisms since the following
scenario can occur when the BA agreement is torn down:
Tx softIRQ drv configuration
========== =================
check OPERATIONAL bit
Set the TX_CTL_AMPDU bit in the packet
clear OPERATIONAL bit
stop Tx AGG
Pass Tx packet to the driver.
In that case the driver would get a packet with TX_CTL_AMPDU set
although it has already been notified that the BA session has been
torn down.
To fix this, we need to synchronize all the Qdisc activity after we
cleared the OPERATIONAL bit. After that step, all the following
packets will be buffered until the driver reports it is ready to get
new packets for this RA / TID. This buffering allows not to run into
another race that would send packets with TX_CTL_AMPDU unset while
the driver hasn't been requested to tear down the BA session yet.
This race occurs in practice and iwlwifi complains with a WARN_ON
when it happens.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 24f50a9d16 upstream.
Nikolay noticed (by code review) that mac80211 can
attempt to stop an aggregation session while it is
already being stopped. So to fix it, check whether
stop is already being done and bail out if so.
Also move setting the STOPPING state into the lock
so things are properly atomic.
Reported-by: Nikolay Martynov <mar.kolya@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit de3584bd62 upstream.
By the time userspace returns with a response to
the regulatory domain request, the wiphy causing
the request might have gone away. If this is so,
reject the update but mark the request as having
been processed anyway.
Cc: Luis R. Rodriguez <lrodriguez@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit e007b857e8 upstream.
MAC addresses have a fixed length. The current
policy allows passing < ETH_ALEN bytes, which
might result in reading beyond the buffer.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2d1618170e upstream.
priv->work must not be synced while priv->mutex is locked, because
the mutex is taken in the work handler.
Move cancel_work_sync down to after the device shutdown code.
This is safe, because the work handler checks fw_state and bails out
early in case of a race.
Signed-off-by: Michael Buesch <m@bues.ch>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 27c9cd7e60 upstream.
__remove_hrtimer() attempts to reprogram the clockevent device when
the timer being removed is the next to expire. However,
__remove_hrtimer() reprograms the clockevent *before* removing the
timer from the timerqueue and thus when hrtimer_force_reprogram()
finds the next timer to expire it finds the timer we're trying to
remove.
This is especially noticeable when the system switches to NOHz mode
and the system tick is removed. The timer tick is removed from the
system but the clockevent is programmed to wakeup in another HZ
anyway.
Silence the extra wakeup by removing the timer from the timerqueue
before calling hrtimer_force_reprogram() so that we actually program
the clockevent for the next timer to expire.
This was broken by 998adc3 "hrtimers: Convert hrtimers to use
timerlist infrastructure".
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d004e02405 upstream.
ktime_get and ktime_get_ts were calling timekeeping_get_ns()
but later they were not calling arch_gettimeoffset() so architectures
using this mechanism returned 0 ns when calling these functions.
This happened for example when running Busybox's ping which calls
syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually
calls ktime_get. As a result the returned ping travel time was zero.
Signed-off-by: Hector Palacios <hector.palacios@digi.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 884a45d964 upstream.
2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state
transitions) removed is_task_frozen_enough and replaced it with a simple
frozen call. This, however, breaks freezing for a group with stopped tasks
because those cannot be frozen and so the group remains in CGROUP_FREEZING
state (update_if_frozen doesn't count stopped tasks) and never reaches
CGROUP_FROZEN.
Let's add is_task_frozen_enough back and use it at the original locations
(update_if_frozen and try_to_freeze_cgroup). Semantically we consider
stopped tasks as frozen enough so we should consider both cases when
testing frozen tasks.
Testcase:
mkdir /dev/freezer
mount -t cgroup -o freezer none /dev/freezer
mkdir /dev/freezer/foo
sleep 1h &
pid=$!
kill -STOP $pid
echo $pid > /dev/freezer/foo/tasks
echo FROZEN > /dev/freezer/foo/freezer.state
while true
do
cat /dev/freezer/foo/freezer.state
[ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break
sleep 1
done
echo OK
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tomasz Buchert <tomasz.buchert@inria.fr>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 52553ddffa upstream.
Commit fa27271bc8d2("genirq: Fixup poll handling") introduced a
regression that broke irqfixup/irqpoll for some hardware configurations.
Amidst reorganizing 'try_one_irq', that patch removed a test that
checked for 'action->handler' returning IRQ_HANDLED, before acting on
the interrupt. Restoring this test back returns the functionality lost
since 2.6.39. In the current set of tests, after 'action' is set, it
must precede '!action->next' to take effect.
With this and my previous patch to irq/spurious.c, c75d720fca, all
IRQ regressions that I have encountered are fixed.
Signed-off-by: Edward Donovan <edward.donovan@numble.net>
Reported-and-tested-by: Rogério Brito <rbrito@ime.usp.br>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 24ca9a8477 upstream.
By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
to find evidence of socket congestion, we are making the RPC engine believe
that the message was incorrectly sent and so it disconnects the socket
instead of just retrying.
The bug appears to have been introduced by commit
5e3771ce2d (SUNRPC: Ensure that xs_nospace
return values are propagated).
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 2391a0e067 upstream.
This patch makes it possible to set DAI mode to its currently applied
value even if codec is active. This is necessary to allow
aplay -t raw -r 44100 -f S16_LE -c 2 < /dev/urandom &
alsactl store -f backup.state
alsactl restore -f backup.state
to work without returning errors. This patch is based on a patch sent
by Klaus Kurzmann <mok@fluxnetz.de>.
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a29878553a upstream.
commit 6175ddf06b optimized the mem*io
functions that have been used to send commands to the device. these
optimizations somehow corrupted the communication with the lx6464es,
that resulted the device to be unusable with kernels after 2.6.33.
this patch emulates the memcpy_*_io functions via a loop to avoid these
problems.
Signed-off-by: Tim Blechmann <tim@klingt.org>
LKML-Reference: <4ECB5257.4040600@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 11ed0ba175 upstream.
This patch implements a workaround for PL310 erratum 769419. On
revisions of the PL310 prior to r3p2, the Store Buffer does not
automatically drain. This can cause normal, non-cacheable writes to be
retained when the memory system is idle, leading to suboptimal I/O
performance for drivers using coherent DMA.
This patch adds an optional wmb() call to the cpu_idle loop. On systems
with an outer cache, this causes an explicit flush of the store buffer.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a8a6565c76 upstream.
This patch selects ARM_AMBA if OMAP3_EMU is defined because
OC_ETM depends on ARM_AMBA, so fix the link failure[1].
[1],
arch/arm/kernel/built-in.o: In function `etm_remove':
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:609: undefined
reference to `amba_release_regions'
arch/arm/kernel/built-in.o: In function `etb_remove':
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:409: undefined
reference to `amba_release_regions'
arch/arm/kernel/built-in.o: In function `etm_init':
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:640: undefined
reference to `amba_driver_register'
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:646: undefined
reference to `amba_driver_register'
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:648: undefined
reference to `amba_driver_unregister'
arch/arm/kernel/built-in.o: In function `etm_probe':
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:545: undefined
reference to `amba_request_regions'
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:595: undefined
reference to `amba_release_regions'
arch/arm/kernel/built-in.o: In function `etb_probe':
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:347: undefined
reference to `amba_request_regions'
/home/tom/git/omap/linux-2.6-omap/arch/arm/kernel/etm.c:392: undefined
reference to `amba_release_regions'
arch/arm/mach-omap2/built-in.o: In function `emu_init':
/home/tom/git/omap/linux-2.6-omap/arch/arm/mach-omap2/emu.c:62:
undefined reference to `amba_device_register'
/home/tom/git/omap/linux-2.6-omap/arch/arm/mach-omap2/emu.c:63:
undefined reference to `amba_device_register'
make: *** [.tmp_vmlinux1] Error 1
making modules
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 5a4f1844c2 upstream.
Fix a bug which has been on this driver since
it was added by the original commit 984aa6db
which would never clear IRQSTATUS bits.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit c0a39151a4 upstream.
Since CONFIG_USB_GADGET_PXA27X and other macros are renamed to
CONFIG_USB_PXA27X. Update them in arch/arm/mach-pxa and arch/arm/configs
to keep consistent.
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a32839696a upstream.
While the OLPC display appears to be able to handle either positive
or negative sync, the Display Controller only recognises positive sync.
This brings viafb (for XO-1.5) in line with lxfb (for XO-1) and
fixes a recent regression where the XO-1.5 DCON could no longer be
frozen. Thanks to Florian Tobias Schandinat for helping identify
the fix.
Test case: from a vt,
echo 1 > /sys/devices/platform/dcon/freeze
should cause the current screen contents to freeze, rather than garbage being
displayed.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 6c47e5c23a upstream.
Fixes i2c test failures when i2c_algo_bit.bit_test=1.
The hw doesn't actually require a mask, so just set it
to the default mask bits for r1xx-r4xx radeon ddc.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 4cac2eb158 upstream.
Previously we claimed device ID 0x7450, regardless of the vendor, which is
clearly wrong. Now we'll claim that device ID only for AMD.
I suspect this was just a typo in the original code, but it's possible this
change will break shpchp on non-7450 AMD bridges. If so, we'll have to fix
them as we find them.
Reference: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638863
Reported-by: Ralf Jung <ralfjung-e@gmx.de>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit cb0e093162 upstream.
CB tuning is needed to handle potential process variations that might
cause clock jitter for certain PLL settings. However, we were setting
it incorrectly since we were using the wrong M value as a check (M1 when
we needed to use the whole M value). Fix it up, making my HDMI
attached display a little prettier (used to have occasional dots crawl
across the display).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Timo Aaltonen <timo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 9ca1d10d74 upstream.
Unlike the previous one, I don't have known testcases it fixes. I'd
rather not go through the same debug cycle on whatever testcases those
might be.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 406478dc91 upstream.
Fixes rendering failures in Unigine Tropics and Sanctuary and the mesa
"fire" demo.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit d724502a9d upstream.
Fixes i2c test failures when i2c_algo_bit.bit_test=1.
The hw doesn't actually require a mask, so just set it
to the default mask bits for r1xx-r4xx radeon ddc.
I missed this part the first time through.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a5cd335165 upstream.
There is a potential integer overflow in drm_mode_dirtyfb_ioctl()
if userspace passes in a large num_clips. The call to kmalloc would
allocate a small buffer, and the call to fb->funcs->dirty may result
in a memory corruption.
Reported-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 274252862f upstream.
This was broken by commit 7759995c75 (yes,
myself). The basic problem here is since the digest state is only saved
after the last chunk, the state array is only valid when handling the
first chunk of the next buffer. Broken since linux-3.0.
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 0f751e641a upstream.
From mhalcrow's original commit message:
Characters with ASCII values greater than the size of
filename_rev_map[] are valid filename characters.
ecryptfs_decode_from_filename() will access kernel memory beyond
that array, and ecryptfs_parse_tag_70_packet() will then decrypt
those characters. The attacker, using the FNEK of the crafted file,
can then re-encrypt the characters to reveal the kernel memory past
the end of the filename_rev_map[] array. I expect low security
impact since this array is statically allocated in the text area,
and the amount of memory past the array that is accessible is
limited by the largest possible ASCII filename character.
This patch solves the issue reported by mhalcrow but with an
implementation suggested by Linus to simply extend the length of
filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
0x00, which is how invalid characters less than 0x7A were previously
being handled.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 32001d6fe9 upstream.
Dirty pages weren't being written back when an mmap'ed eCryptfs file was
closed before the mapping was unmapped. Since f_ops->flush() is not
called by the munmap() path, the lower file was simply being released.
This patch flushes the eCryptfs file in the vm_ops->close() path.
https://launchpad.net/bugs/870326
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.