commit cc400e185c upstream.
Some low-level comedi drivers (incorrectly) point `dev->read_subdev` or
`dev->write_subdev` to a subdevice that does not support asynchronous
commands. Comedi's poll(), read() and write() file operation handlers
assume these subdevices do support asynchronous commands. In
particular, they assume `s->async` is valid (where `s` points to the
read or write subdevice), which it won't be if it has been set
incorrectly. This can lead to a NULL pointer dereference.
Check `s->async` is non-NULL in `comedi_poll()`, `comedi_read()` and
`comedi_write()` to avoid the bug.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 22056e2b46 upstream.
Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings. He tracked it down to two
problems, one of which is addressed by this patch.
It seems that writing to the command3 register after writing to the
command4 register in `labpc_ai_cmd()` messes up the differential
reference bit setting in the command4 register. Set up the command4
register after the command3 register (as in `labpc_ai_rinsn()`) to avoid
the problem.
Thanks to Tuomas for suggesting the fix.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4c4bc25d0f upstream.
Tuomas <tvainikk _at_ gmail _dot_ com> reported problems getting
meaningful output from a Lab-PC+ in differential mode for AI cmds, but
AI insn reads gave correct readings. He tracked it down to two
problems, one of which is addressed by this patch.
It seems the setting of the channel bits for particular scanning modes
was incorrect for differential mode. (Only half the number of channels
are available in differential mode; comedi refers to them as channels 0,
1, 2 and 3, but the hardware documentation refers to them as channels 0,
2, 4 and 6.) In differential mode, the setting of the channel enable
bits in the command1 register should depend on whether the scan enable
bit is set. Effectively, we need to double the comedi channel number
when the scan enable bit is not set in differential mode. The scan
enable bit gets set when the AI scan mode is `MODE_MULT_CHAN_UP` or
`MODE_MULT_CHAN_DOWN`, and gets cleared when the AI scan mode is
`MODE_SINGLE_CHAN` or `MODE_SINGLE_CHAN_INTERVAL`. The existing test
for whether the comedi channel number needs to be doubled in
differential mode is incorrect in `labpc_ai_cmd()`. This patch corrects
the test.
Thanks to Tuomas for suggesting the fix.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 3.0.67, commit 58c9ce6fad (s390/kvm:
Fix store status for ACRS/FPRS), upstream commit
15bc8d8457, added a call to
save_access_regs to save ACRS. But we do not have ARCS in kvm_run in
3.0 yet, so this results in:
arch/s390/kvm/kvm-s390.c: In function 'kvm_s390_vcpu_store_status':
arch/s390/kvm/kvm-s390.c:593: error: 'struct kvm_run' has no member named 's'
Fix it by saving guest_acrs which is where ARCS are in 3.0.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 3.0.67, commit 7a9a20ea77 (dca: check
against empty dca_domains list before unregister provider), upstream
commit c419fcfd07, added a fail path to
unregister_dca_provider. It added there also a call to
raw_spin_unlock_irqrestore. But in 3.0, the lock is not raw, so this
results in:
drivers/dca/dca-core.c: In function 'unregister_dca_provider':
drivers/dca/dca-core.c:413: warning: passing argument 1 of '_raw_spin_unlock_irqrestore' from incompatible pointer type
Fix it by calling spin_unlock_irqrestore properly.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 71b5707e11 upstream.
In cgroup_exit() put_css_set_taskexit() is called without any lock,
which might lead to accessing a freed cgroup:
thread1 thread2
---------------------------------------------
exit()
cgroup_exit()
put_css_set_taskexit()
atomic_dec(cgrp->count);
rmdir();
/* not safe !! */
check_for_release(cgrp);
rcu_read_lock() can be used to make sure the cgroup is alive.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63f43f55c9 upstream.
rename() will change dentry->d_name. The result of this race can
be worse than seeing partially rewritten name, but we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.
It's safe in the protection of dentry->d_lock.
v2: check NULL dentry before acquiring dentry lock.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb214ede76 upstream.
When a HP ProLiant DL980 G7 Server boots a regular kernel,
there will be intermittent lost interrupts which could
result in a hang or (in extreme cases) data loss.
The reason is that this system only supports x2apic physical
mode, while the kernel boots with a logical-cluster default
setting.
This bug can be worked around by specifying the "x2apic_phys" or
"nox2apic" boot option, but we want to handle this system
without requiring manual workarounds.
The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
As all apicids are smaller than 255, BIOS need to pass the
control to the OS with xapic mode, according to x2apic-spec,
chapter 2.9.
Current code handle x2apic when BIOS pass with xapic mode
enabled:
When user specifies x2apic_phys, or FADT indicates PHYSICAL:
1. During madt oem check, apic driver is set with xapic logical
or xapic phys driver at first.
2. enable_IR_x2apic() will enable x2apic_mode.
3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
will install the correct x2apic phys driver and use x2apic phys mode.
Otherwise it will skip the driver will let x2apic_cluster_probe to
take over to install x2apic cluster driver (wrong one) even though FADT
indicates PHYSICAL, because x2apic_phys_probe does not check
FADT PHYSICAL.
Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
problem.
Signed-off-by: Stoney Wang <song-bo.wang@hp.com>
[ updated the changelog and simplified the code ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Zhang Lin-Bao <Linbao.zhang@hp.com>
[ make a patch specially for 3.0.66]
Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f4ffc3a53 upstream.
automount-support is broken on the parisc architecture, because the existing
#if list does not include a check for defined(__hppa__). The HPPA (parisc)
architecture is similiar to other 64bit Linux targets where we have to define
autofs_wqt_t (which is passed back and forth to user space) as int type which
has a size of 32bit across 32 and 64bit kernels.
During the discussion on the mailing list, H. Peter Anvin suggested to invert
the #if list since only specific platforms (specifically those who do not have
a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned
long type.
This suggestion is probably the best way to go, since Arm64 (and maybe others?)
seems to have a non-working automounter. So in the long run even for other new
upcoming architectures this inverted check seem to be the best solution, since
it will not require them to change this #if again (unless they are 64bit only).
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 9067ac85d5.
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().
TASK_ALL has no other users, probably can be killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 9899d11f65.
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.
set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.
As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.
Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().
While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upstream commit 910ffdb18a.
Cleanup and preparation for the next change.
signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.
Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.
This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd97120fc3 upstream.
If a single descriptor crosses a region, the
second chunk length should be decremented
by size translated so far, instead it includes
the full descriptor length.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e75bafbff2 upstream.
svc_age_temp_xprts expires xprts in a two-step process: first it takes
the sv_lock and moves the xprts to expire off their server-wide list
(sv_tempsocks or sv_permsocks) to a local list. Then it drops the
sv_lock and enqueues and puts each one.
I see no reason for this: svc_xprt_enqueue() will take sp_lock, but the
sv_lock and sp_lock are not otherwise nested anywhere (and documentation
at the top of this file claims it's correct to nest these with sp_lock
inside.)
Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
Tested-by: Paweł Sikora <pawel.sikora@agmk.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f116700971 upstream.
In ext4_mb_add_n_trim(), lg_prealloc_lock should be taken when
changing the lg_prealloc_list.
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30ebc5e44d upstream.
We recently introduced a new return -ENODEV in this function but we need
to unlock before returning.
[mchehab@redhat.com: found two patches with the same fix. Merged SOB's/acks into one patch]
Acked-by: Herton R. Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6cdae7416a upstream.
The iteration logic of idr_get_next() is borrowed mostly verbatim from
idr_for_each(). It walks down the tree looking for the slot matching
the current ID. If the matching slot is not found, the ID is
incremented by the distance of single slot at the given level and
repeats.
The implementation assumes that during the whole iteration id is aligned
to the layer boundaries of the level closest to the leaf, which is true
for all iterations starting from zero or an existing element and thus is
fine for idr_for_each().
However, idr_get_next() may be given any point and if the starting id
hits in the middle of a non-existent layer, increment to the next layer
will end up skipping the same offset into it. For example, an IDR with
IDs filled between [64, 127] would look like the following.
[ 0 64 ... ]
/----/ |
| |
NULL [ 64 ... 127 ]
If idr_get_next() is called with 63 as the starting point, it will try
to follow down the pointer from 0. As it is NULL, it will then try to
proceed to the next slot in the same level by adding the slot distance
at that level which is 64 - making the next try 127. It goes around the
loop and finds and returns 127 skipping [64, 126].
Note that this bug also triggers in idr_for_each_entry() loop which
deletes during iteration as deletions can make layers go away leaving
the iteration with unaligned ID into missing layers.
Fix it by ensuring proceeding to the next slot doesn't carry over the
unaligned offset - ie. use round_up(id + 1, slot_distance) instead of
id += slot_distance.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Teigland <teigland@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d092603cc upstream.
"be->mode" is obtained from xenbus_read(), which does a kmalloc() for
the message body. The short string is never released, so do it along
with freeing "be" itself, and make sure the string isn't kept when
backend_changed() doesn't complete successfully (which made it
desirable to slightly re-structure that function, so that the error
cleanup can be done in one place).
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 309a85b686 upstream.
ocfs2_block_group_alloc_discontig() disables chain relink by setting
ac->ac_allow_chain_relink = 0 because it grabs clusters from multiple
cluster groups.
It doesn't keep the credits for all chain relink,but
ocfs2_claim_suballoc_bits overrides this in this call trace:
ocfs2_block_group_claim_bits()->ocfs2_claim_clusters()->
__ocfs2_claim_clusters()->ocfs2_claim_suballoc_bits()
ocfs2_claim_suballoc_bits set ac->ac_allow_chain_relink = 1; then call
ocfs2_search_chain() one time and disable it again, and then we run out
of credits.
Fix is to allow relink by default and disable it in
ocfs2_block_group_alloc_discontig.
Without this patch, End-users will run into a crash due to run out of
credits, backtrace like this:
RIP: 0010:[<ffffffffa0808b14>] [<ffffffffa0808b14>]
jbd2_journal_dirty_metadata+0x164/0x170 [jbd2]
RSP: 0018:ffff8801b919b5b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88022139ddc0 RCX: ffff880159f652d0
RDX: ffff880178aa3000 RSI: ffff880159f652d0 RDI: ffff880087f09bf8
RBP: ffff8801b919b5e8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000001e00 R11: 00000000000150b0 R12: ffff880159f652d0
R13: ffff8801a0cae908 R14: ffff880087f09bf8 R15: ffff88018d177800
FS: 00007fc9b0b6b6e0(0000) GS:ffff88022fd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000040819c CR3: 0000000184017000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dd (pid: 9945, threadinfo ffff8801b919a000, task ffff880149a264c0)
Call Trace:
ocfs2_journal_dirty+0x2f/0x70 [ocfs2]
ocfs2_relink_block_group+0x111/0x480 [ocfs2]
ocfs2_search_chain+0x455/0x9a0 [ocfs2]
...
Signed-off-by: Xiaowei.Hu <xiaowei.hu@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbbf8555a9 upstream.
This patch adds missing bounds checking for the configfs provided
mapped_lun value during target_fabric_make_mappedlun() setup ahead
of se_lun_acl initialization.
This addresses a potential OOPs when using a mapped_lun value that
exceeds the hardcoded TRANSPORT_MAX_LUNS_PER_TPG-1 value within
se_node_acl->device_list[].
Reported-by: Jan Engelhardt <jengelh@inai.de>
Cc: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c10093692 upstream.
On non-BIOS platforms it is possible that the BIOS data area contains
garbage instead of being zeroed or something equivalent (firmware
people: we are talking of 1.5K here, so please do the sane thing.)
We need on the order of 20-30K of low memory in order to boot, which
may grow up to < 64K in the future. We probably want to avoid the
lowest of the low memory. At the same time, it seems extremely
unlikely that a legitimate EBDA would ever reach down to the 128K
(which would require it to be over half a megabyte in size.) Thus,
pick 128K as the cutoff for "this is insane, ignore." We may still
end up reserving a bunch of extra memory on the low megabyte, but that
is not really a major issue these days. In the worst case we lose
512K of RAM.
This code really should be merged with trim_bios_range() in
arch/x86/kernel/setup.c, but that is a bigger patch for a later merge
window.
Reported-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Link: http://lkml.kernel.org/n/tip-oebml055yyfm8yxmria09rja@git.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c189ea64e upstream.
Commit: c1bf08ac "ftrace: Be first to run code modification on modules"
changed ftrace module notifier's priority to INT_MAX in order to
process the ftrace nops before anything else could touch them
(namely kprobes). This was the correct thing to do.
Unfortunately, the ftrace module notifier also contains the ftrace
clean up code. As opposed to the set up code, this code should be
run *after* all the module notifiers have run in case a module is doing
correct clean-up and unregisters its ftrace hooks. Basically, ftrace
needs to do clean up on module removal, as it needs to know about code
being removed so that it doesn't try to modify that code. But after it
removes the module from its records, if a ftrace user tries to remove
a probe, that removal will fail due as the record of that code segment
no longer exists.
Nothing really bad happens if the probe removal is called after ftrace
did the clean up, but the ftrace removal function will return an error.
Correct code (such as kprobes) will produce a WARN_ON() if it fails
to remove the probe. As people get annoyed by frivolous warnings, it's
best to do the ftrace clean up after everything else.
By splitting the ftrace_module_notifier into two notifiers, one that
does the module load setup that is run at high priority, and the other
that is called for module clean up that is run at low priority, the
problem is solved.
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e182bb38d7 upstream.
When idr_find() was fed a negative ID, it used to look up the ID
ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and
move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers
a WARN_ON_ONCE().
__lock_timer() feeds timer_id from userland directly to idr_find()
without sanitizing it which can trigger the above malfunctions. Add a
range check on @timer_id before invoking idr_find() in __lock_timer().
While timer_t is defined as int by all archs at the moment, Andrew
worries that it may be defined as a larger type later on. Make the
test cover larger integers too so that it at least is guaranteed to
not return the wrong timer.
Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional
precaution while moving away from ignoring MSB. Once it's gone we can
remove the guard as long as timer_t isn't larger than int.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f528d980c1 upstream.
When dma_ops are initialized the unity mappings are
created. The init_device_table_dma() function makes sure DMA
from all devices is blocked by default. This opens a short
window in time where DMA to unity mapped regions is blocked
by the IOMMU. Make sure this does not happen by initializing
the device table after dma_ops.
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3ad83d9ef upstream.
Otherwise, ext4 file systems with the quota featured enable will get a
very confusing "No such process" error message if the quota code is
built as a module and the quota_v2 module has not been loaded.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd060956c5 upstream.
1. The idProduct is little endian, so make sure its value to be
compatible with the current CPU. Make no break on big endian processors.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0475352326 upstream.
The module alias should be "ehci-omap" and not
"omap-ehci" to match the platform device name.
The omap-ehci module should now autoload correctly.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f3f687722 upstream.
The USB device descriptor of one identity presented by a few
Huawei morphing devices have serial functions with class codes
02/02/ff, indicating CDC ACM with a vendor specific protocol. This
combination is often used for MSFT RNDIS functions, and the CDC
ACM class driver will therefore ignore such functions.
The CDC ACM class driver cannot support functions with only 2
endpoints. The underlying serial functions of these modems are
also believed to be the same as for alternate device identities
already supported by the option driver. Letting the same driver
handle these functions independently of the current identity
ensures consistent handling and user experience.
There is no need to blacklist these devices in the rndis_host
driver. Huawei serial functions will either have only 2 endpoints
or a CDC ACM functional descriptor with bmCapabilities != 0, making
them correctly ignored as "non RNDIS" by that driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8f0302bbc upstream.
Adding three currently unsupported modems based on information
from .inf driver files:
Diag VID_1BBB&PID_0052&MI_00
AGPS VID_1BBB&PID_0052&MI_01
VOICE VID_1BBB&PID_0052&MI_02
AT VID_1BBB&PID_0052&MI_03
Modem VID_1BBB&PID_0052&MI_05
wwan VID_1BBB&PID_0052&MI_06
Diag VID_1BBB&PID_00B6&MI_00
AT VID_1BBB&PID_00B6&MI_01
Modem VID_1BBB&PID_00B6&MI_02
wwan VID_1BBB&PID_00B6&MI_03
Diag VID_1BBB&PID_00B7&MI_00
AGPS VID_1BBB&PID_00B7&MI_01
VOICE VID_1BBB&PID_00B7&MI_02
AT VID_1BBB&PID_00B7&MI_03
Modem VID_1BBB&PID_00B7&MI_04
wwan VID_1BBB&PID_00B7&MI_05
Updating the blacklist info for the X060S_X200 and X220_X500D,
reserving interfaces for a wwan driver, based on
wwan VID_1BBB&PID_0000&MI_04
wwan VID_1BBB&PID_0017&MI_06
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c419fcfd07 upstream.
When providers get blocked unregister_dca_providers() is called ending up
with dca_providers and dca_domain lists emptied. Dca should be prevented from
trying to unregister any provider if dca_domain list is found empty.
Reported-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Gaohuai Han <hangaohuai@huawei.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <djbw@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08dcdbf6a7 ]
It looks like its possible to open thousands of TCP IPv6
sessions on a server, all landing in a single slot of TCP hash
table. Incoming packets have to lookup sockets in a very
long list.
We should hash all bits from foreign IPv6 addresses, using
a salt and hash mix, not a simple XOR.
inet6_ehashfn() can also separately use the ports, instead
of xoring them.
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e55f8b306 ]
If the credit timer is left armed after calling
xen_netbk_remove_xenvif(), then it may fire and attempt to schedule
the vif which will then oops as vif->netbk == NULL.
This may happen both in the fatal error path and during normal
disconnection from the front end.
The sequencing during shutdown is critical to ensure that: a)
vif->netbk doesn't become unexpectedly NULL; and b) the net device/vif
is not freed.
1. Mark as unschedulable (netif_carrier_off()).
2. Synchronously cancel the timer.
3. Remove the vif from the schedule list.
4. Remove it from it netback thread group.
5. Wait for vif->refcnt to become 0.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35876b5ffc ]
netbk_count_requests() could detect an error, call
netbk_fatal_tx_error() but return 0. The vif may then be used
afterwards (e.g., in a call to netbk_tx_error().
Since netbk_fatal_tx_error() could set vif->refcnt to 1, the vif may
be freed immediately after the call to netbk_fatal_tx_error() (e.g.,
if the vif is also removed).
Netback thread Xenwatch thread
-------------------------------------------
netbk_fatal_tx_err() netback_remove()
xenvif_disconnect()
...
free_netdev()
netbk_tx_err() Oops!
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Christopher S. Aker <caker@theshore.net>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 547b4e7181 ]
Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50e244cc79 upstream.
Adjust the console layer to allow a take over call where the caller
already holds the locks. Make the fb layer lock in order.
This is partly a band aid, the fb layer is terminally confused about the
locking rules it uses for its notifiers it seems.
[akpm@linux-foundation.org: remove stray non-ascii char, tidy comment]
[akpm@linux-foundation.org: export do_take_over_console()]
[airlied: cleanup another non-ascii char]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jiri Kosina <jkosina@suse.cz>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae1287865f upstream.
If grub2 loads efifb/vesafb, then when systemd starts it can set the console
font on that framebuffer device, however when we then load the native KMS
driver, the first thing it does is tear down the generic framebuffer driver.
The thing is the generic code is doing the right thing, it frees the font
because otherwise it would leak memory. However we can assume that if you
are removing the generic firmware driver (vesa/efi/offb), that a new driver
*should* be loading soon after, so we effectively leak the font.
However the old code left a dangling pointer in vc->vc_font.data and we
can now reuse that dangling pointer to load the font into the new
driver, now that we aren't freeing it.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=892340
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7139bc1579 upstream.
This patch goes a long way toward fixing the minifail bug, and
it significantly improves the stability of SMP machines such as
the rp3440. When write protecting a page for COW, we need to
purge the existing translation. Otherwise, the COW break
doesn't occur as expected because the TLB may still have a stale entry
which allows writes.
[jejb: fix up checkpatch errors]
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8520e443aa upstream.
Disable hard IRQ before kexec a new kernel image.
Not doing it can result in corrupted data in the memory segments
reserved for the new kernel.
Signed-off-by: Phileas Fogg <phileas-fogg@mail.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d107a20415 upstream.
The Chip Select Configuration Register must be programmed to 0x2 in
order to achieve the correct behavior of the Static Memory Controller.
Without this patch devices wired to DFI and accessed through SMC cannot
be accessed after resume from S2.
Do not rely on the boot loader to program the CSMSADRCFG register by
programming it in the kernel smemc module.
Signed-off-by: Igor Grinberg <grinberg@compulab.co.il>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae5943de8c upstream.
This error happens because PIPEnsControlOut and PIPEnsControlIn unlock the
spin lock for delay, letting in another thread.
The patch moves the current MP_SET_FLAG to before filling
of sUsbCtlRequest for pControlURB and clears it in event of failing.
Any thread calling either function while fMP_CONTROL_READS or fMP_CONTROL_WRITES
flags set will return STATUS_FAILURE.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 754ab5c0e5 upstream.
Comedi has two sorts of minor devices:
(a) normal board minor devices in the range 0 to
COMEDI_NUM_BOARD_MINORS-1 inclusive; and
(b) special subdevice minor devices in the range COMEDI_NUM_BOARD_MINORS
upwards that are used to open the same underlying comedi device as the
normal board minor devices, but with non-default read and write
subdevices for asynchronous commands.
The special subdevice minor devices get created when a board supporting
asynchronous commands is attached to a normal board minor device, and
destroyed when the board is detached from the normal board minor device.
One way to attach or detach a board is by using the COMEDI_DEVCONFIG
ioctl. This should only be used on normal board minors as the special
subdevice minors are too ephemeral. In particular, the change
introduced in commit 7d3135af39 ("staging:
comedi: prevent auto-unconfig of manually configured devices") breaks
horribly for special subdevice minor devices.
Since there's no legitimate use for the COMEDI_DEVCONFIG ioctl on a
special subdevice minor device node, disallow it and return -ENOTTY.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0720a06a75 upstream.
The utf8s_to_utf16s conversion routine needs to be improved. Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.
This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called. A
follow-on patch will add a third caller that does utilize the new
capabilities.
The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations. But that's a subject for a
different patch.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f23de52b6 upstream.
While looking at plymouth on udl I noticed that plymouth was trying
to use its fb plugin not its drm one, it was trying to drmOpen a driver called
usb not udl, noticed that we actually had out driver pointing at the wrong
device.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c49bafa384 upstream.
We added some more error handling in b40971426a "ext4: add error
checking to calls to ext4_handle_dirty_metadata()". But we need to
call kfree() as well to avoid a memory leak.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcf2d804ed upstream.
Some of the error path in ext4_fill_super don't release the
resouces properly. So this patch just try to release them
in the right way.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b531f81b0d upstream.
Commit 99fc86450c "ALSA: usb-mixer:
parse descriptors with structs" introduced a set of useful parsers
for descriptors. Unfortunately the parses for the Processing Unit
Descriptor came with a very subtle bug...
Functions uac_processing_unit_iProcessing() and
uac_processing_unit_specific() were indexing the baSourceID array
forgetting the fields before the iProcessing and process-specific
descriptors.
The problem was observed with Sound Blaster Extigy mixer,
where nNrModes in Up/Down-mix Processing Unit Descriptor
was accessed at offset 10 of the descriptor (value 0)
instead of offset 15 (value 7). In result the resulting
control had interesting limit values:
Simple mixer control 'Channel Routing Mode Select',0
Capabilities: volume volume-joined penum
Playback channels: Mono
Capture channels: Mono
Limits: 0 - -1
Mono: -1 [100%]
Fixed by starting from the bmControls, which was calculated
correctly, instead of baSourceID.
Now the mentioned control is fine:
Simple mixer control 'Channel Routing Mode Select',0
Capabilities: volume volume-joined penum
Playback channels: Mono
Capture channels: Mono
Limits: 0 - 6
Mono: 0 [0%]
Signed-off-by: Pawel Moll <mail@pawelmoll.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7da5804648 upstream.
The quirk for the Roland/Cakewalk A-PRO keyboards accidentally used the
wrong interface number, which prevented the driver from attaching to the
device.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 008e33f733 upstream.
Corrected USB ID for T-Com Sinus 154 data II. ISL3887-based. The
device was tested in managed mode with no security, WEP 128
bit and WPA-PSK (TKIP) with firmware 2.13.1.0.lm87.arm (md5sum:
7d676323ac60d6e1a3b6d61e8c528248). It works.
Signed-off-by: Tomasz Guszkowski <tsg@o2.pl>
Acked-By: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 666b3d803a upstream.
Currently, nlmclnt_lock will break out of the for(;;) loop when
the reclaimer wakes up the blocking lock thread by setting
nlm_lck_denied_grace_period. This causes the lock request to fail
with an ENOLCK error.
The intention was always to ensure that we resend the lock request
after the grace period has expired.
Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67d46b296a upstream.
Rob van der Heij reported the following (paraphrased) on private mail.
The scenario is that I want to avoid backups to fill up the page
cache and purge stuff that is more likely to be used again (this is
with s390x Linux on z/VM, so I don't give it as much memory that
we don't care anymore). So I have something with LD_PRELOAD that
intercepts the close() call (from tar, in this case) and issues
a posix_fadvise() just before closing the file.
This mostly works, except for small files (less than 14 pages)
that remains in page cache after the face.
Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.
The issue is the per-cpu pagevecs for LRU additions. If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release(). The user-visible effect is
that a program that uses fadvise() properly is not obeyed.
A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded. The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.
Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If a subset of pages are discarded it drains the LRU pagevecs and tries
again. If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care. With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.
If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.
The following test program demonstrates the problem. It should never
report that pages are still resident but will without this patch. It
assumes that CPU 0 and 1 exist.
int main() {
int fd;
int pagesize = getpagesize();
ssize_t written = 0, expected;
char *buf;
unsigned char *vec;
int resident, i;
cpu_set_t set;
/* Prepare a buffer for writing */
expected = FILESIZE_PAGES * pagesize;
buf = malloc(expected + 1);
if (buf == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
buf[expected] = 0;
memset(buf, 'a', expected);
/* Prepare the mincore vec */
vec = malloc(FILESIZE_PAGES);
if (vec == NULL) {
printf("ENOMEM\n");
exit(EXIT_FAILURE);
}
/* Bind ourselves to CPU 0 */
CPU_ZERO(&set);
CPU_SET(0, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* open file, unlink and write buffer */
fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR);
if (fd == -1) {
perror("open");
exit(EXIT_FAILURE);
}
unlink("fadvise-test-file");
while (written < expected) {
ssize_t this_write;
this_write = write(fd, buf + written, expected - written);
if (this_write == -1) {
perror("write");
exit(EXIT_FAILURE);
}
written += this_write;
}
free(buf);
/*
* Force ourselves to another CPU. If fadvise only flushes the local
* CPUs pagevecs then the fadvise will fail to discard all file pages
*/
CPU_ZERO(&set);
CPU_SET(1, &set);
if (sched_setaffinity(getpid(), sizeof(set), &set) == -1) {
perror("sched_setaffinity");
exit(EXIT_FAILURE);
}
/* sync and fadvise to discard the page cache */
fsync(fd);
if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
perror("posix_fadvise");
exit(EXIT_FAILURE);
}
/* map the file and use mincore to see which parts of it are resident */
buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
if (buf == NULL) {
perror("mmap");
exit(EXIT_FAILURE);
}
if (mincore(buf, expected, vec) == -1) {
perror("mincore");
exit(EXIT_FAILURE);
}
/* Check residency */
for (i = 0, resident = 0; i < FILESIZE_PAGES; i++) {
if (vec[i])
resident++;
}
if (resident != 0) {
printf("Nr unexpected pages resident: %d\n", resident);
exit(EXIT_FAILURE);
}
munmap(buf, expected);
close(fd);
free(vec);
exit(EXIT_SUCCESS);
}
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Rob van der Heij <rvdheij@gmail.com>
Tested-by: Rob van der Heij <rvdheij@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f00110f72 upstream.
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.
Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.
To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x
# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0
# mount -o remount,size=200M nodev /tmp/x
# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above
# dd if=/dev/zero of=/tmp/x/f count=1
# panic here
Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f
Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.
The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:
config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */
This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.
How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676a0675cf upstream.
Running the command:
inotifywait -e unmount /mnt/disk
immediately aborts with a -EINVAL return code. This is however a valid
parameter. This abort occurs only if unmount is the sole event
parameter. If other event parameters are supplied, then the unmount
event wait will work.
The problem was introduced by commit 44b350fc23 ("inotify: Fix mask
checks"). In that commit, it states:
The mask checks in inotify_update_existing_watch() and
inotify_new_watch() are useless because inotify_arg_to_mask()
sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
But instead of removing the useless checks, it did this:
mask = inotify_arg_to_mask(arg);
- if (unlikely(!mask))
+ if (unlikely(!(mask & IN_ALL_EVENTS)))
return -EINVAL;
The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the
check should be:
if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))
But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
anyway, so the check is always going to pass and thus should simply be
removed. Also note that inotify_arg_to_mask completely controls what
mask bits get set from arg, there's no way for invalid bits to get
enabled there.
Lets fix it by simply removing the useless broken checks.
Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15bc8d8457 upstream.
On store status we need to copy the current state of registers
into a save area. Currently we might save stale versions:
The sie state descriptor doesnt have fields for guest ACRS,FPRS,
those registers are simply stored in the host registers. The host
program must copy these away if needed. We do that in vcpu_put/load.
If we now do a store status in KVM code between vcpu_put/load, the
saved values are not up-to-date. Lets collect the ACRS/FPRS before
saving them.
This also fixes some strange problems with hotplug and virtio-ccw,
since the low level machine check handler (on hotplug a machine check
will happen) will revalidate all registers with the content of the
save area.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55c171a6d9 upstream.
Running under a kvm host does not necessarily imply the presence of
a page mapped above the main memory with the virtio information;
however, the code includes a hard coded access to that page.
Instead, check for the presence of the page and exit gracefully
before we hit an addressing exception if it does not exist.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 751efd8610 upstream.
There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().
Assume two tasks, one calling mmu_notifier_unregister() as a result of a
filp_close() ->flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).
A B
t1 srcu_read_lock()
t2 if (!hlist_unhashed())
t3 srcu_read_unlock()
t4 srcu_read_lock()
t5 hlist_del_init_rcu()
t6 synchronize_srcu()
t7 srcu_read_unlock()
t8 hlist_del_rcu() <--- NULL pointer deref.
Additionally, the list traversal in __mmu_notifier_release() is not
protected by the by the mmu_notifier_mm->hlist_lock which can result in
callouts to the ->release() notifier from both mmu_notifier_unregister()
and __mmu_notifier_release().
-stable suggestions:
The stable trees prior to 3.7.y need commits 21a92735f6 and
70400303ce cherry-picked in that order prior to cherry-picking this
commit. The 3.7.y tree already has those two commits.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Sagi Grimberg <sagig@mellanox.co.il>
Cc: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21a92735f6 upstream.
With an RCU based mmu_notifier implementation, any callout to
mmu_notifier_invalidate_range_{start,end}() or
mmu_notifier_invalidate_page() would not be allowed to call schedule()
as that could potentially allow a modification to the mmu_notifier
structure while it is currently being used.
Since srcu allocs 4 machine words per instance per cpu, we may end up
with memory exhaustion if we use srcu per mm. So all mms share a global
srcu. Note that during large mmu_notifier activity exit & unregister
paths might hang for longer periods, but it is tolerable for current
mmu_notifier clients.
Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fa3e78be7 upstream.
A bus_type has a list of devices (klist_devices), but the list and the
subsys_private structure that contains it are not initialized until the
bus_type is registered with bus_register().
The panic/reboot path has fixups that look up devices in pci_bus_type. If
we panic before registering pci_bus_type, the bus_type exists but the list
does not, so mach_reboot_fixups() trips over a null pointer and panics
again:
mach_reboot_fixups
pci_get_device
..
bus_find_device(&pci_bus_type, ...)
bus->p is NULL
Joonsoo reported a problem when panicking before PCI was initialized.
I think this patch should be sufficient to replace the patch he posted
here: https://lkml.org/lkml/2012/12/28/75 ("[PATCH] x86, reboot: skip
reboot_fixups in early boot phase")
Reported-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 76eaca031f upstream.
There is a loophole between Xen's current implementation of
pv-spinlocks and the scheduler. This was triggerable through
a testcase until v3.6 changed the TLB flushing code. The
problem potentially is still there just not observable in the
same way.
What could happen was (is):
1. CPU n tries to schedule task x away and goes into a slow
wait for the runq lock of CPU n-# (must be one with a lower
number).
2. CPU n-#, while processing softirqs, tries to balance domains
and goes into a slow wait for its own runq lock (for updating
some records). Since this is a spin_lock_irqsave in softirq
context, interrupts will be re-enabled for the duration of
the poll_irq hypercall used by Xen.
3. Before the runq lock of CPU n-# is unlocked, CPU n-1 receives
an interrupt (e.g. endio) and when processing the interrupt,
tries to wake up task x. But that is in schedule and still
on_cpu, so try_to_wake_up goes into a tight loop.
4. The runq lock of CPU n-# gets unlocked, but the message only
gets sent to the first waiter, which is CPU n-# and that is
busily stuck.
5. CPU n-# never returns from the nested interruption to take and
release the lock because the scheduler uses a busy wait.
And CPU n never finishes the task migration because the unlock
notification only went to CPU n-#.
To avoid this and since the unlocking code has no real sense of
which waiter is best suited to grab the lock, just send the IPI
to all of them. This causes the waiters to return from the hyper-
call (those not interrupted at least) and do active spinlocking.
BugLink: http://bugs.launchpad.net/bugs/1011792
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f49a59c447 upstream.
According to the other code in this driver and similar
code in rme96 it seems, that spin_lock_irq in
snd_rme32_capture_close function should be paired
with spin_unlock_irq.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dacae5a19b upstream.
snd_ali_pointer function is called with local
interrupts disabled. However it seems very strange to
reenable them in such way.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22affe0ae upstream.
hrtimer_enqueue_reprogram contains a race which could result in
timer.base switch during unlock/lock sequence.
hrtimer_enqueue_reprogram is releasing the lock protecting the timer
base for calling raise_softirq_irqsoff() due to a lock ordering issue
versus rq->lock.
If during that time another CPU calls __hrtimer_start_range_ns() on
the same hrtimer, the timer base might switch, before the current CPU
can lock base->lock again and therefor the unlock_timer_base() call
will unlock the wrong lock.
[ tglx: Added comment and massaged changelog ]
Signed-off-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e716efde75 upstream.
commit 52553ddf(genirq: fix regression in irqfixup, irqpoll)
introduced a potential deadlock by calling the action handler with the
irq descriptor lock held.
Remove the call and let the handling code run even for an interrupt
where only a single action is registered. That matches the goal of
the above commit and avoids the deadlock.
Document the confusing action = desc->action reload in the handling
loop while at it.
Reported-and-tested-by: "Wang, Warner" <warner.wang@hp.com>
Tested-by: Edward Donovan <edward.donovan@numble.net>
Cc: "Wang, Song-Bo (Stoney)" <song-bo.wang@hp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63a3f60341 upstream.
defined(@array) is deprecated in Perl and gives off a warning.
Restructure the code to remove that warning.
[ hpa: it would be interesting to revert to the timeconst.bc script.
It appears that the failures reported by akpm during testing of
that script was due to a known broken version of make, not a problem
with bc. The Makefile rules could probably be restructured to avoid
the make bug, or it is probably old enough that it doesn't matter. ]
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7c45512df9 upstream.
Commit c060f943d0 ("mm: use aligned zone start for pfn_to_bitidx
calculation") fixed out calculation of the index into the pageblock
bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.
However, the _allocation_ of that bitmap had never taken this alignment
requirement into accout, so depending on the exact size and alignment of
the zone, the use of that index could then access past the allocation,
resulting in some very subtle memory corruption.
This was reported (and bisected) by Ingo Molnar: one of his random
config builds would hang with certain very specific kernel command line
options.
In the meantime, commit c060f943d0 has been marked for stable, so this
fix needs to be back-ported to the stable kernels that backported the
commit to use the right alignment.
Bisected-and-tested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch corrects a buffer overflow in kernels from 3.0 to 3.4 when calling
log_prefix() function from call_console_drivers().
This bug existed in previous releases but has been revealed with commit
162a7e7500 (2.6.39 => 3.0) that made changes
about how to allocate memory for early printk buffer (use of memblock_alloc).
It disappears with commit 7ff9554bb5 (3.4 => 3.5)
that does a refactoring of printk buffer management.
In log_prefix(), the access to "p[0]", "p[1]", "p[2]" or
"simple_strtoul(&p[1], &endp, 10)" may cause a buffer overflow as this
function is called from call_console_drivers by passing "&LOG_BUF(cur_index)"
where the index must be masked to do not exceed the buffer's boundary.
The trick is to prepare in call_console_drivers() a buffer with the necessary
data (PRI field of syslog message) to be safely evaluated in log_prefix().
This patch can be applied to stable kernel branches 3.0.y, 3.2.y and 3.4.y.
Without this patch, one can freeze a server running this loop from shell :
$ export DUMMY=`cat /dev/urandom | tr -dc '12345AZERTYUIOPQSDFGHJKLMWXCVBNazertyuiopqsdfghjklmwxcvbn' | head -c255`
$ while true do ; echo $DUMMY > /dev/kmsg ; done
The "server freeze" depends on where memblock_alloc does allocate printk buffer :
if the buffer overflow is inside another kernel allocation the problem may not
be revealed, else the server may hangs up.
Signed-off-by: Alexandre SIMON <Alexandre.Simon@univ-lorraine.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae1c07a6b7 upstream.
For some reason the reading of the RQDPC register was being artificially
limited to 4K. Instead of limiting the value we should read the value and
add the full amount. Otherwise this can lead to a misleading number of
dropped packets when the actual value is in fact much higher.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Vinson Lee <vlee@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 249bfb83cf upstream.
Devices are added to pci_pme_list when drivers use pci_enable_wake()
or pci_wake_from_d3(), but they aren't removed from the list unless
the driver explicitly disables wakeup. Many drivers never disable
wakeup, so their devices remain on the list even after they are
removed, e.g., via hotplug. A subsequent PME poll will oops when
it tries to touch the device.
This patch disables PME# on a device before removing it, which removes
the device from pci_pme_list. This is safe even if the device never
had PME# enabled.
This oops can be triggered by unplugging a Thunderbolt ethernet adapter
on a Macbook Pro, as reported by Daniel below.
[bhelgaas: changelog]
Reference: http://lkml.kernel.org/r/CAMVG2svG21yiM1wkH4_2pen2n+cr2-Zv7TbH3Gj+8MwevZjDbw@mail.gmail.com
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 13d2b4d11d upstream.
This fixes CVE-2013-0228 / XSA-42
Drew Jones while working on CVE-2013-0190 found that that unprivileged guest user
in 32bit PV guest can use to crash the > guest with the panic like this:
-------------
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/vbd-51712/block/xvda/dev
Modules linked in: sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront ext4
mbcache jbd2 xen_blkfront dm_mirror dm_region_hash dm_log dm_mod [last
unloaded: scsi_wait_scan]
Pid: 1250, comm: r Not tainted 2.6.32-356.el6.i686 #1
EIP: 0061:[<c0407462>] EFLAGS: 00010086 CPU: 0
EIP is at xen_iret+0x12/0x2b
EAX: eb8d0000 EBX: 00000001 ECX: 08049860 EDX: 00000010
ESI: 00000000 EDI: 003d0f00 EBP: b77f8388 ESP: eb8d1fe0
DS: 0000 ES: 007b FS: 0000 GS: 00e0 SS: 0069
Process r (pid: 1250, ti=eb8d0000 task=c2953550 task.ti=eb8d0000)
Stack:
00000000 0027f416 00000073 00000206 b77f8364 0000007b 00000000 00000000
Call Trace:
Code: c3 8b 44 24 18 81 4c 24 38 00 02 00 00 8d 64 24 30 e9 03 00 00 00
8d 76 00 f7 44 24 08 00 00 02 80 75 33 50 b8 00 e0 ff ff 21 e0 <8b> 40
10 8b 04 85 a0 f6 ab c0 8b 80 0c b0 b3 c0 f6 44 24 0d 02
EIP: [<c0407462>] xen_iret+0x12/0x2b SS:ESP 0069:eb8d1fe0
general protection fault: 0000 [#2]
---[ end trace ab0d29a492dcd330 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1250, comm: r Tainted: G D ---------------
2.6.32-356.el6.i686 #1
Call Trace:
[<c08476df>] ? panic+0x6e/0x122
[<c084b63c>] ? oops_end+0xbc/0xd0
[<c084b260>] ? do_general_protection+0x0/0x210
[<c084a9b7>] ? error_code+0x73/
-------------
Petr says: "
I've analysed the bug and I think that xen_iret() cannot cope with
mangled DS, in this case zeroed out (null selector/descriptor) by either
xen_failsafe_callback() or RESTORE_REGS because the corresponding LDT
entry was invalidated by the reproducer. "
Jan took a look at the preliminary patch and came up a fix that solves
this problem:
"This code gets called after all registers other than those handled by
IRET got already restored, hence a null selector in %ds or a non-null
one that got loaded from a code or read-only data descriptor would
cause a kernel mode fault (with the potential of crashing the kernel
as a whole, if panic_on_oops is set)."
The way to fix this is to realize that the we can only relay on the
registers that IRET restores. The two that are guaranteed are the
%cs and %ss as they are always fixed GDT selectors. Also they are
inaccessible from user mode - so they cannot be altered. This is
the approach taken in this patch.
Another alternative option suggested by Jan would be to relay on
the subtle realization that using the %ebp or %esp relative references uses
the %ss segment. In which case we could switch from using %eax to %ebp and
would not need the %ss over-rides. That would also require one extra
instruction to compensate for the one place where the register is used
as scaled index. However Andrew pointed out that is too subtle and if
further work was to be done in this code-path it could escape folks attention
and lead to accidents.
Reviewed-by: Petr Matousek <pmatouse@redhat.com>
Reported-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ee364eb31 upstream.
A user reported the following oops when a backup process reads
/proc/kcore:
BUG: unable to handle kernel paging request at ffffbb00ff33b000
IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
[...]
Call Trace:
[<ffffffff811b8aaa>] read_kcore+0x17a/0x370
[<ffffffff811ad847>] proc_reg_read+0x77/0xc0
[<ffffffff81151687>] vfs_read+0xc7/0x130
[<ffffffff811517f3>] sys_read+0x53/0xa0
[<ffffffff81449692>] system_call_fastpath+0x16/0x1b
Investigation determined that the bug triggered when reading
system RAM at the 4G mark. On this system, that was the first
address using 1G pages for the virt->phys direct mapping so the
PUD is pointing to a physical address, not a PMD page.
The problem is that the page table walker in kern_addr_valid() is
not checking pud_large() and treats the physical address as if
it was a PMD. If it happens to look like pmd_none then it'll
silently fail, probably returning zeros instead of real data. If
the data happens to look like a present PMD though, it will be
walked resulting in the oops above.
This patch adds the necessary pud_large() check.
Unfortunately the problem was not readily reproducible and now
they are running the backup program without accessing
/proc/kcore so the patch has not been validated but I think it
makes sense.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.coM>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 48856286b6 ]
A buggy or malicious frontend should not be able to confuse netback.
If we spot anything which is not as it should be then shutdown the
device and don't try to continue with the ring in a potentially
hostile state. Well behaved and non-hostile frontends will not be
penalised.
As well as making the existing checks for such errors fatal also add a
new check that ensures that there isn't an insane number of requests
on the ring (i.e. more than would fit in the ring). If the ring
contains garbage then previously is was possible to loop over this
insane number, getting an error each time and therefore not generating
any more pending requests and therefore not exiting the loop in
xen_netbk_tx_build_gops for an externded period.
Also turn various netdev_dbg calls which no precipitate a fatal error
into netdev_err, they are rate limited because the device is shutdown
afterwards.
This fixes at least one known DoS/softlockup of the backend domain.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit daf3ec688e ]
TG3_PHY_AUXCTL_SMDSP_ENABLE/DISABLE macros do a blind write to the phy
auxiliary control register and overwrite the EXT_PKT_LEN (bit 14) resulting
in intermittent crc errors on jumbo frames with some link partners. Change
the code to do a read/modify/write.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c13cb8bb4 ]
When netconsole is enabled, logging messages generated during tg3_open
can result in a null pointer dereference for the uninitialized tg3
status block. Use the irq_sync flag to disable polling in the early
stages. irq_sync is cleared when the driver is enabling interrupts after
all initialization is completed.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6caab7b054 ]
If lower layer driver leaves the ip header in the skb fragment, it needs to
be first pulled into skb->data before inspecting ip header length or ip version
number.
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae62ca7b03 ]
commit 35f9c09fe9 (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.
The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.
But some programs pass an arbitrary high value, and the test fails.
The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.
We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)
Many thanks to Willy for providing very clear bug report, bisection
and test programs.
Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6731d2095b ]
There are transients during normal FRTO procedure during which
the packets_in_flight can go to zero between write_queue state
updates and firing the resulting segments out. As FRTO processing
occurs during that window the check must be more precise to
not match "spuriously" :-). More specificly, e.g., when
packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
branch that set cwnd into zero would not be taken and new segments
might be sent out later.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2e5f421211 ]
Commit 9dc274151a (tcp: fix ABC in tcp_slow_start())
uncovered a bug in FRTO code :
tcp_process_frto() is setting snd_cwnd to 0 if the number
of in flight packets is 0.
As Neal pointed out, if no packet is in flight we lost our
chance to disambiguate whether a loss timeout was spurious.
We should assume it was a proper loss.
Reported-by: Pasi Kärkkäinen <pasik@iki.fi>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b5c37fe6e2 ]
On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6ba542a291 ]
In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f94aabd9f ]
Jamie Parsons reported a problem recently, in which the re-initalization of an
association (The duplicate init case), resulted in a loss of receive window
space. He tracked down the root cause to sctp_outq_teardown, which discarded
all the data on an outq during a re-initalization of the corresponding
association, but never reset the outq->outstanding_data field to zero. I wrote,
and he tested this fix, which does a proper full re-initalization of the outq,
fixing this problem, and hopefully future proofing us from simmilar issues down
the road.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab54ee80aa ]
We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.
So simply rename the iphase typedef to a new name. Fixes this compile error:
In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
from next/arch/s390/include/asm/lowcore.h:12,
from next/arch/s390/include/asm/thread_info.h:30,
from include/linux/thread_info.h:54,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:29,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9665d5d624 ]
When releasing a packet socket, the routine packet_set_ring() is reused
to free rings instead of allocating them. But when calling it for the
first time, it fills req->tp_block_nr with the value of rb->pg_vec_len
which in the second invocation makes it bail out since req->tp_block_nr
is greater zero but req->tp_block_size is zero.
This patch solves the problem by passing a zeroed auto-variable to
packet_set_ring() upon each invocation from packet_release().
As far as I can tell, this issue exists even since 69e3c75 (net: TX_RING
and packet mmap), i.e. the original inclusion of TX ring support into
af_packet, but applies only to sockets with both RX and TX ring
allocated, which is probably why this was unnoticed all the time.
Signed-off-by: Phil Sutter <phil.sutter@viprinet.com>
Cc: Johann Baudy <johann.baudy@gnu-log.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd30e94720 ]
They will be created at output, if ever needed. This avoids creating
empty neighbor entries when TPROXYing/Forwarding packets for addresses
that are not even directly reachable.
Note that IPv4 already handles it this way. No neighbor entries are
created for local input.
Tested by myself and customer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 604dfd6efc ]
The return value of pktgen_add_device() is not checked, so
even if we fail to add some device, for example, non-exist one,
we still see "OK:...". This patch fixes it.
After this patch, I got:
# echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0
-bash: echo: write error: No such device
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped:
Result: ERROR: can not add device non-exist
# echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
# cat /proc/net/pktgen/kpktgend_0
Running:
Stopped: eth0
Result: OK: add_device=eth0
(Candidate for -stable)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 794ed393b7 ]
Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.
We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.
It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().
As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.
[<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
[<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
[<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
[<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
[<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
[<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
[<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
[<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
[<ffffffff814ad90e>] ip_finish_output+0x63/0x68
[<ffffffff814ae412>] ip_output+0x61/0x67
[<ffffffff814ab904>] dst_output+0x17/0x1b
[<ffffffff814adb6d>] ip_local_out+0x1e/0x23
[<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
[<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
[<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
[<ffffffff814c3571>] tcp_connect+0x53c/0x587
[<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
[<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
[<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
[<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
[<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
[<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
[<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
[<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
[<ffffffff814632c6>] sys_connect+0x75/0x98
This bug was introduced in linux-2.6.35, in commit
7fee226ad2 (net: add a noref bit on skb dst)
skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5d0feaff23 ]
This was introduced in commit 6dccd16 "r8169: merge with version
6.001.00 of Realtek's r8169 driver". I did not find the version
6.001.00 online, but in 6.002.00 or any later r8169 from Realtek
this hunk is no longer present.
Also commit 05af214 "r8169: fix Ethernet Hangup for RTL8110SC
rev d" claims to have fixed this issue otherwise.
The magic compare mask of 0xfffe000 is dubious as it masks
parts of the Reserved part, and parts of the VLAN tag. But this
does not make much sense as the VLAN tag parts are perfectly
valid there. In matter of fact this seems to be triggered with
any VLAN tagged packet as RxVlanTag bit is matched. I would
suspect 0xfffe0000 was intended to test reserved part only.
Finally, this hunk is evil as it can cause more packets to be
handled than what was NAPI quota causing net/core/dev.c:
net_rx_action(): WARN_ON_ONCE(work > weight) to trigger, and
mess up the NAPI state causing device to hang.
As result, any system using VLANs and having high receive
traffic (so that NAPI poll budget limits rtl_rx) would result
in device hang.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d721a1752b ]
If subtracting 12 from l leaves zero we'd do a zero size allocation,
leading to an oops later when we try to set the NUL terminator.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit adbbf69d1a ]
I changed my email because the vyatta.com mail server is now
redirected to brocade.com; and the Brocade mail system
is not friendly to Linux desktop users.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aacde9ee45 upstream.
Since:
commit b23b025fe2
Author: Ben Greear <greearb@candelatech.com>
Date: Fri Feb 4 11:54:17 2011 -0800
mac80211: Optimize scans on current operating channel.
we do not disable PS while going back to operational channel (on
ieee80211_scan_state_suspend) and deffer that until scan finish.
But since we are allowed to send frames, we can send a frame to AP
without PM bit set, so disable PS on AP side. Then when we switch
to off-channel (in ieee80211_scan_state_resume) we do not enable PS.
Hence we are off-channel with PS disabled, frames are not buffered
by AP.
To fix remove offchannel_ps_disable argument and always enable PS when
going off-channel and disable it when going on-channel, like it was
before.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4965f5667f upstream.
Using a recursive call add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow in the case
that the recursive calls are too deep. Convert the recursive calls to an
iterative loop to avoid the problem.
Tested on a machine containing 135 regions. The kernel no longer panicked
with stack overflow.
Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.
Signed-off-by: T Makphaibulchoke <tmac@hp.com>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@gmail.com>
Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48c3375c5f upstream.
This patch (as1640) fixes a memory leak in xhci-hcd. The urb_priv
data structure isn't always deallocated in the handle_tx_event()
routine for non-control transfers. The patch adds a kfree() call so
that all paths end up freeing the memory properly.
This patch should be backported to kernels as old as 2.6.36, that
contain the commit 8e51adccd4 "USB: xHCI:
Introduce urb_priv structure"
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-and-tested-by: Martin Mokrejs <mmokrejs@fold.natur.cuni.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 760973d2a7 upstream.
An isochronous TD is comprised of one isochronous TRB chained to zero or
more normal TRBs. Only the isoc TRB has the TBC and TLBPC fields. The
normal TRBs must set those fields to zeroes. The code was setting the
TBC and TLBPC fields for both isoc and normal TRBs. Fix this.
This should be backported to stable kernels as old as 3.0, that contain
the commit b61d378f2d " xhci 1.0: Set
transfer burst last packet count field."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 200e0d994d upstream.
1. Optimize the match rules with new macro for Huawei USB storage devices,
to avoid to load USB storage driver for the modem interface
with Huawei devices.
2. Add to support new switch command for new Huawei USB dongles.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07c7be3d87 upstream.
1. Define a new macro for USB storage match rules:
matching with Vendor ID and interface descriptors.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e619d0415 upstream.
This patch (as1654) fixes a very old bug in ehci-hcd, connected with
scheduling of periodic split transfers. The calculations for
full/low-speed bus usage are all carried out after the correction for
bit-stuffing has been applied, but the values in the max_tt_usecs
array assume it hasn't been. The array should allow for allocation of
up to 90% of the bus capacity, which is 900 us, not 780 us.
The symptom caused by this bug is that any isochronous transfer to a
full-speed device with a maxpacket size larger than about 980 bytes is
always rejected with a -ENOSPC error.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9bae18954 upstream.
There exists a situation when GC can work in background alone without
any other filesystem activity during significant time.
The nilfs_clean_segments() method calls nilfs_segctor_construct() that
updates superblocks in the case of NILFS_SC_SUPER_ROOT and
THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the
nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag.
As a result, the update of superblocks doesn't occurred all this time
and in the case of SPOR superblocks keep very old values of last super
root placement.
SYMPTOMS:
Trying to mount a NILFS2 volume after SPOR in such environment ends with
very long mounting time (it can achieve about several hours in some
cases).
REPRODUCING PATH:
1. It needs to use external USB HDD, disable automount and doesn't
make any additional filesystem activity on the NILFS2 volume.
2. Generate temporary file with size about 100 - 500 GB (for example,
dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of
file defines duration of GC working.
3. Then it needs to delete file.
4. Start GC manually by means of command "nilfs-clean -p 0". When you
start GC by means of such way then, at the end, superblocks is updated
by once. So, for simulation of SPOR, it needs to wait sometime (15 -
40 minutes) and simply switch off USB HDD manually.
5. Switch on USB HDD again and try to mount NILFS2 volume. As a
result, NILFS2 volume will mount during very long time.
REPRODUCIBILITY: 100%
FIX:
This patch adds checking that superblocks need to update and set
THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call.
Reported-by: Sergey Alexandrov <splavgm@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c903f0456b upstream.
At the moment the MSR driver only relies upon file system
checks. This means that anything as root with any capability set
can write to MSRs. Historically that wasn't very interesting but
on modern processors the MSRs are such that writing to them
provides several ways to execute arbitary code in kernel space.
Sample code and documentation on doing this is circulating and
MSR attacks are used on Windows 64bit rootkits already.
In the Linux case you still need to be able to open the device
file so the impact is fairly limited and reduces the security of
some capability and security model based systems down towards
that of a generic "root owns the box" setup.
Therefore they should require CAP_SYS_RAWIO to prevent an
elevation of capabilities. The impact of this is fairly minimal
on most setups because they don't have heavy use of
capabilities. Those using SELinux, SMACK or AppArmor rules might
want to consider if their rulesets on the MSR driver could be
tighter.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f44310b98d upstream.
I get the following warning every day with v3.7, once or
twice a day:
[ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
As explained by Linus as well:
|
| Once we've done the "list_add_rcu()" to add it to the
| queue, we can have (another) IPI to the target CPU that can
| now see it and clear the mask.
|
| So by the time we get to actually send the IPI, the mask might
| have been cleared by another IPI.
|
This patch also fixes a system hang problem, if the data->cpumask
gets cleared after passing this point:
if (WARN_ONCE(!mask, "empty IPI mask"))
return;
then the problem in commit 83d349f35e ("x86: don't send an IPI to
the empty set of CPU's") will happen again.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: peterz@infradead.org
Cc: mina86@mina86.org
Cc: srivatsa.bhat@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20130126075357.GA3205@udknight
[ Tidied up the changelog and the comment in the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a9ab9bdb3 upstream.
The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that string provided by userspace will contain the trailing
'\0'.
Can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name setup on
input subsystem:
$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af
("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)
Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d56268fb10 upstream.
Commit 23caaf19b1 (ALSA: usb-mixer: Add support for Audio Class v2.0)
forgot to adjust the length check for UAC 2.0 feature unit descriptors.
This would make the code abort on encountering a feature unit without
per-channel controls, and thus prevented the driver to work with any
device having such a unit, such as the RME Babyface or Fireface UCX.
Reported-by: Florian Hanisch <fhanisch@uni-potsdam.de>
Tested-by: Matthew Robbetts <wingfeathera@gmail.com>
Tested-by: Michael Beer <beerml@sigma6audio.de>
Cc: Daniel Mack <daniel@caiaq.de>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1adb2e2b5f upstream.
When the next beacon is sent, the ath_buf from the previous run is reused.
If getting a new beacon from mac80211 fails, bf->bf_mpdu is not reset, yet
the skb is freed, leading to a double-free on the next beacon tx attempt,
resulting in a system crash.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 15653371c6 upstream.
Subhash Jadavani reported this partial backtrace:
Now consider this call stack from MMC block driver (this is on the ARMv7
based board):
[<c001b50c>] (v7_dma_inv_range+0x30/0x48) from [<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c)
[<c0017b8c>] (dma_cache_maint_page+0x1c4/0x24c) from [<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c)
[<c0017c28>] (___dma_page_cpu_to_dev+0x14/0x1c) from [<c0017ff8>] (dma_map_sg+0x3c/0x114)
This is caused by incrementing the struct page pointer, and running off
the end of the sparsemem page array. Fix this by incrementing by pfn
instead, and convert the pfn to a struct page.
Suggested-by: James Bottomley <JBottomley@Parallels.com>
Tested-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10b8c7dff5 upstream.
When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee50e135ae upstream.
Errors in CAN protocol (location) are reported in data[3] of the can
frame instead of data[2].
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 891348ca0f upstream.
We found a user code which was raising a divide-by-zero trap. That trap
would lead to XPC connections between system-partitions being torn down
due to the die_chain notifier callouts it received.
This also revealed a different issue where multiple callers into
xpc_die_deactivate() would all attempt to do the disconnect in parallel
which would sometimes lock up but often overwhelm the console on very
large machines as each would print at least one line of output at the
end of the deactivate.
I reviewed all the users of the die_chain notifier and changed the code
to ignore the notifier callouts for reasons which will not actually lead
to a system to continue on to call die().
[akpm@linux-foundation.org: fix ia64]
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e7328ae184 upstream.
With virtual machines like qemu, it's pretty common to see "too much
work for irq4" messages nowadays. This happens when a bunch of output
is printed on the emulated serial console. This is caused by too low
PASS_LIMIT. When ISR loops more than the limit, it spits the message.
I've been using a kernel with doubled the limit and I couldn't see no
problems. Maybe it's time to get rid of the message now?
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ram Gupta <ram.gupta5@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f9c9cbb60 upstream.
The right dmi version is in SMBIOS if it's zero in DMI region
This issue was originally found from an oracle bug.
One customer noticed system UUID doesn't match between dmidecode & uek2.
- HP ProLiant BL460c G6 :
# cat /sys/devices/virtual/dmi/id/product_uuid
00000000-0000-4C48-3031-4D5030333531
# dmidecode | grep -i uuid
UUID: 00000000-0000-484C-3031-4D5030333531
From SMBIOS 2.6 on, spec use little-endian encoding for UUID other than
network byte order.
So we need to get dmi version to distinguish. If version is 0.0, the
real version is taken from the SMBIOS version. This is part of original
kernel comment in code.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1d8e614d7 upstream.
As of version 2.6 of the SMBIOS specification, the first 3 fields of the
UUID are supposed to be little-endian encoded.
Also a minor fix to match variable meaning and mute checkpatch.pl
[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Feng Jin <joe.jin@oracle.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Abdallah Chatila <abdallah.chatila@ericsson.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afd5e34b2b upstream.
scsi_register_driver will register a prep_fn() function, which
in turn migh need to use the sd_cdp_pool for DIF.
Which hasn't been initialised at this point, leading to
a crash. So reshuffle the init_sd() and exit_sd() paths
to have the driver registered last.
Signed-off-by: Joel D. Diaz <joeldiaz@us.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f815a0a70 upstream.
This patch (as1644) fixes a race that occurs during startup in
uhci-hcd. If the IRQ line is shared with other devices, it's possible
for the handler routine to be called before the data structures are
fully initialized.
The problem is fixed by adding a check to the IRQ handler routine. If
the initialization hasn't finished yet, the routine will return
immediately.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Don Zickus <dzickus@redhat.com>
Tested-by: "Huang, Adrian (ISS Linux TW)" <adrian.huang@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e16721498 upstream.
Right now using pcie_aspm=force will not enable ASPM if the FADT indicates
ASPM is unsupported. However, the semantics of force should probably allow
for this, especially as they did before 3c076351c4 ("PCI: Rework ASPM
disable code")
This patch just skips the clearing of any ASPM setup that the firmware has
carried out on this bus if pcie_aspm=force is being used.
Reference: http://bugs.launchpad.net/bugs/962038
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c1bf08ac26 upstream.
If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.
Here's the error:
WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Hardware name: Bochs
Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Call Trace:
[<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810fd189>] ftrace_bug+0x239/0x280
[<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
[<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
[<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
[<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff810c2a23>] sys_init_module+0x73/0x220
[<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
---[ end trace 9ef46351e53bbf80 ]---
ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
actual: cc:bb:d2:4b:e1
A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).
Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.
The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.
Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit 262b6d363f upstream.
In the slow path, we are forced to copy the relocations prior to
acquiring the struct mutex in order to handle pagefaults. We forgo
copying the new offsets back into the relocation entries in order to
prevent a recursive locking bug should we trigger a pagefault whilst
holding the mutex for the reservations of the execbuffer. Therefore, we
need to reset the presumed_offsets just in case the objects are rebound
back into their old locations after relocating for this exexbuffer - if
that were to happen we would assume the relocations were valid and leave
the actual pointers to the kernels dangling, instant hang.
Fixes regression from commit bcf50e2775
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sun Nov 21 22:07:12 2010 +0000
drm/i915: Handle pagefaults in execbuffer user relocations
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@fwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
commit 1ee4c55fc9 upstream.
vt6656 has several headers that use the #pragma pack(1) directive to
enable structure packing, but never disable it. The layout of
structures defined in other headers can then depend on which order the
various headers are included in, breaking the One Definition Rule.
In practice this resulted in crashes on x86_64 until the order of header
inclusion was changed for some files in commit 11d404cb56 ('staging:
vt6656: fix headers and add cfg80211.'). But we need a proper fix that
won't be affected by future changes to the order of inclusion.
This removes the #pragma pack(1) directives and adds __packed to the
structure definitions for which packing appears to have been intended.
Reported-and-tested-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 014b9b4ce8 upstream.
When shut down SPI port, it's possible that MRDY has been asserted and a SPI
timer was activated waiting for SRDY assert, in the case, it needs to delete
this timer.
Signed-off-by: Chen Jun <jun.d.chen@intel.com>
Signed-off-by: channing <chao.bi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
commit 2291dff02e upstream.
The driver description files gives these names to the vendor specific
functions on this modem:
Diag VID_19D2&PID_0265&MI_00
NMEA VID_19D2&PID_0265&MI_01
AT cmd VID_19D2&PID_0265&MI_02
Modem VID_19D2&PID_0265&MI_03
Net VID_19D2&PID_0265&MI_04
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99beb2e968 upstream.
The driver description files gives these names to the vendor specific
functions on this modem:
Diagnostics VID_2357&PID_0201&MI_00
NMEA VID_2357&PID_0201&MI_01
Modem VID_2357&PID_0201&MI_03
Networkcard VID_2357&PID_0201&MI_04
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9174adbee4 upstream.
This fixes CVE-2013-0190 / XSA-40
There has been an error on the xen_failsafe_callback path for failed
iret, which causes the stack pointer to be wrong when entering the
iret_exc error path. This can result in the kernel crashing.
In the classic kernel case, the relevant code looked a little like:
popl %eax # Error code from hypervisor
jz 5f
addl $16,%esp
jmp iret_exc # Hypervisor said iret fault
5: addl $16,%esp
# Hypervisor said segment selector fault
Here, there are two identical addls on either option of a branch which
appears to have been optimised by hoisting it above the jz, and
converting it to an lea, which leaves the flags register unaffected.
In the PVOPS case, the code looks like:
popl_cfi %eax # Error from the hypervisor
lea 16(%esp),%esp # Add $16 before choosing fault path
CFI_ADJUST_CFA_OFFSET -16
jz 5f
addl $16,%esp # Incorrectly adjust %esp again
jmp iret_exc
It is possible unprivileged userspace applications to cause this
behaviour, for example by loading an LDT code selector, then changing
the code selector to be not-present. At this point, there is a race
condition where it is possible for the hypervisor to return back to
userspace from an interrupt, fault on its own iret, and inject a
failsafe_callback into the kernel.
This bug has been present since the introduction of Xen PVOPS support
in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68e5254adb upstream.
xhci_alloc_segments_for_ring() builds a list of xhci_segments and links
the tail to head at the end (forming a ring). When it bails out for OOM
reasons half-way through, it tries to destroy its half-built list with
xhci_free_segments_for_ring(), even though it is not a ring yet. This
causes a null-pointer dereference upon hitting the last element.
Furthermore, one of its callers (xhci_ring_alloc()) mistakenly believes
the output parameters to be valid upon this kind of OOM failure, and
calls xhci_ring_free() on them. Since the (incomplete) list/ring should
already be destroyed in that case, this would lead to a use after free.
This patch fixes those issues by having xhci_alloc_segments_for_ring()
destroy its half-built, non-circular list manually and destroying the
invalid struct xhci_ring in xhci_ring_alloc() with a plain kfree().
This patch should be backported to kernels as old as 2.6.31, that
contains the commit 0ebbab3742 "USB: xhci:
Ring allocation and initialization."
A separate patch will need to be developed for kernels older than 3.4,
since the ring allocation code was refactored in that kernel.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust context
- Since segment allocation is done directly in xhci_ring_alloc(), walk
the list starting from ring->first_seg when freeing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea2447f700 upstream.
This patch is to prevent non-USB devices that have RMRRs associated with them from
being placed into the SI Domain during init. This fixes the issue where the RMRR info
for devices being placed in and out of the SI Domain gets lost.
Signed-off-by: Thomas Mingarelli <thomas.mingarelli@hp.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Reviewed-by: Donald Dutile <ddutile@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 36caff5d79 upstream.
This patch (as1631) fixes a bug that shows up when a config change
fails for a device under an xHCI controller. The controller needs to
be told to disable the endpoints that have been enabled for the new
config. The existing code does this, but before storing the
information about which endpoints were enabled! As a result, any
second attempt to install the new config is doomed to fail because
xhci-hcd will refuse to enable an endpoint that is already enabled.
The patch optimistically initializes the new endpoints' device
structures before asking the device to switch to the new config. If
the request fails then the endpoint information is already stored, so
we can use usb_hcd_alloc_bandwidth() to disable the endpoints with no
trouble. The rest of the error path is slightly more complex now; we
have to disable the new interfaces and call put_device() rather than
simply deallocating them.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Matthias Schniedermeyer <ms@citd.de>
CC: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
[not upstream as the code involved was removed in the 3.3.0 release]
Fix wii_memory_fixups() the following compile error on 3.0.y tree with
wii_defconfig on 3.0.y tree.
CC arch/powerpc/platforms/embedded6xx/wii.o
arch/powerpc/platforms/embedded6xx/wii.c: In function ‘wii_memory_fixups’:
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:88:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘phys_addr_t’ [-Werror=format]
arch/powerpc/platforms/embedded6xx/wii.c:90:2: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Werror=format]
cc1: all warnings being treated as errors
make[2]: *** [arch/powerpc/platforms/embedded6xx/wii.o] Error 1
make[1]: *** [arch/powerpc/platforms/embedded6xx] Error 2
make: *** [arch/powerpc/platforms] Error 2
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66bea92c69 upstream.
ext4_da_block_invalidatepages is missing a pagevec_init(),
which means that pvec->cold contains random garbage.
This affects whether the page goes to the front or
back of the LRU when ->cold makes it to
free_hot_cold_page()
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9acc5365d upstream.
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table. So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.
Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.
[ hpa: made a number of stylistic changes, marked arrays as static
const, and made less verbose; use "memblock=debug" for full
verbosity. ]
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed4f20943c upstream.
Converting a 64 Bit TOD format value to nanoseconds means that the value
must be divided by 4.096. In order to achieve that we multiply with 125
and divide by 512.
When used within sched_clock() this triggers an overflow after appr.
417 days. Resulting in a sched_clock() return value that is much smaller
than previously and therefore may cause all sort of weird things in
subsystems that rely on a monotonic sched_clock() behaviour.
To fix this implement a tod_to_ns() helper function which converts TOD
values without overflow and call this function from both places that
open coded the conversion: sched_clock() and kvm_s390_handle_wait().
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a71997a32 upstream.
Ensure that the aux table is properly initialized, even when optional features
are missing. Without this, the FDPIC loader did not work. This was meant to
be included in commit d5ab780305.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34ffb33e09 upstream.
The 'ni_at_a2150' module links to `cfc_write_to_buffer` in the
'comedi_fc' module, so selecting 'COMEDI_NI_AT_A2150' in the kernel config
needs to also select 'COMEDI_FC'.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c43435d772 upstream.
comedi_auto_config() associates a Comedi minor device number with an
auto-configured hardware device and comedi_auto_unconfig() disassociates
it. Currently, these use the hardware device's private data pointer to
point to some allocated storage holding the minor device number. This
is a bit of a waste of the hardware device's private data pointer,
preventing it from being used for something more useful by the low-level
comedi device drivers. For example, it would make more sense if
comedi_usb_auto_config() was passed a pointer to the struct
usb_interface instead of the struct usb_device, but this cannot be done
currently because the low-level comedi drivers already use the private
data pointer in the struct usb_interface for something more useful.
This patch stops the comedi core hijacking the hardware device's private
data pointer. Instead, comedi_auto_config() stores a pointer to the
hardware device's struct device in the struct comedi_device_file_info
associated with the minor device number, and comedi_auto_unconfig()
calls new function comedi_find_board_minor() to recover the minor device
number associated with the hardware device.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e43a028752 upstream.
When remembering the direction of a DCR transaction, we should write
to the same variable that we interpret on later when doing vcpu_run
again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6491d4d028 upstream.
The dma_pte_free_pagetable() function will only free a page table page
if it is asked to free the *entire* 2MiB range that it covers. So if a
page table page was used for one or more small mappings, it's likely to
end up still present in the page tables... but with no valid PTEs.
This was fine when we'd only be repopulating it with 4KiB PTEs anyway
but the same virtual address range can end up being reused for a
*large-page* mapping. And in that case were were trying to insert the
large page into the second-level page table, and getting a complaint
from the sanity check in __domain_mapping() because there was already a
corresponding entry. This was *relatively* harmless; it led to a memory
leak of the old page table page, but no other ill-effects.
Fix it by calling dma_pte_clear_range (hopefully redundant) and
dma_pte_free_pagetable() before setting up the new large page.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Ravi Murty <Ravi.Murty@intel.com>
Tested-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96e5d1d3ad upstream.
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit 55c1945eda upstream.
A high speed control or bulk endpoint may have bInterval set to zero,
which means it does not NAK. If bInterval is non-zero, it means the
endpoint NAKs at a rate of 2^(bInterval - 1).
The xHCI code to compute the NAK interval does not handle the special
case of zero properly. The current code unconditionally subtracts one
from bInterval and uses it as an exponent. This causes a very large
bInterval to be used, and warning messages like these will be printed:
usb 1-1: ep 0x1 - rounding interval to 32768 microframes, ep desc says 0 microframes
This may cause the xHCI host hardware to reject the Configure Endpoint
command, which means the HS device will be unusable under xHCI ports.
This patch should be backported to kernels as old as 2.6.31, that contain
commit dfa49c4ad1 "USB: xhci - fix math in
xhci_get_endpoint_interval()".
Reported-by: Vincent Pelletier <plr.vincent@gmail.com>
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07e72b95f5 upstream.
Some touchscreens have buggy firmware which claims
remote wakeup to be enabled after a reset. They nevertheless
crash if the feature is cleared by the host.
Add a check for reset resume before checking for
an enabled remote wakeup feature. On compliant
devices the feature must be cleared after a reset anyway.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77c7f072c8 upstream.
John's NEC 0.96 xHCI host controller needs a longer timeout for a warm
reset to complete. The logs show it takes 650ms to complete the warm
reset, so extend the hub reset timeout to 800ms to be on the safe side.
This commit should be backported to kernels as old as 3.2, that contain
the commit 75d7cf72ab "usbcore: refine
warm reset logic".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: John Covici <covici@ccs.covici.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d16638e3b upstream.
If we do have endpoints named like "ep-a" then bEndpointAddress is
counted internally by the gadget framework.
If we do have endpoints named like "ep-1" then bEndpointAddress is
assigned from the digit after "ep-".
If we do have both, then it is likely that after we used up the
"generic" endpoints we will use the digits and thus assign one
bEndpointAddress to multiple endpoints.
This theory can be proofed by using the completely enabled g_multi.
Without this patch, the mass storage won't enumerate and times out
because it shares endpoints with RNDIS.
This patch also adds fills up the endpoints list so we have in total
endpoints 1 to 15 in + out available while some of them are restricted
to certain types like BULK or ISO. Without this change the nokia gadget
won't load because the system does not provide enough (BULK) endpoints
but it did before ep-a - ep-f were removed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 036915a7a4 upstream.
Adding support "PSC Scanning, Magellan 800i" in cdc-acm
Very simple, but very necessary.
Suitable for all versions of the kernel > 2.6
Signed-off-by: Denis N Ladin <denladin@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8cf65dc386 upstream.
Simple fix to add support for Crucible Technologies COMET Caller ID
USB decoder - a device containing FTDI USB/Serial converter chip,
handling 1200bps CallerID messages decoded from the phone line -
adding correct USB PID is sufficient.
Tested to apply cleanly and work flawlessly against 3.6.9, 3.7.0-rc8
and 3.8.0-rc3 on both amd64 and x86 arches.
Signed-off-by: Tomasz Mloduchowski <q@qdot.me>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ec0085440 upstream.
also known as Alcatel One Touch L100V LTE
The driver description files gives these names to the vendor specific
functions on this modem:
Application1: VID_1BBB&PID_011E&MI_00
Application2: VID_1BBB&PID_011E&MI_01
Modem: VID_1BBB&PID_011E&MI_03
Ethernet: VID_1BBB&PID_011E&MI_04
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94a85b6338 upstream.
In option.c, add some new MEDIATEK PIDs support for MEDIATEK new products. This
is a MEDIATEK inc. release patch.
Signed-off-by: Quentin.Li <snowmanli88@163.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a56f992cda upstream.
This is a very old bug, but there's nothing that prevents the
timer from running while the module is being removed when we
only do del_timer() instead of del_timer_sync().
The timer should normally not be running at this point, but
it's not clearly impossible (or we could just remove this.)
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51861d4eeb upstream.
Those rn50 chip are often connected to console remoting hw and load
detection often fails with those. Just don't try to load detect and
report connect.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0729eeefd upstream.
Éric Piel reported a kernel oops in the "comedi_test" module. It was a
NULL pointer dereference within `waveform_ai_interrupt()` (actually a
timer function) that sometimes occurred when a running asynchronous
command is cancelled (either by the `COMEDI_CANCEL` ioctl or by closing
the device file).
This seems to be a race between the caller of `waveform_ai_cancel()`
which on return from that function goes and tears down the running
command, and the timer function which uses the command. In particular,
`async->cmd.chanlist` gets freed (and the pointer set to NULL) by
`do_become_nonbusy()` in "comedi_fops.c" but a previously scheduled
`waveform_ai_interrupt()` timer function will dereference that pointer
regardless, leading to the oops.
Fix it by replacing the `del_timer()` call in `waveform_ai_cancel()`
with `del_timer_sync()`.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reported-by: Éric Piel <piel@delmic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d3135af39 upstream.
When a low-level comedi driver auto-configures a device, a `struct
comedi_dev_file_info` is allocated (as well as a `struct
comedi_device`) by `comedi_alloc_board_minor()`. A pointer to the
hardware `struct device` is stored as a cookie in the `struct
comedi_dev_file_info`. When the low-level comedi driver
auto-unconfigures the device, `comedi_auto_unconfig()` uses the cookie
to find the `struct comedi_dev_file_info` so it can detach the comedi
device from the driver, clean it up and free it.
A problem arises if the user manually unconfigures and reconfigures the
comedi device using the `COMEDI_DEVCONFIG` ioctl so that is no longer
associated with the original hardware device. The problem is that the
cookie is not cleared, so that a call to `comedi_auto_unconfig()` from
the low-level driver will still find it, detach it, clean it up and free
it.
Stop this problem occurring by always clearing the `hardware_device`
cookie in the `struct comedi_dev_file_info` whenever the
`COMEDI_DEVCONFIG` ioctl call is successful.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41b645c862 upstream.
Cold reset on the pxa27x currently fails and
pxa2xx_ac97_try_cold_reset: cold reset timeout (GSR=0x44)
appears in the kernel log. Through trial-and-error (the pxa270 developer's
manual is mostly incoherent on the topic of ac97 reset), I got cold reset to
complete by setting the WARM_RST bit in the GCR register (and later noticed that
pxa3xx does this for cold reset as well). Also, a timeout loop is needed to
wait for the reset to complete.
Tested on a palm treo 680 machine.
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Acked-by: Igor Grinberg <grinberg@compulab.co.il>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 115c9b8192 upstream.
Implement a new netlink attribute type IFLA_EXT_MASK. The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.
This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.
Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask. If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.0:
- Adjust context
- Drop the change in do_setlink() that reverts commit f18da14565
('net: RTNETLINK adjusting values of min_ifinfo_dump_size'), which
was never applied here]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c7ac8679be upstream.
The message size allocated for rtnl ifinfo dumps was limited to
a single page. This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.
Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbf0a1427 upstream.
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f1d06c34f upstream.
On COW, a new hugepage is allocated and charged to the memcg. If the
system is oom or the charge to the memcg fails, however, the fault
handler will return VM_FAULT_OOM which results in an oom kill.
Instead, it's possible to fallback to splitting the hugepage so that the
COW results only in an order-0 page being allocated and charged to the
memcg which has a higher liklihood to succeed. This is expensive
because the hugepage must be split in the page fault handler, but it is
much better than unnecessarily oom killing a process.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb719c59bd upstream.
Incrementing lenExtents even while writing to a hole is bad
for performance as calls to udf_discard_prealloc and
udf_truncate_tail_extent would not return from start if
isize != lenExtents
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a41409c51 upstream, but doesn't
apply, so this version is different for older kernels than 3.7.x
blk_alloc_queue has already done a bdi_init, so do not bdi_init
again in aoeblk_gdalloc. The extra call causes list corruption
in the per-CPU backing dev info stats lists.
Affected users see console WARNINGs about list_del corruption on
percpu_counter_destroy when doing "rmmod aoe" or "aoeflush -a"
when AoE targets have been detected and initialized by the
system.
The patch below applies to v3.6.11, with its v47 aoe driver. It
is expected to apply to all currently maintained stable kernels
except 3.7.y. A related but different fix has been posted for
3.7.y.
References:
RedHat bugzilla ticket with original report
https://bugzilla.redhat.com/show_bug.cgi?id=853064
LKML discussion of bug and fix
http://thread.gmane.org/gmane.linux.kernel/1416336/focus=1416497
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 721e3eba21 upstream.
Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of this c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d096ad0f79 upstream.
When a journal-less ext4 filesystem is mounted on a read-only block
device (blockdev --setro will do), each remount (for other, unrelated,
flags, like suid=>nosuid etc) results in a series of scary messages
from kernel telling about I/O errors on the device.
This is becauese of the following code ext4_remount():
if (sbi->s_journal == NULL)
ext4_commit_super(sb, 1);
at the end of remount procedure, which forces writing (flushing) of
a superblock regardless whenever it is dirty or not, if the filesystem
is readonly or not, and whenever the device itself is readonly or not.
We only need call ext4_commit_super when the file system had been
previously mounted read/write.
Thanks to Eric Sandeen for help in diagnosing this issue.
Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7961c7fa4 upstream.
The following race is possible between start_this_handle() and someone
calling jbd2_journal_flush().
Process A Process B
start_this_handle().
if (journal->j_barrier_count) # false
if (!journal->j_running_transaction) { #true
read_unlock(&journal->j_state_lock);
jbd2_journal_lock_updates()
jbd2_journal_flush()
write_lock(&journal->j_state_lock);
if (journal->j_running_transaction) {
# false
... wait for committing trans ...
write_unlock(&journal->j_state_lock);
...
write_lock(&journal->j_state_lock);
if (!journal->j_running_transaction) { # true
jbd2_get_transaction(journal, new_transaction);
write_unlock(&journal->j_state_lock);
goto repeat; # eventually blocks on j_barrier_count > 0
...
J_ASSERT(!journal->j_running_transaction);
# fails
We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
in exclusive mode.
Reported-by: yjwsignal@empal.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c1ecba8d8 upstream.
The VDCTRL4 register does not provide the MXS SET/CLR/TOGGLE feature.
The write in mxsfb_disable_controller() sets the data_cnt for the LCD
DMA to 0 which obviously means the max. count for the LCD DMA and
leads to overwriting arbitrary memory when the display is unblanked.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Juergen Beisert <jbe@pengutronix.de>
Tested-by: Lauri Hintsala <lauri.hintsala@bluegiga.net>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 70e227790d upstream.
The timer appears to run too fast/race on 64 bit systems.
Using msecs_to_jiffies seems to cause a deadlock on 64 bit.
A calculation of (MSecond * HZ) / 1000 appears to run satisfactory.
Change BSSIDInfoCount to u32.
After this patch the driver can be successfully connect on little endian 64/32 bit systems.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7730492855 upstream.
After this patch all BYTE/WORD/DWORD types can be replaced with the appropriate u sizes.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab1dd99631 upstream.
Calling RFbSetPower with uCH zero value will cause out of bound array reference.
This causes 64 bit kernels to oops on boot.
Note: Driver does not function on 64 bit kernels and should be
blacklisted on them.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e910d7ebec upstream.
Abort dm ioctl processing if userspace changes the data_size parameter
after we validated it but before we finished copying the data buffer
from userspace.
The dm ioctl parameters are processed in the following sequence:
1. ctl_ioctl() calls copy_params();
2. copy_params() makes a first copy of the fixed-sized portion of the
userspace parameters into the local variable "tmp";
3. copy_params() then validates tmp.data_size and allocates a new
structure big enough to hold the complete data and copies the whole
userspace buffer there;
4. ctl_ioctl() reads userspace data the second time and copies the whole
buffer into the pointer "param";
5. ctl_ioctl() reads param->data_size without any validation and stores it
in the variable "input_param_size";
6. "input_param_size" is further used as the authoritative size of the
kernel buffer.
The problem is that userspace code could change the contents of user
memory between steps 2 and 4. In particular, the data_size parameter
can be changed to an invalid value after the kernel has validated it.
This lets userspace force the kernel to access invalid kernel memory.
The fix is to ensure that the size has not changed at step 4.
This patch shouldn't have a security impact because CAP_SYS_ADMIN is
required to run this code, but it should be fixed anyway.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9366c1ba13 upstream.
The function rb_check_pages() was added to make sure the ring buffer's
pages were sane. This check is done when the ring buffer size is modified
as well as when the iterator is released (closing the "trace" file),
as that was considered a non fast path and a good place to do a sanity
check.
The problem is that the check does not have any locks around it.
If one process were to read the trace file, and another were to read
the raw binary file, the check could happen while the reader is reading
the file.
The issues with this is that the check requires to clear the HEAD page
before doing the full check and it restores it afterward. But readers
require the HEAD page to exist before it can read the buffer, otherwise
it gives a nasty warning and disables the buffer.
By adding the reader lock around the check, this keeps the race from
happening.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cbba75a56 upstream.
Users of jffs2_do_reserve_space() expect they still held
erase_completion_lock after call to it. But there is a path
where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
The patch fixes it.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6567ed140 upstream.
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24ec19b0ae upstream.
In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.
This patch fixes that.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9fbb62eb6 upstream.
mfd_remove_devices would iterate over all devices sharing a parent with
an mfd device regardless of whether they were allocated by the mfd core
or not. This especially caused problems when the device structure was
not contained within a platform_device, because to_platform_device is
used on each device pointer.
This patch defines a device_type for mfd devices and checks this is
present from mfd_remove_devices_fn before processing the device.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Tested-by: Peter Tyser <ptyser@xes-inc.com>
Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f4ad44b26 upstream.
The lockdep warning below is in theory correct but it will be in really weird
rare situation that ends up that deadlock since the tcm fc session is hashed
based the rport id. Nonetheless, the complaining below is about rcu callback
that does the transport_deregister_session() is happening in softirq, where
transport_register_session() that happens earlier is not. This triggers the
lockdep warning below. So, just fix this to make lockdep happy by disabling
the soft irq before calling transport_register_session() in ft_prli.
BTW, this was found in FCoE VN2VN over two VMs, couple of create and destroy
would get this triggered.
v1: was enforcing register to be in softirq context which was not righ. See,
http://www.spinics.net/lists/target-devel/msg03614.html
v2: following comments from Roland&Nick (thanks), it seems we don't have to
do transport_deregister_session() in rcu callback, so move it into ft_sess_free()
but still do kfree() of the corresponding ft_sess struct in rcu callback to
make sure the ft_sess is not freed till the rcu callback.
...
[ 1328.370592] scsi2 : FCoE Driver
[ 1328.383429] fcoe: No FDMI support.
[ 1328.384509] host2: libfc: Link up on port (000000)
[ 1328.934229] host2: Assigned Port ID 00a292
[ 1357.232132] host2: rport 00a393: Remove port
[ 1357.232568] host2: rport 00a393: Port sending LOGO from Ready state
[ 1357.233692] host2: rport 00a393: Delete port
[ 1357.234472] host2: rport 00a393: work event 3
[ 1357.234969] host2: rport 00a393: callback ev 3
[ 1357.235979] host2: rport 00a393: Received a LOGO response closed
[ 1357.236706] host2: rport 00a393: work delete
[ 1357.237481]
[ 1357.237631] =================================
[ 1357.238064] [ INFO: inconsistent lock state ]
[ 1357.238450] 3.7.0-rc7-yikvm+ #3 Tainted: G O
[ 1357.238450] ---------------------------------
[ 1357.238450] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1357.238450] ksoftirqd/0/3 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 1357.238450] (&(&se_tpg->session_lock)->rlock){+.?...}, at: [<ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] {SOFTIRQ-ON-W} state was registered at:
[ 1357.238450] [<ffffffff810834f5>] mark_held_locks+0x6d/0x95
[ 1357.238450] [<ffffffff8108364a>] trace_hardirqs_on_caller+0x12d/0x197
[ 1357.238450] [<ffffffff810836c1>] trace_hardirqs_on+0xd/0xf
[ 1357.238450] [<ffffffff8149caba>] _raw_spin_unlock_irq+0x2d/0x45
[ 1357.238450] [<ffffffffa01e8d10>] __transport_register_session+0xb8/0x122 [target_core_mod]
[ 1357.238450] [<ffffffffa01e8dbe>] transport_register_session+0x44/0x5a [target_core_mod]
[ 1357.238450] [<ffffffffa018e32c>] ft_prli+0x1e3/0x275 [tcm_fc]
[ 1357.238450] [<ffffffffa0160e8d>] fc_rport_recv_req+0x95e/0xdc5 [libfc]
[ 1357.238450] [<ffffffffa015be88>] fc_lport_recv_els_req+0xc4/0xd5 [libfc]
[ 1357.238450] [<ffffffffa015c778>] fc_lport_recv_req+0x12f/0x18f [libfc]
[ 1357.238450] [<ffffffffa015a6d7>] fc_exch_recv+0x8ba/0x981 [libfc]
[ 1357.238450] [<ffffffffa0176d7a>] fcoe_percpu_receive_thread+0x47a/0x4e2 [fcoe]
[ 1357.238450] [<ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] irq event stamp: 275411
[ 1357.238450] hardirqs last enabled at (275410): [<ffffffff810bb6a0>] rcu_process_callbacks+0x229/0x42a
[ 1357.238450] hardirqs last disabled at (275411): [<ffffffff8149c2f7>] _raw_spin_lock_irqsave+0x22/0x8e
[ 1357.238450] softirqs last enabled at (275394): [<ffffffff8103d669>] __do_softirq+0x246/0x26f
[ 1357.238450] softirqs last disabled at (275399): [<ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450]
[ 1357.238450] other info that might help us debug this:
[ 1357.238450] Possible unsafe locking scenario:
[ 1357.238450]
[ 1357.238450] CPU0
[ 1357.238450] ----
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450] <Interrupt>
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450]
[ 1357.238450] *** DEADLOCK ***
[ 1357.238450]
[ 1357.238450] no locks held by ksoftirqd/0/3.
[ 1357.238450]
[ 1357.238450] stack backtrace:
[ 1357.238450] Pid: 3, comm: ksoftirqd/0 Tainted: G O 3.7.0-rc7-yikvm+ #3
[ 1357.238450] Call Trace:
[ 1357.238450] [<ffffffff8149399a>] print_usage_bug+0x1f5/0x206
[ 1357.238450] [<ffffffff8100da59>] ? save_stack_trace+0x2c/0x49
[ 1357.238450] [<ffffffff81082aae>] ? print_irq_inversion_bug.part.14+0x1ae/0x1ae
[ 1357.238450] [<ffffffff81083336>] mark_lock+0x106/0x258
[ 1357.238450] [<ffffffff81084e34>] __lock_acquire+0x2e7/0xe53
[ 1357.238450] [<ffffffff8102903d>] ? pvclock_clocksource_read+0x48/0xb4
[ 1357.238450] [<ffffffff810ba6a3>] ? rcu_process_gp_end+0xc0/0xc9
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff81085ef1>] lock_acquire+0x119/0x143
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff8149c329>] _raw_spin_lock_irqsave+0x54/0x8e
[ 1357.238450] [<ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<ffffffff810bb6a0>] ? rcu_process_callbacks+0x229/0x42a
[ 1357.238450] [<ffffffffa018ddc5>] ft_sess_rcu_free+0x17/0x24 [tcm_fc]
[ 1357.238450] [<ffffffffa018ddae>] ? ft_sess_free+0x1b/0x1b [tcm_fc]
[ 1357.238450] [<ffffffff810bb6d7>] rcu_process_callbacks+0x260/0x42a
[ 1357.238450] [<ffffffff8103d55d>] __do_softirq+0x13a/0x26f
[ 1357.238450] [<ffffffff8149b34e>] ? __schedule+0x65f/0x68e
[ 1357.238450] [<ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450] [<ffffffff8105c83c>] smpboot_thread_fn+0x1a5/0x1aa
[ 1357.238450] [<ffffffff8105c697>] ? smpboot_unregister_percpu_thread+0x47/0x47
[ 1357.238450] [<ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<ffffffff8149b49d>] ? wait_for_common+0xbb/0x10a
[ 1357.238450] [<ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1357.238450] [<ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] [<ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1417.440099] rport-2:0-0: blocked FC remote port time out: removing rport
Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: Open-FCoE <devel@open-fcoe.org>
Cc: Nicholas A. Bellinger <nab@risingtidesystems.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3100d49d3c upstream.
sata_promise's pdc_hard_reset_port() needs to serialize because it
flips a port-specific bit in controller register that's shared by
all ports. The code takes the ata host lock for this, but that's
broken because an interrupt may arrive on our irq during the hard
reset sequence, and that too will take the ata host lock. With
lockdep enabled a big nasty warning is seen.
Fixed by adding private state to the ata host structure, containing
a second lock used only for serializing the hard reset sequences.
This eliminated the lockdep warnings both on my test rig and on
the original reporter's machine.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Adko Branil <adkobranil@yahoo.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a394aac885 upstream.
When the qla2xxx driver loses access to multiple, remote ports, there is a race
condition which can occur which will keep the request stuck on a scsi request
queue indefinitely.
This bad state occurred do to a race condition with how the FCPORT_UPDATE_NEEDED
bit is set in qla2x00_schedule_rport_del(), and how it is cleared in
qla2x00_do_dpc(). The problem port has its drport pointer set, but it has never
been processed by the driver to inform the fc transport that the port has been
lost. qla2x00_schedule_rport_del() sets drport, and then sets the
FCPORT_UPDATE_NEEDED bit. In qla2x00_do_dpc(), the port lists are walked and
any drport pointer is handled and the fc transport informed of the port loss,
then the FCPORT_UPDATE_NEEDED bit is cleared. This leaves a race where the
dpc thread is processing one port removal, another port removal is marked
with a call to qla2x00_schedule_rport_del(), and the dpc thread clears the
bit for both removals, even though only the first removal was actually
handled. Until another event occurs to set FCPORT_UPDATE_NEEDED, the later
port removal is never finished and qla2xxx stays in a bad state which causes
requests to become stuck on request queues.
This patch updates the driver to test and clear FCPORT_UPDATE_NEEDED
atomically. This ensures the port state changes are processed and not lost.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit beecadea1b upstream.
The macro bit(n) is defined as ((u32)1 << n), and thus it doesn't work
with n >= 32, such as in mvs_94xx_assign_reg_set():
if (i >= 32) {
mvi->sata_reg_set |= bit(i);
...
}
The shift ((u32)1 << n) with n >= 32 also leads to undefined behavior.
The result varies depending on the architecture.
This patch changes bit(n) to do a 64-bit shift. It also simplifies
mv_ffc64() using __ffs64(), since invoking ffz() with ~0 is undefined.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d23734209 upstream.
This patch fixes both the transmit and receive portion of sending
fragmented mutlicast and broadcast packets.
The transmit section was broken because the offset for INTFRAG and
LASTFRAG packets were just miscalculated by IEEE1394_GASP_HDR_SIZE (which
was reserved with skb_push() in fwnet_send_packet).
The receive section was broken because in fwnet_incoming_packet is a call
to fwnet_peer_find_by_node_id(). Called with generation == -1 it will
not find a peer and the partial datagrams are associated to a peer.
[Stefan R: The fix to use context->card->generation is not perfect.
It relies on the IR tasklet which processes packets from the prior bus
generation to run before the self-ID-complete worklet which sets the
current card generation. Alas, there is no simple way of a race-free
implementation. Let's do it this way for now.]
Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c170e0686 upstream.
This reverts commit f74b9d365d.
Turns out reverting commit a240dc7b3c
"ath9k_hw: Updated AR9003 tx gain table for 5GHz" was not enough to
bring the tx power back to normal levels on devices like the
Buffalo WZR-HP-G450H, this one needs to be reverted as well.
This revert improves tx power by ~10 db on that device
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: rmanohar@qca.qualcomm.com
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c060f943d0 upstream.
The current calculation in pfn_to_bitidx assumes that (pfn -
zone->zone_start_pfn) >> pageblock_order will return the same bit for
all pfn in a pageblock. If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.
Consider the following with pageblock order = 10, zone start 2MB:
pfn | pfn - zone start | (pfn - zone start) >> page block order
----------------------------------------------------------------
0x26000 | 0x25e00 | 0x97
0x26100 | 0x25f00 | 0x97
0x26200 | 0x26000 | 0x98
0x26300 | 0x26100 | 0x98
This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block. Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.
For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7964c06d66 upstream.
when run the folloing command under shell, it will return error
sh/$ echo 1 > /proc/sys/vm/compact_memory
sh/$ sh: write error: Bad address
After strace, I found the following log:
...
write(1, "1\n", 2) = 3
write(1, "", 4294967295) = -1 EFAULT (Bad address)
write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
) = 31
This tells system return 3(COMPACT_COMPLETE) after write data to
compact_memory.
The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE)
from sysctl_compaction_handler after compaction_nodes finished.
Signed-off-by: Jason Liu <r64343@freescale.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d99e79ec55 upstream.
The check to whom a device is reserved is done by checking the path
state of the affected channel paths. If it turns out that one path is
flagged as reserved by someone else the whole device is marked as such.
However the meaning of the RESVD_ELSE bit is that the addressed device
is reserved to a different pathgroup (and not reserved to a different
LPAR). If we do this test on a path which is currently not a member of
the pathgroup we could erroneously mark the device as reserved to
someone else.
To fix this collect the reserved state for all potential members of the
pathgroup and only mark the device as reserved if all of those potential
members have the RESVD_ELSE bit set.
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ce73ec6db4 upstream.
The locking in update_vsyscall_tz() is not only unnecessary because the vdso
code copies the data unproteced in __kernel_gettimeofday() but also
introduces a hard to reproduce race condition between update_vsyscall()
and update_vsyscall_tz(), which causes user space process to loop
forever in vdso code.
The following patch removes the locking from update_vsyscall_tz().
Locking is not only unnecessary because the vdso code copies the data
unprotected in __kernel_gettimeofday() but also erroneous because updating
the tb_update_count is not atomic and introduces a hard to reproduce race
condition between update_vsyscall() and update_vsyscall_tz(), which further
causes user space process to loop forever in vdso code.
The below scenario describes the race condition,
x==0 Boot CPU other CPU
proc_P: x==0
timer interrupt
update_vsyscall
x==1 x++;sync settimeofday
update_vsyscall_tz
x==2 x++;sync
x==3 sync;x++
sync;x++
proc_P: x==3 (loops until x becomes even)
Because the ++ operator would be implemented as three instructions and not
atomic on powerpc.
A similar change was made for x86 in commit 6c260d5863
("x86: vdso: Remove bogus locking in update_vsyscall_tz")
Signed-off-by: Shan Hai <shan.hai@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 11ee7e99f3 upstream.
If we build a kernel with CONFIG_RELOCATABLE=y CONFIG_CRASH_DUMP=n,
the kernel fails when we run at a non zero offset. It turns out
we were incorrectly wrapping some of the relocatable kernel code
with CONFIG_CRASH_DUMP.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab48b03ec9 upstream.
If the restart timer is running due to BUS-OFF and the device is
disconnected an dev_put will decrease the usage counter to -1 thus
blocking the interface removal, resulting in the following dmesg
lines repeating every 10s:
can: notifier: receive list not found for dev can0
can: notifier: receive list not found for dev can0
can: notifier: receive list not found for dev can0
unregister_netdevice: waiting for can0 to become free. Usage count = -1
Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53a59fc67f upstream.
Since commit e303297e6c ("mm: extended batches for generic
mmu_gather") we are batching pages to be freed until either
tlb_next_batch cannot allocate a new batch or we are done.
This works just fine most of the time but we can get in troubles with
non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
on large machines where too aggressive batching might lead to soft
lockups during process exit path (exit_mmap) because there are no
scheduling points down the free_pages_and_swap_cache path and so the
freeing can take long enough to trigger the soft lockup.
The lockup is harmless except when the system is setup to panic on
softlockup which is not that unusual.
The simplest way to work around this issue is to limit the maximum
number of batches in a single mmu_gather. 10k of collected pages should
be safe to prevent from soft lockups (we would have 2ms for one) even if
they are all freed without an explicit scheduling point.
This patch doesn't add any new explicit scheduling points because it
relies on zap_pmd_range during page tables zapping which calls
cond_resched per PMD.
The following lockup has been reported for 3.0 kernel with a huge
process (in order of hundreds gigs but I do know any more details).
BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
Supported: Yes
CPU 56
Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
RIP: 0010: _raw_spin_unlock_irqrestore+0x8/0x10
RSP: 0018:ffff883ec1037af0 EFLAGS: 00000206
RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
FS: 00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
Call Trace:
release_pages+0xc5/0x260
free_pages_and_swap_cache+0x9d/0xc0
tlb_flush_mmu+0x5c/0x80
tlb_finish_mmu+0xe/0x50
exit_mmap+0xbd/0x120
mmput+0x49/0x120
exit_mm+0x122/0x160
do_exit+0x17a/0x430
do_group_exit+0x3d/0xb0
get_signal_to_deliver+0x247/0x480
do_signal+0x71/0x1b0
do_notify_resume+0x98/0xb0
int_signal+0x12/0x17
DWARF2 unwinder stuck at int_signal+0x12/0x17
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f90b68309 upstream.
tm_mon is 0..11, whereas vt8500 expects 1..12 for the month field,
causing invalid date errors for January, and causing the day field to
roll over incorrectly.
The century flag is only handled in vt8500_rtc_read_time, but not set in
vt8500_rtc_set_time. This patch corrects the behaviour of the century
flag.
Signed-off-by: Edgar Toernig <froese@gmx.de>
Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c24bf9b4cc upstream.
The inb/outb macros for CRIS are broken from a number of points of view,
missing () around parameters and they have an unprotected if statement
in them. This was breaking the compile of IPMI on CRIS and thus I was
being annoyed by build regressions, so I fixed them.
Plus I don't think they would have worked at all, since the data values
were missing "&" and the outsl had a "3" instead of a "4" for the size.
From what I can tell, this stuff is not used at all, so this can't be
any more broken than it was before, anyway.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Mikael Starvik <starvik@axis.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cae49ede00 upstream.
We weren't clearing card->tx_skb[port] when processing the TX done interrupt.
If there wasn't another skb ready to transmit immediately, this led to a
double-free because we'd free it *again* next time we did have a packet to
send.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7bf9b7bef8 upstream.
find_vma() is *not* safe when somebody else is removing vmas. Not just
the return value might get bogus just as you are getting it (this instance
doesn't try to dereference the resulting vma), the search itself can get
buggered in rather spectacular ways. IOW, ->mmap_sem really, really is
not optional here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 864aa04cd0 upstream.
When updating the page protection map after calculating the user_pgprot
value, the base protection map is temporarily stored in an unsigned long
type, causing truncation of the protection bits when LPAE is enabled.
This effectively means that calls to mprotect() will corrupt the upper
page attributes, clearing the XN bit unconditionally.
This patch uses pteval_t to store the intermediate protection values,
preserving the upper bits for 64-bit descriptors.
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 354e4aa391 ]
RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation]
All TCP stacks MAY implement the following mitigation. TCP stacks
that implement this mitigation MUST add an additional input check to
any incoming segment. The ACK value is considered acceptable only if
it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
SND.NXT). All incoming segments whose ACK value doesn't satisfy the
above condition MUST be discarded and an ACK sent back.
Move tcp_send_challenge_ack() before tcp_ack() to avoid a forward
declaration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bd090dfc63 ]
We added support for RFC 5961 in latest kernels but TCP fails
to perform exhaustive check of ACK sequence.
We can update our view of peer tsval from a frame that is
later discarded by tcp_ack()
This makes timestamps enabled sessions vulnerable to injection of
a high tsval : peers start an ACK storm, since the victim
sends a dupack each time it receives an ACK from the other peer.
As tcp_validate_incoming() is called before tcp_ack(), we should
not peform tcp_replace_ts_recent() from it, and let callers do it
at the right time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Romain Francoise <romain@orebokech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e371589917 ]
Followup of commit 0c24604b68 (tcp: implement RFC 5961 4.2)
As reported by Vijay Subramanian, we should send a challenge ACK
instead of a dup ack if a SYN flag is set on a packet received out of
window.
This permits the ratelimiting to work as intended, and to increase
correct SNMP counters.
Suggested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c24604b68 ]
Implement the RFC 5691 mitigation against Blind
Reset attack using SYN bit.
Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
incoming packet, instead of resetting the session.
Add a new SNMP counter to count number of challenge acks sent
in response to SYN packets.
(netstat -s | grep TCPSYNChallenge)
Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
because of a SYN flag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 282f23c6ee ]
Implement the RFC 5691 mitigation against Blind
Reset attack using RST bit.
Idea is to validate incoming RST sequence,
to match RCV.NXT value, instead of previouly accepted
window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)
If sequence is in window but not an exact match, send
a "challenge ACK", so that the other part can resend an
RST with the appropriate sequence.
Add a new sysctl, tcp_challenge_ack_limit, to limit
number of challenge ACK sent per second.
Add a new SNMP counter to count number of challenge acks sent.
(netstat -s | grep TCPChallengeACK)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kiran Kumar Kella <kkiran@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45959ee7aa upstream.
When gcc inlines a function, it does not mark it with the mcount
prologue, which in turn means that inlined functions are not traced
by the function tracer. But if CONFIG_OPTIMIZE_INLINING is set, then
gcc is allowed not to inline a function that is marked inline.
Depending on the options and the compiler, a function may or may
not be traced by the function tracer, depending on whether gcc
decides to inline a function or not. This has caused several
problems in the pass becaues gcc is not always consistent with
what it decides to inline between different gcc versions.
Some places should not be traced (like paravirt native_* functions)
and these are mostly marked as inline. When gcc decides not to
inline the function, and if that function should not be traced, then
the ftrace function tracer will suddenly break when it use to work
fine. This becomes even harder to debug when different versions of
gcc will not inline that function, making the same kernel and config
work for some gcc versions and not work for others.
By making all functions marked inline to not be traced will remove
the ambiguity that gcc adds when it comes to tracing functions marked
inline. All gcc versions will be consistent with what functions are
traced and having volatile working code will be removed.
Note, only the inline macro when CONFIG_OPTIMIZE_INLINING is set needs
to have notrace added, as the attribute __always_inline will force
the function to be inlined and then not traced.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbf0a1427 upstream.
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 04aa530ec0 upstream.
Sankara reported that the genirq core code fails to adjust the
affinity of an interrupt thread in several cases:
1) On request/setup_irq() the call to setup_affinity() happens before
the new action is registered, so the new thread is not notified.
2) For secondary shared interrupts nothing notifies the new thread to
change its affinity.
3) Interrupts which have the IRQ_NO_BALANCE flag set are not moving
the thread either.
Fix this by setting the thread affinity flag right on thread creation
time. This ensures that under all circumstances the thread moves to
the right place. Requires a check in irq_thread_check_affinity for an
existing affinity mask (CONFIG_CPU_MASK_OFFSTACK=y)
Reported-and-tested-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1209041738200.2754@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e25fbe380c upstream.
The following null pointer check is broken.
*option = match_strdup(args);
return !option;
The pointer `option' must be non-null, and thus `!option' is always false.
Use `!*option' instead.
The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
cut-and-paste code.").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5f50b0c29 upstream.
If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f018458b3 upstream.
It is almost always wrong for NFS to call drop_nlink() after removing a
file. What we really want is to mark the inode's attributes for
revalidation, and we want to ensure that the VFS drops it if we're
reasonably sure that this is the final unlink().
Do the former using the usual cache validity flags, and the latter
by testing if inode->i_nlink == 1, and clearing it in that case.
This also fixes the following warning reported by Neil Brown and
Jeff Layton (among others).
[634155.004438] WARNING:
at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442]
Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel
snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm:
bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace:
[634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
[634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f
[634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
[634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40
[634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
[634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230
[634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230
[634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90
[634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100
[634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
[634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f259613a1e upstream.
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get
an error between setting server->destory (to nfs_destroy_server), and
calling nfs_start_lockd (which will set server->nlm_host).
If this happens, nfs_clone_server will call nfs_free_server which
will call nfs_destroy_server and thence nlmclnt_done(NULL). This
causes the NULL to be dereferenced.
So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL.
The other guards there are irrelevant as nlm_host can only be non-NULL
if one of these flags are set - so remove those tests. (Thanks to Trond
for this suggestion).
This is suitable for any stable kernel since 2.6.25.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4f5f64cf0c upstream.
At one point acpi_device_set_id() checks if acpi_device_hid(device)
returns NULL, but that never happens, so system bus devices with an
empty list of PNP IDs are given the dummy HID ("device") instead of
the "system bus HID" ("LNXSYBUS"). Fix the code to use the right
check.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5f165418c upstream.
The commit [88a8516a: ALSA: usbaudio: implement USB autosuspend] added
the support of autopm for USB MIDI output, but it didn't take the MIDI
input into account.
This patch adds the following for fixing the autopm:
- Manage the URB start at the first MIDI input stream open, instead of
the time of instance creation
- Move autopm code to the common substream_open()
- Make snd_usbmidi_input_start/_stop() more robust and add the running
state check
Reviewd-by: Clemens Ladisch <clemens@ladisch.de>
Tested-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2a07f40db upstream.
Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
mempolicy testing. Very nasty. Reading /proc/mounts, /proc/pid/mounts
or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
worse. "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
when commit e17f74af35 "mempolicy: don't call mpol_set_nodemask() when
no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
With slab poisoning, you can then rely on mpol_to_str() to set the bit
for node 0x6b6b, probably in the next page above the caller's stack.
mpol_parse_str() is only called from shmem_parse_options(): no_context
is always true, so call it unused for now, and remove !no_context code.
Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
expect. Then mpol_to_str() can ignore its no_context argument also,
the mpol being appropriately initialized whether contextualized or not.
Rename its no_context unused too, and let subsequent patch remove them
(that's not needed for stable backporting, which would involve rejects).
I don't understand why MPOL_LOCAL is described as a pseudo-policy:
it's a reasonable policy which suffers from a confusing implementation
in terms of MPOL_PREFERRED with MPOL_F_LOCAL. I believe this would be
much more robust if MPOL_LOCAL were recognized in switch statements
throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
empty) nodes mask like everyone else, instead of its preferred_node
variant (I presume an optimization from the days before MPOL_LOCAL).
But that would take me too long to get right and fully tested.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad4b3fb7ff upstream.
Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and
(PageHead) is true, for tail pages. If this is indeed the intended
behavior, which I doubt because it breaks cache cleaning on some ARM
systems, then the nomenclature is highly problematic.
This patch makes sure PageHead is only true for head pages and PageTail
is only true for tail pages, and neither is true for non-compound pages.
[ This buglet seems ancient - seems to have been introduced back in Apr
2008 in commit 6a1e7f777f: "pageflags: convert to the use of new
macros". And the reason nobody noticed is because the PageHead()
tests are almost all about just sanity-checking, and only used on
pages that are actual page heads. The fact that the old code returned
true for tail pages too was thus not really noticeable. - Linus ]
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <Will.Deacon@arm.com>
Cc: Steve Capper <Steve.Capper@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b92b1b89a3 upstream.
Virtio devices may attempt to add descriptors to a virtqueue from atomic
context using GFP_ATOMIC allocation. This is problematic because such
allocations can fall outside of the lowmem mapping, causing virt_to_phys
to report bogus physical addresses which are subsequently passed to
userspace via the buffers for the virtual device.
This patch masks out __GFP_HIGH and __GFP_HIGHMEM from the requested
flags when allocating descriptors for a virtqueue. If an atomic
allocation is requested and later fails, we will return -ENOSPC which
will be handled by the driver.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b66c598401 upstream.
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.
After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cdc87c5a30 upstream.
TEST_ALPHA() is broken and always returns 0.
[akpm@linux-foundation.org: return false for '@' as well, per Bjorn]
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit be364c8c0f ]
Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
int main(void)
{
int fd;
struct sockaddr_in sa;
fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
if (fd < 0)
return 1;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = inet_addr("127.0.0.1");
sa.sin_port = htons(11111);
sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));
return 0;
}
As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e196c0e579 ]
Race between bonding_store_slaves_active() and slave manipulation
functions. The bond_for_each_slave use in bonding_store_slaves_active()
is not protected by any synchronization mechanism.
NULL pointer dereference is easy to reach.
Fixed by acquiring the bond->lock for the slave walk.
v2: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00ca0de02f upstream.
When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.
If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.
This issue was found with the following testcase:
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b6ca2a6fe upstream.
Writing to dscr_default in sysfs doesn't actually change the DSCR -
we rely on a context switch on each CPU to do the work. There is no
guarantee we will get a context switch in a reasonable amount of time
so fire off an IPI to force an immediate change.
This issue was found with the following test case:
http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50ce5c0683 upstream.
This patch (as1636) is a partial workaround for a hardware bug
affecting OHCI controllers by NVIDIA at least, maybe others too. When
the controller retires a Transfer Descriptor, it is supposed to add
the TD onto the Done Queue. But sometimes this doesn't happen, with
the result that ohci-hcd never realizes the corresponding transfer has
finished. Symptoms can vary; a typical result is that USB audio stops
working after a while.
The patch works around the problem by recognizing that TDs are always
processed in order. Therefore, if a later TD is found on the Done
Queue than all the earlier TDs for the same endpoint must be finished
as well.
Unfortunately this won't solve the problem in cases where the missing
TD is the last one in the endpoint's queue. A complete fix would
require a signficant amount of change to the driver.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6b5e88c0e upstream.
During resume from system suspend the 'data' field of
struct pnp_dev in pnpacpi_set_resources() may be a stale pointer,
due to removal of the associated ACPI device node object in the
previous suspend-resume cycle. This happens, for example, if a
dockable machine is booted in the docking station and then suspended
and resumed and suspended again. If that happens,
pnpacpi_build_resource_template() called from pnpacpi_set_resources()
attempts to use that pointer and crashes.
However, pnpacpi_set_resources() actually checks the device's ACPI
handle, attempts to find the ACPI device node object attached to it
and returns an error code if that fails, so in fact it knows what the
correct value of dev->data should be. Use this observation to update
dev->data with the correct value if necessary and dump a call trace
if that's the case (once).
We still need to fix the root cause of this issue, but preventing
systems from crashing because of it is an improvement too.
Reported-and-tested-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=51071
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4000e62615 upstream.
Add a quirk to correctly report battery capacity on 2010 and 2011
Lenovo Thinkpad models.
The affected models that I tested (x201, t410, t410s, and x220)
exhibit a problem where, when battery capacity reporting unit is mAh,
the values being reported are wrong. Pre-2010 and 2012 models appear
to always report in mWh and are thus unaffected. Also, in mid-2012
Lenovo issued a BIOS update for the 2011 models that fixes the issue
(tested on x220 with a post-1.29 BIOS). No such update is available
for the 2010 models, so those still need this patch.
Problem description: for some reason, the affected Thinkpads switch
the reporting unit between mAh and mWh; generally, mAh is used when a
laptop is plugged in and mWh when it's unplugged, although a
suspend/resume or rmmod/modprobe is needed for the switch to take
effect. The values reported in mAh are *always* wrong. This does
not appear to be a kernel regression; I believe that the values were
never reported correctly. I tested back to kernel 2.6.34, with
multiple machines and BIOS versions.
Simply plugging a laptop into mains before turning it on is enough to
reproduce the problem. Here's a sample /proc/acpi/battery/BAT0/info
from Thinkpad x220 (before a BIOS update) with a 4-cell battery:
present: yes
design capacity: 2886 mAh
last full capacity: 2909 mAh
battery technology: rechargeable
design voltage: 14800 mV
design capacity warning: 145 mAh
design capacity low: 13 mAh
cycle count: 0
capacity granularity 1: 1 mAh
capacity granularity 2: 1 mAh
model number: 42T4899
serial number: 21064
battery type: LION
OEM info: SANYO
Once the laptop switches the unit to mWh (unplug from mains, suspend,
resume), the output changes to:
present: yes
design capacity: 28860 mWh
last full capacity: 29090 mWh
battery technology: rechargeable
design voltage: 14800 mV
design capacity warning: 1454 mWh
design capacity low: 200 mWh
cycle count: 0
capacity granularity 1: 1 mWh
capacity granularity 2: 1 mWh
model number: 42T4899
serial number: 21064
battery type: LION
OEM info: SANYO
Can you see how the values for "design capacity", etc., differ by a
factor of 10 instead of 14.8 (the design voltage of this battery)?
On the battery itself it says: 14.8V, 1.95Ah, 29Wh, so clearly the
values reported in mWh are correct and the ones in mAh are not.
My guess is that this problem has been around ever since those
machines were released, but because the most common Thinkpad
batteries are rated at 10.8V, the error (8%) is small enough that it
simply hasn't been noticed or at least nobody could be bothered to
look into it.
My patch works around the problem by adjusting the incorrectly
reported mAh values by "10000 / design_voltage". The patch also has
code to figure out if it should be activated or not. It only
activates on Lenovo Thinkpads, only when the unit is mAh, and, as an
extra precaution, only when the battery capacity reported through
ACPI does not match what is reported through DMI (I've never
encountered a machine where the first two conditions would be true
but the last would not, but better safe than sorry).
I've been using this patch for close to a year on several systems
without any problems.
References: https://bugzilla.kernel.org/show_bug.cgi?id=41062
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7e14b375b upstream.
The Newport AGILIS model AG-UC8 compact piezo motor controller
(http://search.newport.com/?q=*&x2=sku&q2=AG-UC8)
is yet another device using an FTDI USB-to-serial chip. It works
fine with the ftdi_sio driver when adding
options ftdi-sio product=0x3000 vendor=0x104d
to modprobe.d. udevadm reports "Newport" as the manufacturer,
and "Agilis" as the product name.
Signed-off-by: Martin Teichmann <lkb.teichmann@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f36446cf9b upstream.
The Huawei E173 will normally appear as 12d1:1436 in Linux. But
the modem has another mode with different device ID and a slightly
different set of descriptors. This is the mode used by Windows like
this:
3Modem: USB\VID_12D1&PID_140C&MI_00\6&3A1D2012&0&0000
Networkcard: USB\VID_12D1&PID_140C&MI_01\6&3A1D2012&0&0001
Appli.Inter: USB\VID_12D1&PID_140C&MI_02\6&3A1D2012&0&0002
PC UI Inter: USB\VID_12D1&PID_140C&MI_03\6&3A1D2012&0&0003
All interfaces have the same ff/ff/ff class codes in this mode.
Blacklisting the network interface to allow it to be picked up by
the network driver.
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 18a2f371f5 upstream.
This fixes a regression in 3.7-rc, which has since gone into stable.
Commit 00442ad04a ("mempolicy: fix a memory corruption by refcount
imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the
refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went
on expecting alloc_page_vma() to drop the refcount it had acquired.
This deserves a rework: but for now fix the leak in shmem_alloc_page().
Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use
the same refcounting there as in shmem_alloc_page(), delete its onstack
mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() -
those were invented to let swapin_readahead() make an unknown number of
calls to alloc_pages_vma() with one mempolicy; but since 00442ad04a,
alloc_pages_vma() has kept refcount in balance, so now no problem.
Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Not needed in 3.8 or newer as this driver is removed there. - gregkh]
We get this from user space and nothing has been done to ensure that
these strings are NUL terminated.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 387870f2d6 upstream.
dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.
This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.
Reported-by: Soeren Moch <smoch@web.de>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Soeren Moch <smoch@web.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc4b514f27 upstream.
8852aac25e ("workqueue: mod_delayed_work_on() shouldn't queue timer on
0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
megaraid - it allocated work_struct, casted it to delayed_work and
then pass that into queue_delayed_work().
Previously, this was okay because 0 @delay short-circuited to
queue_work() before doing anything with delayed_work. 8852aac25e
moved 0 @delay test into __queue_delayed_work() after sanity check on
delayed_work making megaraid trigger BUG_ON().
Although megaraid is already fixed by c1d390d8e6 ("megaraid: fix
BUG_ON() from incorrect use of delayed work"), this patch converts
BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
abusers, if there are more, trigger warning but don't crash the
machine.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e69b742a67 upstream.
gcc (rightfully) complains that we are accessing beyond the
end of the fpr array (we do, to access the fpscr).
The only sane thing to do (whether anything in that code can be
called remotely sane is debatable) is to special case fpscr and
handle it as a separate statement.
I initially tried to do it it by making the array access conditional
to index < PT_FPSCR and using a 3rd else leg but for some reason gcc
was unable to understand it and still spewed the warning.
So I ended up with something a tad more intricated but it seems to
build on 32-bit and on 64-bit with and without VSX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39141ddfb6 upstream.
After commit 846a136881 ("ARM: vfp: fix
saving d16-d31 vfp registers on v6+ kernels"), the OMAP 2430SDP board
started crashing during boot with omap2plus_defconfig:
[ 3.875122] mmcblk0: mmc0:e624 SD04G 3.69 GiB
[ 3.915954] mmcblk0: p1
[ 4.086639] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
[ 4.093719] Modules linked in:
[ 4.096954] CPU: 0 Not tainted (3.6.0-02232-g759e00b #570)
[ 4.103149] PC is at vfp_reload_hw+0x1c/0x44
[ 4.107666] LR is at __und_usr_fault_32+0x0/0x8
It turns out that the context save/restore fix unmasked a latent bug
in commit 5aaf254409 ("ARM: 6203/1: Make
VFPv3 usable on ARMv6"). When CONFIG_VFPv3 is set, but the kernel is
booted on a pre-VFPv3 core, the code attempts to save and restore the
d16-d31 VFP registers. These are only present on non-D16 VFPv3+, so
this results in an undefined instruction exception. The code didn't
crash before commit 846a136 because the save and restore code was
only touching d0-d15, present on all VFP.
Fix by implementing a request from Russell King to add a new HWCAP
flag that affirmatively indicates the presence of the d16-d31
registers:
http://marc.info/?l=linux-arm-kernel&m=135013547905283&w=2
and some feedback from Måns to clarify the name of the HWCAP flag.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Martin <dave.martin@linaro.org>
Cc: Måns Rullgård <mans.rullgard@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d93592807 upstream.
Sometimes, warnings about ioctls to partition happen often enough that they
form majority of the warnings in the kernel log and users complain. In some
cases warnings are about ioctls such as SG_IO so it's not good to get rid of
the warnings completely as they can ease debugging of userspace problems
when ioctl is refused.
Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future if ever. So lets just
stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
CC: James Bottomley <JBottomley@parallels.com>
CC: linux-scsi@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cc: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a51d4ed01e upstream.
This board is incorrectly detected as having an LVDS connector,
resulting in the VGA output (the only available output on the board)
showing the console only in the top-left 1024x768 pixels, and an extra
LVDS connector appearing in X.
It's a desktop Mini-ITX board using an Atom D525 CPU with an NM10
chipset.
I've had this board for about a year, but this is the first time I
noticed the issue because I've been running it headless for most of its
life.
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-3.0 commit 42ab5316 (ipv4: fix redirect handling) was
backport of mainline commit 9cc20b26 from 3.2-rc3 where hh
member of struct dst_entry was already gone.
However, in 3.0 we still have it and we have to clean it as
well, otherwise it keeps pointing to the cleaned up (and
unusable) hh_cache entry and packets cannot be sent out.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d356cf5a74 upstream.
PMU interrupts start at IRQ_DOVE_PMU_START, not IRQ_DOVE_PMU_START + 1.
Fix the condition. (It may have been less likely to occur had the code
been written "if (irq >= IRQ_DOVE_PMU_START" which imho is the easier
to understand notation, and matches the normal way of thinking about
these things.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d3df93542 upstream.
Fix the acknowledgement of PMU interrupts on Dove: some Dove hardware
has not been sensibly designed so that interrupts can be handled in a
race free manner. The PMU is one such instance.
The pending (aka 'cause') register is a bunch of RW bits, meaning that
these bits can be both cleared and set by software (confirmed on the
Armada-510 on the cubox.)
Hardware sets the appropriate bit when an interrupt is asserted, and
software is required to clear the bits which are to be processed. If
we write ~(1 << bit), then we end up asserting every other interrupt
except the one we're processing. So, we need to do a read-modify-write
cycle to clear the asserted bit.
However, any interrupts which occur in the middle of this cycle will
also be written back as zero, which will also clear the new interrupts.
The upshot of this is: there is _no_ way to safely clear down interrupts
in this register (and other similarly behaving interrupt pending
registers on this device.) The patch below at least stops us creating
new interrupts.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52965cc012 upstream.
Some bcm5974 trackpads have a physical button beneath the physical surface.
This patch sets the property bit so user space applications can detect the
trackpad type and act accordingly.
Signed-off-by: Jussi Pakkanen <jussi.pakkanen@canonical.com>
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29e9bf1841 upstream.
Thermal throttle and power limit events are not defined as MCE errors in x86
architecture and should not generate MCE errors in mcelog.
Current kernel generates fake software defined MCE errors for these events.
This may confuse users because they may think the machine has real MCE errors
while actually only thermal throttle or power limit events happen.
To make it worse, buggy firmware on some platforms may falsely generate
the events. Therefore, kernel reports MCE errors which users think as real
hardware errors. Although the firmware bugs should be fixed, on the other hand,
kernel should not report MCE errors either.
So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count,
package_power_limit_count, core_throttle_count, and package_throttle_count) in
/sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters
for each event on each CPU. Please note that all CPU's on one package report
duplicate counters. It's user application's responsibity to retrieve a package
level counter for one package.
This patch doesn't report package level power limit, core level power limit, and
package level thermal throttle events in mcelog. When the events happen, only
report them in respective counters in sysfs.
Since core level thermal throttle has been legacy code in kernel for a while and
users accepted it as MCE error in mcelog, core level thermal throttle is still
reported in mcelog. In the mean time, the event is counted in a counter in sysfs
as well.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Borislav Petkov <bp@amd64.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ffeb9b0e6 upstream.
In get_sample_period(), unsigned long is not enough:
watchdog_thresh * 2 * (NSEC_PER_SEC / 5)
case1:
watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800
case2:
set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000
In case2, we need use u64 to express the sample period. Otherwise,
changing the threshold thru proc often can not be successful.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5260e458f5 upstream.
Make sure generic close is called at close.
The driver relies on the generic write implementation but did not call
generic close.
Note that the call to kill the read urb is not redundant, as mct_u232
uses an interrupt urb from the second port as the read urb and that
generic close therefore fails to kill it.
Compile-only tested.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit b03e66a6be upstream.
If kdump is triggered with pending IO, controller may not respond causing
kdump to fail.
http://marc.info/?l=linux-ide&m=133032255424658&w=2
During error recovery ata_do_dev_read_id never completes due hang
in mmio_insw.
ata_do_dev_read_id
ata_sff_data_xfer
ioread16_rep
mmio_insw
if DMA start bit is cleared before reset, PIO command is successful
and kdump succeeds.
Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6fdd8e5d0 upstream.
The delayed work function int_in_work() may call usb_reset_device()
and thus, indirectly, the driver's pre_reset method. Trying to
cancel the work synchronously in that situation would deadlock.
Fix by avoiding cancel_work_sync() in the pre_reset method.
If the reset was NOT initiated by int_in_work() this might cause
int_in_work() to run after the post_reset method, with urb_int_in
already resubmitted, so handle that case gracefully.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fae2ae2a90 upstream.
If a signal handler is executed on altstack and another signal comes,
we will end up with rt_sigreturn() on return from the second handler
getting -EPERM from do_sigaltstack(). It's perfectly OK, since we
are not asking to change the settings; in fact, they couldn't have been
changed during the second handler execution exactly because we'd been
on altstack all along. 64bit sigreturn on sparc treats any error from
do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics
we are using on other architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25389bb207 upstream.
Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.
Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81b401100c upstream.
Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45171002b0 upstream.
The Intel 82855PM host bridge / Mobility FireGL 9000 RV250 combination
in an (outdated) ThinkPad T41 needs AGPMode 1 for suspend/resume (under
KMS, that is). So add a quirk for it.
(Change R250 to RV250 in comment for preceding quirk too.)
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b78a4932f5 upstream.
The check whether the IBSS is active and can be removed should be
performed before deinitializing the fields used for the check/search.
Otherwise, the configured BSS will not be found and removed properly.
To make it more clear for the future, rename sdata->u.ibss to the
local pointer ifibss which is used within the checks.
This behaviour was introduced by
f3209bea11
("mac80211: fix IBSS teardown race")
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Ignacy Gawedzki <i@lri.fr>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa10990e02 upstream.
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.
[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8c32a5c98 upstream.
Request based dm attempts to re-run the request queue off the
request completion path. If used with a driver that potentially does
end_io from its request_fn, we could deadlock trying to recurse
back into request dispatch. Fix this by punting the request queue
run to kblockd.
Tested to fix a quickly reproducible deadlock in such a scenario.
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 949a05d034 upstream.
On Thu, 2012-11-01 at 16:45 -0700, Michel Lespinasse wrote:
> Looking at the arch/parisc/kernel/sys_parisc.c implementation of
> get_shared_area(), I do have a concern though. The function basically
> ignores the pgoff argument, so that if one creates a shared mapping of
> pages 0-N of a file, and then a separate shared mapping of pages 1-N
> of that same file, both will have the same cache offset for their
> starting address.
>
> This looks like this would create obvious aliasing issues. Am I
> misreading this ? I can't understand how this could work good enough
> to be undetected, so there must be something I'm missing here ???
This turns out to be correct and we need to pay attention to the pgoff as
well as the address when creating the virtual address for the area.
Fortunately, the bug is rarely triggered as most applications which use pgoff
tend to use large values (git being the primary one, and it uses pgoff in
multiples of 16MB) which are larger than our cache coherency modulus, so the
problem isn't often seen in practise.
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e99ddfde6a upstream.
Commit 88a8516a21 (ALSA: usbaudio: implement USB autosuspend) added
autosuspend code to all files making up the snd-usb-audio driver.
However, midi.c is part of snd-usb-lib and is also used by other
drivers, not all of which support autosuspend. Thus, calls to
usb_autopm_get_interface() could fail, and this unexpected error would
result in the MIDI output being completely unusable.
Make it work by ignoring the error that is expected with drivers that do
not support autosuspend.
Reported-by: Colin Fletcher <colin.m.fletcher@googlemail.com>
Reported-by: Devin Venable <venable.devin@gmail.com>
Reported-by: Dr Nick Bailey <nicholas.bailey@glasgow.ac.uk>
Reported-by: Jannis Achstetter <jannis_achstetter@web.de>
Reported-by: Rui Nuno Capela <rncbc@rncbc.org>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49bd665c54 upstream.
SATA MICROCODE DOWNALOAD fails on isci driver. After receiving Register
Device to Host (FIS 0x34) frame Initiator resets phy.
In the frame handler routine response (FIS 0x34) was copied into wrong
buffer and upper layer did not receive any answer which resulted in
timeout and reset.
This patch corrects this bug.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd321acddc upstream.
When host_sleep_config command fails we should return error to
MMC core to indicate the failure for our device.
The misspelled variable is also removed as it's redundant.
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 450faacc62 upstream.
Documentation/networking/ifenslave.c: In function ‘if_getconfig’:
Documentation/networking/ifenslave.c:508:14: warning: variable ‘mtu’ set but not used [-Wunused-but-set-variable]
Documentation/networking/ifenslave.c:508:6: warning: variable ‘metric’ set but not used [-Wunused-but-set-variable]
The purpose of this function is to simply print out the values
it probes, so...
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c718a54649 upstream.
Fix several -Wuninitialized compiler warnings by changing the
order of getting modedb in riva_update_default_var() to set
first the fallback and then the prefered timing.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cecd353a02 upstream.
Set CommandMailbox with memset before use it. Fix for:
drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’:
arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’
may be used uninitialized in this function [-Wuninitialized]
drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’
was declared here
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bca505f109 upstream.
Fixed compiler warning:
comparison between ‘DAC960_V2_IOCTL_Opcode_T’ and ‘enum <anonymous>’
Renamed enum, added a new enum for SCSI_10.CommandOpcode in
DAC960_V2_ProcessCompletedCommand().
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47ea91b405 upstream.
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window. This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.
__find_resource() looks for gaps in resource allocation among the
children resource windows. When it encounters the last child window it
blindly tries the range next to one allocated to the last child. Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation. This leads to a conflicting
window allocation.
Michal Ludvig reported this issue seen on his platform. The following
patch fixes the problem and has been verified by Michal. I believe this
bug has been there for ages. It got exposed by git commit 2bbc694227
("PCI : ability to relocate assigned pci-resources")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4ac9fea01 upstream.
During debug of one SRIOV enabled hotplug device, we found found that
add_size is not passed properly.
The device has devices under two level bridges:
+-[0000:80]-+-00.0-[81-8f]--
| +-01.0-[90-9f]--
| +-02.0-[a0-af]----00.0-[a1-a3]--+-02.0-[a2]--+-00.0 Oracle Corporation Device
| | \-03.0-[a3]--+-00.0 Oracle Corporation Device
Which means later the parent bridge will not try to add a big enough range:
[ 557.455077] pci 0000:a0:00.0: BAR 14: assigned [mem 0xf9000000-0xf93fffff]
[ 557.461974] pci 0000:a0:00.0: BAR 15: assigned [mem 0xf6000000-0xf61fffff pref]
[ 557.469340] pci 0000:a1:02.0: BAR 14: assigned [mem 0xf9000000-0xf91fffff]
[ 557.476231] pci 0000:a1:02.0: BAR 15: assigned [mem 0xf6000000-0xf60fffff pref]
[ 557.483582] pci 0000:a1:03.0: BAR 14: assigned [mem 0xf9200000-0xf93fffff]
[ 557.490468] pci 0000:a1:03.0: BAR 15: assigned [mem 0xf6100000-0xf61fffff pref]
[ 557.497833] pci 0000:a1:03.0: BAR 14: can't assign mem (size 0x200000)
[ 557.504378] pci 0000:a1:03.0: failed to add optional resources res=[mem 0xf9200000-0xf93fffff]
[ 557.513026] pci 0000:a1:02.0: BAR 14: can't assign mem (size 0x200000)
[ 557.519578] pci 0000:a1:02.0: failed to add optional resources res=[mem 0xf9000000-0xf91fffff]
It turns out we did not calculate size1 properly.
static resource_size_t calculate_memsize(resource_size_t size,
resource_size_t min_size,
resource_size_t size1,
resource_size_t old_size,
resource_size_t align)
{
if (size < min_size)
size = min_size;
if (old_size == 1 )
old_size = 0;
if (size < old_size)
size = old_size;
size = ALIGN(size + size1, align);
return size;
}
We should not pass add_size with min_size in calculate_memsize since
that will make add_size not contribute final add_size.
So just pass add_size with size1 to calculate_memsize().
With this change, we should have chance to remove extra addon in
pci_reassign_resource.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbc694227 upstream.
Currently pci-bridges are allocated enough resources to satisfy their immediate
requirements. Any additional resource-requests fail if additional free space,
contiguous to the one already allocated, is not available. This behavior is not
reasonable since sufficient contiguous resources, that can satisfy the request,
are available at a different location.
This patch provides the ability to expand and relocate a allocated resource.
v2: Changelog: Fixed size calculation in pci_reassign_resource()
v3: Changelog : Split this patch. The resource.c changes are already
upstream. All the pci driver changes are in here.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361d94a338 upstream.
Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7af1168693 upstream.
Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e06ef2e8 upstream.
In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bb3e1fc47 upstream.
When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 399f11c3d8 upstream.
Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception. This works for most cases, but causes
a hang on the following test case:
Client Server
------ ------
Open file over NFS v4.1
Write to file
Expire client
Try to lock file
The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery. However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled. The
simplest solution is to wait for session recovery to run before retrying
the lock.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9193983f4 upstream.
The overlay on the i830M has a peculiar failure mode: It works the
first time around after boot-up, but consistenly hangs the second time
it's used.
Chris Wilson has dug out a nice errata:
"1.5.12 Clock Gating Disable for Display Register
Address Offset: 06200h–06203h
"Bit 3
Ovrunit Clock Gating Disable.
0 = Clock gating controlled by unit enabling logic
1 = Disable clock gating function
DevALM Errata ALM049: Overlay Clock Gating Must be Disabled: Overlay
& L2 Cache clock gating must be disabled in order to prevent device
hangs when turning off overlay.SW must turn off Ovrunit clock gating
(6200h) and L2 Cache clock gating (C8h)."
Now I've nowhere found that 0xc8 register and hence couldn't apply the
l2 cache workaround. But I've remembered that part of the magic that
the OVERLAY_ON/OFF commands are supposed to do is to rearrange cache
allocations so that the overlay scaler has some scratch space.
And while pondering how that could explain the hang the 2nd time we
enable the overlay, I've remembered that the old ums overlay code did
_not_ issue the OVERLAY_OFF cmd.
And indeed, disabling the OFF cmd results in the overlay working
flawlessly, so I guess we can workaround the lack of the above
workaround by simply never disabling the overlay engine once it's
enabled.
Note that we have the first part of the above w/a already implemented
in i830_init_clock_gating - leave that as-is to avoid surprises.
v2: Add a comment in the code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47827
Tested-by: Rhys <rhyspuk@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[bwh: Backported to 3.2:
- Adjust context
- s/intel_ring_emit(ring, /OUT_RING(/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 069ddcda37 upstream.
When the eCryptfs mount options do not include '-o acl', but the lower
filesystem's mount options do include 'acl', the MS_POSIXACL flag is not
flipped on in the eCryptfs super block flags. This flag is what the VFS
checks in do_last() when deciding if the current umask should be applied
to a newly created inode's mode or not. When a default POSIX ACL mask is
set on a directory, the current umask is incorrectly applied to new
inodes created in the directory. This patch ignores the MS_POSIXACL flag
passed into ecryptfs_mount() and sets the flag on the eCryptfs super
block depending on the flag's presence on the lower super block.
Additionally, it is incorrect to allow a writeable eCryptfs mount on top
of a read-only lower mount. This missing check did not allow writes to
the read-only lower mount because permissions checks are still performed
on the lower filesystem's objects but it is best to simply not allow a
rw mount on top of ro mount. However, a ro eCryptfs mount on top of a rw
mount is valid and still allowed.
https://launchpad.net/bugs/1009207
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Stefan Beller <stefanbeller@googlemail.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0658a3366d upstream.
The use of kfree(serial) in error cases of usb_serial_probe
was invalid - usb_serial structure allocated in create_serial()
gets reference of usb_device that needs to be put, so we need
to use usb_serial_put() instead of simple kfree().
Signed-off-by: Jan Safrata <jan.nikitenko@gmail.com>
Acked-by: Johan Hovold <jhovold@gmail.com>
Cc: Richard Retanubun <richardretanubun@ruggedcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38fe36a248 upstream.
ICMP tuples have id in src and type/code in dst.
So comparing src.u.all with dst.u.all will always fail here
and ip_xfrm_me_harder() is called for every ICMP packet,
even if there was no NAT.
Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64f509ce71 upstream.
Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.
See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1e0d8b70f upstream.
The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.
This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aee77e4acc upstream.
The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
a box where this prevents the interface from using the gigabit line to its full
potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.
The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
(XID 0c900880). TSO is enabled.
I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
'7'=unlimited). This chart shows the results:
http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png
Interesting points:
- With the current DMA burst limit (1024):
- at the default MTU=1500 I get only 842 Mbit/s.
- when going from small MTU, the performance rises monotonically with
increasing MTU only up to a peak at MTU=1076 (908 MBit/s). Then there's
a sudden drop to 762 MBit/s from which the throughput rises monotonically
again with further MTU increases.
- With a smaller DMA burst limit (512):
- there's a similar peak at MTU=1076 and another one at MTU=564.
- With unlimited DMA burst:
- at the default MTU=1500 I get nice 940 Mbit/s.
- the throughput rises monotonically with increasing MTU with no strange
peaks.
Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
limit plus 52. Why 52? Because:
20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52
The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
the smaller burst limit, or if any other versions have similar requirements.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a652208e0b ]
Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0c9f79be29 ]
(1<<optname) is undefined behavior in C with a negative optname or
optname larger than 31. In those cases the result of the shift is
not necessarily zero (e.g., on x86).
This patch simplifies the code with a switch statement on optname.
It also allows the compiler to generate better code (e.g., using a
64-bit mask).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34fa78b59c upstream.
The sigaddset/sigdelset/sigismember functions that are implemented with
bitfield insn cannot allow the sigset argument to be placed in a data
register since the sigset is wider than 32 bits. Remove the "d"
constraint from the asm statements.
The effect of the bug is that sending RT signals does not work, the signal
number is truncated modulo 32.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d55c4c613f upstream.
When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a28ad42a4a upstream.
This is a bugfix for a problem with the following symptoms:
1. A power cut happens
2. After reboot, we try to mount UBIFS
3. Mount fails with "No space left on device" error message
UBIFS complains like this:
UBIFS error (pid 28225): grab_empty_leb: could not find an empty LEB
The root cause of this problem is that when we mount, not all LEBs are
categorized. Only those which were read are. However, the
'ubifs_find_free_leb_for_idx()' function assumes that all LEBs were
categorized and 'c->freeable_cnt' is valid, which is a false assumption.
This patch fixes the problem by teaching 'ubifs_find_free_leb_for_idx()'
to always fall back to LPT scanning if no freeable LEBs were found.
This problem was reported by few people in the past, but Brent Taylor
was able to reproduce it and send me a flash image which cannot be mounted,
which made it easy to hunt the bug. Kudos to Brent.
Reported-by: Brent Taylor <motobud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 445632ad6d upstream.
DAPM shutdown incorrectly uses "list" field of codec struct while
iterating over probed components (codec_dev_list). "list" field
refers to codecs registered in the system, "card_list" field is
used for probed components.
Signed-off-by: Misael Lopez Cruz <misael.lopez@ti.com>
Signed-off-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55c6f4cb6e upstream.
When MCLK is supplied externally and BCLK and LRC are configured as outputs
(codec is master), the PLL values are only calculated correctly on the first
transmission. On subsequent transmissions, at differenct sample rates, the
wrong PLL values are used. Test for f_opclk instead of f_pllout to determine
if the PLL values are needed.
Signed-off-by: Eric Millbrandt <emillbrandt@dekaresearch.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9efade1b3e upstream.
cryptd_queue_worker attempts to prevent simultaneous accesses to crypto
workqueue by cryptd_enqueue_request using preempt_disable/preempt_enable.
However cryptd_enqueue_request might be called from softirq context,
so add local_bh_disable/local_bh_enable to prevent data corruption and
panics.
Bug report at http://marc.info/?l=linux-crypto-vger&m=134858649616319&w=2
v2:
- Disable software interrupts instead of hardware interrupts
Reported-by: Gurucharan Shetty <gurucharan.shetty@gmail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b0a8cc58e6 upstream.
In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds reference to variable on kswapd()'s stack.
In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.
Signed-off-by: Takamori Yamaguchi <takamori.yamaguchi@jp.sony.com>
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10e44239f6 upstream.
The recent change for USB-audio disconnection race fixes introduced a
mutex deadlock again. There is a circular dependency between
chip->shutdown_rwsem and pcm->open_mutex, depicted like below, when a
device is opened during the disconnection operation:
A. snd_usb_audio_disconnect() ->
card.c::register_mutex ->
chip->shutdown_rwsem (write) ->
snd_card_disconnect() ->
pcm.c::register_mutex ->
pcm->open_mutex
B. snd_pcm_open() ->
pcm->open_mutex ->
snd_usb_pcm_open() ->
chip->shutdown_rwsem (read)
Since the chip->shutdown_rwsem protection in the case A is required
only for turning on the chip->shutdown flag and it doesn't have to be
taken for the whole operation, we can reduce its window in
snd_usb_audio_disconnect().
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e7abe2556 upstream.
When unbinding a device so that I could pass it through to a KVM VM, I
got the lockdep report below. It looks like a legitimate lock
ordering problem:
- domain_context_mapping_one() takes iommu->lock and calls
iommu_support_dev_iotlb(), which takes device_domain_lock (inside
iommu->lock).
- domain_remove_one_dev_info() starts by taking device_domain_lock
then takes iommu->lock inside it (near the end of the function).
So this is the classic AB-BA deadlock. It looks like a safe fix is to
simply release device_domain_lock a bit earlier, since as far as I can
tell, it doesn't protect any of the stuff accessed at the end of
domain_remove_one_dev_info() anyway.
BTW, the use of device_domain_lock looks a bit unsafe to me... it's
at least not obvious to me why we aren't vulnerable to the race below:
iommu_support_dev_iotlb()
domain_remove_dev_info()
lock device_domain_lock
find info
unlock device_domain_lock
lock device_domain_lock
find same info
unlock device_domain_lock
free_devinfo_mem(info)
do stuff with info after it's free
However I don't understand the locking here well enough to know if
this is a real problem, let alone what the best fix is.
Anyway here's the full lockdep output that prompted all of this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.39.1+ #1
-------------------------------------------------------
bash/13954 is trying to acquire lock:
(&(&iommu->lock)->rlock){......}, at: [<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
but task is already holding lock:
(device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (device_domain_lock){-.-...}:
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f8350>] domain_context_mapping_one+0x600/0x750
[<ffffffff812f84df>] domain_context_mapping+0x3f/0x120
[<ffffffff812f9175>] iommu_prepare_identity_map+0x1c5/0x1e0
[<ffffffff81ccf1ca>] intel_iommu_init+0x88e/0xb5e
[<ffffffff81cab204>] pci_iommu_init+0x16/0x41
[<ffffffff81002165>] do_one_initcall+0x45/0x190
[<ffffffff81ca3d3f>] kernel_init+0xe3/0x168
[<ffffffff8157ac24>] kernel_thread_helper+0x4/0x10
-> #0 (&(&iommu->lock)->rlock){......}:
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
other info that might help us debug this:
6 locks held by bash/13954:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff811e4464>] sysfs_write_file+0x44/0x170
#1: (s_active#3){++++.+}, at: [<ffffffff811e44ed>] sysfs_write_file+0xcd/0x170
#2: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81372edb>] driver_unbind+0x9b/0xc0
#3: (&__lockdep_no_validate__){+.+.+.}, at: [<ffffffff81373cc7>] device_release_driver+0x27/0x50
#4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [<ffffffff8108974f>] __blocking_notifier_call_chain+0x5f/0xb0
#5: (device_domain_lock){-.-...}, at: [<ffffffff812f6508>] domain_remove_one_dev_info+0x208/0x230
stack backtrace:
Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1
Call Trace:
[<ffffffff810993a7>] print_circular_bug+0xf7/0x100
[<ffffffff8109bf3e>] __lock_acquire+0x195e/0x1e10
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109d57d>] ? trace_hardirqs_on_caller+0x13d/0x180
[<ffffffff8109ca9d>] lock_acquire+0x9d/0x130
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff81571475>] _raw_spin_lock_irqsave+0x55/0xa0
[<ffffffff812f6421>] ? domain_remove_one_dev_info+0x121/0x230
[<ffffffff810972bd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff812f6421>] domain_remove_one_dev_info+0x121/0x230
[<ffffffff812f8b42>] device_notifier+0x72/0x90
[<ffffffff8157555c>] notifier_call_chain+0x8c/0xc0
[<ffffffff81089768>] __blocking_notifier_call_chain+0x78/0xb0
[<ffffffff810897b6>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff81373a5c>] __device_release_driver+0xbc/0xe0
[<ffffffff81373ccf>] device_release_driver+0x2f/0x50
[<ffffffff81372ee3>] driver_unbind+0xa3/0xc0
[<ffffffff813724ac>] drv_attr_store+0x2c/0x30
[<ffffffff811e4506>] sysfs_write_file+0xe6/0x170
[<ffffffff8117569e>] vfs_write+0xce/0x190
[<ffffffff811759e4>] sys_write+0x54/0xa0
[<ffffffff81579a82>] system_call_fastpath+0x16/0x1b
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6ce377afd1 upstream.
Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix warning about unused variable introduced by commit e681b66f2e
("USB: mos7840: remove invalid disconnect handling") upstream.
A subsequent fix which removed the disconnect function got rid of the
warning but that one was only backported to v3.6.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6e0e543f7 upstream.
Like in the case of native hdmi, which is fixed already in
commit adf00b26d1
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Tue Sep 25 13:23:34 2012 -0300
drm/i915: make sure we write all the DIP data bytes
we need to clear the entire sdvo buffer to avoid upsetting the
display.
Since infoframe buffer writing is now a bit more elaborate, extract it
into it's own function. This will be useful if we ever get around to
properly update the ELD for sdvo. Also #define proper names for the
two buffer indexes with fixed usage.
v2: Cite the right commit above, spotted by Paulo Zanoni.
v3: I'm too stupid to paste the right commit.
v4: Ben Hutchings noticed that I've failed to handle an underflow in
my loop logic, breaking it for i >= length + 8. Since I've just lost C
programmer license, use his solution. Also, make the frustrated 0-base
buffer size a notch more clear.
Reported-and-tested-by: Jürg Billeter <j@bitron.ch>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Cc: Paulo Zanoni <przanoni@gmail.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81014b9d0b upstream.
At least the worst offenders:
- SDVO specifies that the encoder should compute the ecc. Testing also
shows that we must not send the ecc field, so copy the dip_infoframe
struct to a temporay place and avoid the ecc field. This way the avi
infoframe is exactly 17 bytes long, which agrees with what the spec
mandates as a minimal storage capacity (with the ecc field it would
be 18 bytes).
- Only 17 when sending the avi infoframe. The SDVO spec explicitly
says that sending more data than what the device announces results
in undefined behaviour.
- Add __attribute__((packed)) to the avi and spd infoframes, for
otherwise they're wrongly aligned. Noticed because the avi infoframe
ended up being 18 bytes large instead of 17. We haven't noticed this
yet because we don't use the uint16_t fields yet (which are the only
ones that would be wrongly aligned).
This regression has been introduce by
3c17fe4b8f is the first bad commit
commit 3c17fe4b8f
Author: David Härdeman <david@hardeman.nu>
Date: Fri Sep 24 21:44:32 2010 +0200
i915: enable AVI infoframe for intel_hdmi.c [v4]
Patch tested on my g33 with a sdvo hdmi adaptor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=25732
Tested-by: Peter Ross <pross@xvid.org> (G35 SDVO-HDMI)
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59fa624519 upstream.
Siddhesh analyzed a failure in the take over of pi futexes in case the
owner died and provided a workaround.
See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076
The detailed problem analysis shows:
Futex F is initialized with PTHREAD_PRIO_INHERIT and
PTHREAD_MUTEX_ROBUST_NP attributes.
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Succeeds due to the check for F's userspace TID field == 0
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes T2 via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains real ownership of the
pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
T3 --> observes inconsistent state
This problem is independent of UP/SMP, preemptible/non preemptible
kernels, or process shared vs. private. The only difference is that
certain configurations are more likely to expose it.
So as Siddhesh correctly analyzed the following check in
futex_lock_pi_atomic() is the culprit:
if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
We check the userspace value for a TID value of 0 and take over the
futex unconditionally if that's true.
AFAICT this check is there as it is correct for a different corner
case of futexes: the WAITERS bit became stale.
Now the proposed change
- if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+ if (unlikely(ownerdied ||
+ !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) {
solves the problem, but it's not obvious why and it wreckages the
"stale WAITERS bit" case.
What happens is, that due to the WAITERS bit being set (T2 is blocked
on that futex) it enforces T3 to go through lookup_pi_state(), which
in the above case returns an existing pi_state and therefor forces T3
to legitimately fight with T2 over the ownership of the pi_state (via
pi_state->mutex). Probelm solved!
Though that does not work for the "WAITERS bit is stale" problem
because if lookup_pi_state() does not find existing pi_state it
returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to
return -ESRCH to user space because the OWNER_DIED bit is not set.
Now there is a different solution to that problem. Do not look at the
user space value at all and enforce a lookup of possibly available
pi_state. If pi_state can be found, then the new incoming locker T3
blocks on that pi_state and legitimately races with T2 to acquire the
rt_mutex and the pi_state and therefor the proper ownership of the
user space futex.
lookup_pi_state() has the correct order of checks. It first tries to
find a pi_state associated with the user space futex and only if that
fails it checks for futex TID value = 0. If no pi_state is available
nothing can create new state at that point because this happens with
the hash bucket lock held.
So the above scenario changes to:
T1 lock_futex_pi(F);
T2 lock_futex_pi(F);
--> T2 blocks on the futex and creates pi_state which is associated
to T1.
T1 exits
--> exit_robust_list() runs
--> Futex F userspace value TID field is set to 0 and
FUTEX_OWNER_DIED bit is set.
T3 lock_futex_pi(F);
--> Finds pi_state and blocks on pi_state->rt_mutex
T1 --> exit_pi_state_list()
--> Transfers pi_state to waiter T2 and wakes it via
rt_mutex_unlock(&pi_state->mutex)
T2 --> acquires pi_state->mutex and gains ownership of the pi_state
--> Claims ownership of the futex and sets its own TID into the
userspace TID field of futex F
--> returns to user space
This covers all gazillion points on which T3 might come in between
T1's exit_robust_list() clearing the TID field and T2 fixing it up. It
also solves the "WAITERS bit stale" problem by forcing the take over.
Another benefit of changing the code this way is that it makes it less
dependent on untrusted user space values and therefor minimizes the
possible wreckage which might be inflicted.
As usual after staring for too long at the futex code my brain hurts
so much that I really want to ditch that whole optimization of
avoiding the syscall for the non contended case for PI futexes and rip
out the maze of corner case handling code. Unfortunately we can't as
user space relies on that existing behaviour, but at least thinking
about it helps me to preserve my mental sanity. Maybe we should
nevertheless :)
Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 60713a0ca7 ]
As documented in RFC4861 (Neighbor Discovery for IP version 6) 7.2.6.,
unsolicited neighbour advertisements should be sent to the all-nodes
multicast address.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 789336360e ]
When creating an L2TPv3 Ethernet session, if register_netdev() should fail for
any reason (for example, automatic naming for "l2tpeth%d" interfaces hits the
32k-interface limit), the netdev is freed in the error path. However, the
l2tp_eth_sess structure's dev pointer is left uncleared, and this results in
l2tp_eth_delete() then attempting to unregister the same netdev later in the
session teardown. This results in an oops.
To avoid this, clear the session dev pointer in the error path.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8f363b77ee ]
Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.
The division by zero occur in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)
Steps to Reproduce:
1. Register tcp_illinois:
# sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
# watch -d ss -i
3. Establish new TCP conn to machine
Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.
This is only related to reading stats. The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params(). Thus, simply fix tcp_illinois_info().
Function tcp_illinois_info() / get_info() is called without
socket lock. Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable. Simply reuse info.tcpv_rttcnt,
as its already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
its called with the socket lock.
Cc: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 39707c2a3b ]
Driver anchors the tx urbs and defers the urb submission if
a transmit request comes when the interface is suspended.
Anchoring urb increments the urb reference count. These
deferred urbs are later accessed by calling usb_get_from_anchor()
for submission during interface resume. usb_get_from_anchor()
unanchors the urb but urb reference count remains same.
This causes the urb reference count to remain non-zero
after usb_free_urb() gets called and urb never gets freed.
Hence call usb_put_urb() after anchoring the urb to properly
balance the reference count for these deferred urbs. Also,
unanchor these deferred urbs during disconnect, to free them
up.
Signed-off-by: Hemant Kumar <hemantk@codeaurora.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 14edd87dc6 ]
Commit a02e4b7dae4551(Demark default hoplimit as zero) only changes the
hoplimit checking condition and default value in ip6_dst_hoplimit, not
zeros all hoplimit default value.
Keep the zeroing ip6_template_metrics[RTAX_HOPLIMIT - 1] to force it as
const, cause as a37e6e344910(net: force dst_default_metrics to const
section)
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a3374c42aa ]
tcp_ioctl() tries to take into account if tcp socket received a FIN
to report correct number bytes in receive queue.
But its flaky because if the application ate the last skb,
we return 1 instead of 0.
Correct way to detect that FIN was received is to test SOCK_DONE.
Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6d772ac557 ]
On some suspend/resume operations involving wimax device, we have
noticed some intermittent memory corruptions in netlink code.
Stéphane Marchesin tracked this corruption in netlink_update_listeners()
and suggested a patch.
It appears netlink_release() should use kfree_rcu() instead of kfree()
for the listeners structure as it may be used by other cpus using RCU
protection.
netlink_release() must set to NULL the listeners pointer when
it is about to be freed.
Also have to protect netlink_update_listeners() and
netlink_has_listeners() if listeners is NULL.
Add a nl_deref_protected() lockdep helper to properly document which
locks protects us.
Reported-by: Jonathan Kliegman <kliegs@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stéphane Marchesin <marcheu@google.com>
Cc: Sam Leffler <sleffler@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0914f7961b upstream.
When disconnect callback is called, each component should wake up
sleepers and check card->shutdown flag for avoiding the endless sleep
blocking the proper resource release.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0830dbd4e upstream.
For more strict protection for wild disconnections, a refcount is
introduced to the card instance, and let it up/down when an object is
referred via snd_lookup_*() in the open ops.
The free-after-last-close check is also changed to check this refcount
instead of the empty list, too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34f3c89fda upstream.
Replace mutex with rwsem for codec->shutdown protection so that
concurrent accesses are allowed.
Also add the protection to snd_usb_autosuspend() and
snd_usb_autoresume(), too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 978520b75f upstream.
Close some races at disconnection of a USB audio device by adding the
chip->shutdown_mutex and chip->shutdown check at appropriate places.
The spots to put bandaids are:
- PCM prepare, hw_params and hw_free
- where the usb device is accessed for communication or get speed, in
mixer.c and others; the device speed is now cached in subs->speed
instead of accessing to chip->dev
The accesses in PCM open and close don't need the mutex protection
because these are already handled in the core PCM disconnection code.
The autosuspend/autoresume codes are still uncovered by this patch
because of possible mutex deadlocks. They'll be covered by the
upcoming change to rwsem.
Also the mixer codes are untouched, too. These will be fixed in
another patch, too.
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b0573c07f upstream.
Fix races at PCM disconnection:
- while a PCM device is being opened or closed
- while the PCM state is being changed without lock in prepare,
hw_params, hw_free ops
Reported-by: Matthieu CASTET <matthieu.castet@parrot.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3300fb4f88 upstream.
Don't assume bank 0 is selected at device probe time. This may not be
the case. Force bank selection at first register access to guarantee
that we read the right registers upon driver loading.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d96b10639 upstream.
The DNS resolver's use of the sunrpc cache involves a 'ttl' number
(relative) rather that a timeout (absolute). This confused me when
I wrote
commit c5b29f885a
"sunrpc: use seconds since boot in expiry cache"
and I managed to break it. The effect is that any TTL is interpreted
as 0, and nothing useful gets into the cache.
This patch removes the use of get_expiry() - which really expects an
expiry time - and uses get_uint() instead, treating the int correctly
as a ttl.
This fixes a regression that has been present since 2.6.37, causing
certain NFS accesses in certain environments to incorrectly fail.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a007c4c3e9 upstream.
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2b1bc308f4 upstream.
If the state recovery machinery is triggered by the call to
nfs4_async_handle_error() then we can deadlock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97a5486826 upstream.
Since commit c7f404b ('vfs: new superblock methods to override
/proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
device name reported back to userland.
nfs_path() always generates a trailing slash when the given dentry is
the root of an NFS mount, but userland may expect the original device
name to be returned verbatim (as it used to be). Make this
canonicalisation optional and change the callers accordingly.
[jrnieder@gmail.com: use flag instead of bool argument]
Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu>
Reference: http://bugs.debian.org/669314
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acce94e68a upstream.
In very busy v3 environment, rpc.mountd can respond to the NULL
procedure but not the MNT procedure in a timely manner causing
the MNT procedure to time out. The problem is the mount system
call returns EIO which causes the mount to fail, instead of
ETIMEDOUT, which would cause the mount to be retried.
This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
the rpc_call_sync() call in nfs_mount() which causes
ETIMEDOUT to be returned on timed out connections.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit badecb001a upstream.
The 'ssid' field of the cfg80211_ibss_params is a u8 pointer and
its length is likely to be less than IEEE80211_MAX_SSID_LEN most
of the time.
This patch fixes the ssid copy in ieee80211_ibss_join() by using
the SSID length to prevent it from reading beyond the string.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[rewrapped commit message, small rewording]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4a4f1a5808 upstream.
Due to pskb_may_pull() checking the skb length, all
non-management frames are checked on input whether
their 802.11 header is fully present. Also add that
check for management frames and remove a check that
is now duplicate. This prevents accessing skb data
beyond the frame end.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7fbf70ee9 upstream.
Per IEEE Std. 802.11-2012, Sec 8.2.4.4.1, the sequence Control field is
not present in control frames. We noticed this problem when processing
Block Ack Requests.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Javier Lopez <jlopex@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7dd111e8ee upstream.
The mesh header can have address extension by a 4th
or a 5th and 6th address, but never both. Drop such
frames in 802.11 -> 802.3 conversion along with any
frames that have the wrong extension.
Reviewed-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4a9fafc77 upstream.
No driver initializes chan->max_antenna_gain to something sensible, and
the only place where it is being used right now is inside ath9k. This
leads to ath9k potentially using less tx power than it can use, which can
decrease performance/range in some rare cases.
Rather than going through every single driver, this patch initializes
chan->orig_mag in wiphy_register(), ignoring whatever value the driver
left in there. If a driver for some reason wishes to limit it independent
from regulatory rulesets, it can do so internally.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d0f9dfb31 upstream.
If the call to core_dev_release_virtual_lun0() fails, then nothing
sets ret to anything other than 0, so even though everything is
torn down and freed, target_core_init_configfs() will seem to succeed
and the module will be loaded. Fix this by passing the return value
on up the chain.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf7e1abe43 upstream.
Some hardware has correct (!= 0xff) value of tssi_bounds[4] in the
EEPROM, but step is equal to 0xff. This results on ridiculous delta
calculations and completely broke TX power settings.
Reported-and-tested-by: Pavel Lucik <pavel.lucik@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8c6e30936a upstream.
bf->bf_next is only while buffers are chained as part of an A-MPDU
in the tx queue. When a tid queue is flushed (e.g. on tearing down
an aggregation session), frames can be enqueued again as normal
transmission, without bf_next being cleared. This can lead to the
old pointer being dereferenced again later.
This patch might fix crashes and "Failed to stop TX DMA!" messages.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef5d437f71 upstream.
On s390 any write to a page (even from kernel itself) sets architecture
specific page dirty bit. Thus when a page is written to via buffered
write, HW dirty bit gets set and when we later map and unmap the page,
page_remove_rmap() finds the dirty bit and calls set_page_dirty().
Dirtying of a page which shouldn't be dirty can cause all sorts of
problems to filesystems. The bug we observed in practice is that
buffers from the page get freed, so when the page gets later marked as
dirty and writeback writes it, XFS crashes due to an assertion
BUG_ON(!PagePrivate(page)) in page_buffers() called from
xfs_count_page_state().
Similar problem can also happen when zero_user_segment() call from
xfs_vm_writepage() (or block_write_full_page() for that matter) set the
hardware dirty bit during writeback, later buffers get freed, and then
page unmapped.
Fix the issue by ignoring s390 HW dirty bit for page cache pages of
mappings with mapping_cap_account_dirty(). This is safe because for
such mappings when a page gets marked as writeable in PTE it is also
marked dirty in do_wp_page() or do_page_fault(). When the dirty bit is
cleared by clear_page_dirty_for_io(), the page gets writeprotected in
page_mkclean(). So pagecache page is writeable if and only if it is
dirty.
Thanks to Hugh Dickins for pointing out mapping has to have
mapping_cap_account_dirty() for things to work and proposing a cleaned
up variant of the patch.
The patch has survived about two hours of running fsx-linux on tmpfs
while heavily swapping and several days of running on out build machines
where the original problem was triggered.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6365201d8 upstream.
The X86_32-only disable_hlt/enable_hlt mechanism was used by the
32-bit floppy driver. Its effect was to replace the use of the
HLT instruction inside default_idle() with cpu_relax() - essentially
it turned off the use of HLT.
This workaround was commented in the code as:
"disable hlt during certain critical i/o operations"
"This halt magic was a workaround for ancient floppy DMA
wreckage. It should be safe to remove."
H. Peter Anvin additionally adds:
"To the best of my knowledge, no-hlt only existed because of
flaky power distributions on 386/486 systems which were sold to
run DOS. Since DOS did no power management of any kind,
including HLT, the power draw was fairly uniform; when exposed
to the much hhigher noise levels you got when Linux used HLT
caused some of these systems to fail.
They were by far in the minority even back then."
Alan Cox further says:
"Also for the Cyrix 5510 which tended to go castors up if a HLT
occurred during a DMA cycle and on a few other boxes HLT during
DMA tended to go astray.
Do we care ? I doubt it. The 5510 was pretty obscure, the 5520
fixed it, the 5530 is probably the oldest still in any kind of
use."
So, let's finally drop this.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephen Hemminger <shemminger@vyatta.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative instruction patching could be
used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aaeb61a97b upstream.
`pc236_detach()` is called by the comedi core if it attempted to attach
a device and failed. `pc236_detach()` calls `pc236_intr_disable()` if
the comedi device private data pointer (`devpriv`) is non-null. This
test is insufficient as `pc236_intr_disable()` accesses hardware
registers and the attach routine may have failed before it has saved
their I/O base addresses.
Fix it by checking `dev->iobase` is non-zero before calling
`pc236_intr_disable()` as that means the I/O base addresses have been
saved and the hardware registers can be accessed. It also implies the
comedi device private data pointer is valid, so there is no need to
check it.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4045f72bcf upstream.
This patch fix corruption which can manifest itself by following crash
when switching on rfkill switch with rt2x00 driver:
https://bugzilla.redhat.com/attachment.cgi?id=615362
Pointer key->u.ccmp.tfm of group key get corrupted in:
ieee80211_rx_h_michael_mic_verify():
/* update IV in key information to be able to detect replays */
rx->key->u.tkip.rx[rx->security_idx].iv32 = rx->tkip_iv32;
rx->key->u.tkip.rx[rx->security_idx].iv16 = rx->tkip_iv16;
because rt2x00 always set RX_FLAG_MMIC_STRIPPED, even if key is not TKIP.
We already check type of the key in different path in
ieee80211_rx_h_michael_mic_verify() function, so adding additional
check here is reasonable.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7840487cd6 upstream.
The i2c core driver will turn the platform device ID to busnum
When using platfrom device ID as -1, it means dynamically assigned
the busnum. When writing code, we need to make sure the busnum,
and call i2c_register_board_info(int busnum, ...) to register device
if using -1, we do not know the value of busnum
In order to solve this issue, set the platform device ID as a fix number
Here using 0 to match the busnum used in i2c_regsiter_board_info()
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 910a578f7e upstream.
We copy head count to a 16 bit field, this works by chance on LE but on
BE guest gets 0. Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43a09f7fb0 upstream.
The command cancellation code doesn't check whether find_trb_seg()
couldn't find the segment that contains the TRB to be canceled. This
could cause a NULL pointer deference later in the function when next_trb
is called. It's unlikely to happen unless something is wrong with the
command ring pointers, so add some debugging in case it happens.
This patch should be backported to stable kernels as old as 3.0, that
contain the commit b63f4053cc "xHCI:
handle command after aborting the command ring".
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e681b66f2e upstream.
Remove private zombie flag used to signal disconnect and to prevent
control urb from being submitted from interrupt urb completion handler.
The control urb will not be re-submitted as both the control urb and the
interrupt urb is killed on disconnect.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28c3ae9a8c upstream.
The private int_urb is never allocated so the submission from the
control completion handler will always fail. Remove this odd piece of
broken code.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3eb55cc4ed upstream.
The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 084817d793 upstream.
Move interface data allocation to attach so that it is deallocated on
errors in usb-serial probe.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f7bc505166 upstream.
I found a memory leak in sierra_release() (well sierra_probe() I guess)
that looses 8 bytes each time the driver releases a device.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea0dbebffe upstream.
Make sure to allocate the control-message buffer dynamically as some
platforms cannot do DMA from stack.
Note that only the first byte of the old buffer was used.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c323dc023b upstream.
BIOS vendors keep changing the BIOS versions. Only match the beginning
of the string to match all Lucid tablets with board name M11JB.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66081a7251 upstream.
The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.
Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bc1e68ed6 upstream.
The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().
All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9d2bb2ee5 upstream.
This reverts commit 55420c24a0.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f878b657ce upstream.
Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.
Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.
Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f40b90972 upstream.
When booting a secondary CPU, the primary CPU hands two sets of page
tables via the secondary_data struct:
(1) swapper_pg_dir: a normal, cacheable, shared (if SMP) mapping
of the kernel image (i.e. the tables used by init_mm).
(2) idmap_pgd: an uncached mapping of the .idmap.text ELF
section.
The idmap is generally used when enabling and disabling the MMU, which
includes early CPU boot. In this case, the secondary CPU switches to
swapper as soon as it enters C code:
struct mm_struct *mm = &init_mm;
unsigned int cpu = smp_processor_id();
/*
* All kernel threads share the same mm context; grab a
* reference and switch to it.
*/
atomic_inc(&mm->mm_count);
current->active_mm = mm;
cpumask_set_cpu(cpu, mm_cpumask(mm));
cpu_switch_mm(mm->pgd, mm);
This causes a problem on ARMv7, where the identity mapping is treated as
strongly-ordered leading to architecturally UNPREDICTABLE behaviour of
exclusive accesses, such as those used by atomic_inc.
This patch re-orders the secondary_start_kernel function so that we
switch to swapper before performing any exclusive accesses.
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: David McKay <david.mckay@st.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eedce141cd upstream.
The genalloc code uses the bitmap API from include/linux/bitmap.h and
lib/bitmap.c, which is based on long values. Both bitmap_set from
lib/bitmap.c and bitmap_set_ll, which is the lockless version from
genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
the bitmap.
That one uses (1 << bits) - 1, 0b111, if you are setting the first three
bits. This means that the API counts from the least significant bits
(LSB from now on) to the MSB. The LSB in the first long is bit 0, then.
The same works for the lookup functions.
The genalloc code uses longs for the bitmap, as it should. In
include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
bits[0] as its last member. When allocating the struct, genalloc should
reserve enough space for the bitmap. This should be a proper number of
longs that can fit the amount of bits in the bitmap.
However, genalloc allocates an integer number of bytes that fit the
amount of bits, but may not be an integer amount of longs. 9 bytes, for
example, could be allocated for 70 bits.
This is a problem in itself if the Least Significat Bit in a long is in
the byte with the largest address, which happens in Big Endian machines.
This means genalloc is not allocating the byte in which it will try to
set or check for a bit.
This may end up in memory corruption, where genalloc will try to set the
bits it has not allocated. In fact, genalloc may not set these bits
because it may find them already set, because they were not zeroed since
they were not allocated. And that's what causes a BUG when
gen_pool_destroy is called and check for any set bits.
What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab
will be cleared, not only the requested bytes. Since struct
gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
multiples of 8, we get lucky and allocate and clear the right amount of
bytes.
Hower, this is not the case with SLOB or with older code that did memset
after allocating instead of using __GFP_ZERO.
So, a simple module as this (running 3.6.0), will cause a crash when
rmmod'ed.
[root@phantom-lp2 foo]# cat foo.c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/genalloc.h>
MODULE_LICENSE("GPL");
MODULE_VERSION("0.1");
static struct gen_pool *foo_pool;
static __init int foo_init(void)
{
int ret;
foo_pool = gen_pool_create(10, -1);
if (!foo_pool)
return -ENOMEM;
ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
if (ret) {
gen_pool_destroy(foo_pool);
return ret;
}
return 0;
}
static __exit void foo_exit(void)
{
gen_pool_destroy(foo_pool);
}
module_init(foo_init);
module_exit(foo_exit);
[root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
CONFIG_SLOB=y
[root@phantom-lp2 foo]# insmod ./foo.ko
[root@phantom-lp2 foo]# rmmod foo
------------[ cut here ]------------
kernel BUG at lib/genalloc.c:243!
cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
sp: c0000000bb0e7be0
msr: 8000000000029032
current = 0xc0000000bb0e0000
paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01
pid = 13044, comm = rmmod
kernel BUG at lib/genalloc.c:243!
[c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
[c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
[c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
--- Exception: c00 (System Call) at 000000800753d1a0
SP (fffd0b0e640) is in userspace
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20f1de659b upstream.
Fix possible overflow of the buffer used for expanding environment
variables when building file list.
In the extremely unlikely case of an attacker having control over the
environment variables visible to gen_init_cpio, control over the
contents of the file gen_init_cpio parses, and gen_init_cpio was built
without compiler hardening, the attacker can gain arbitrary execution
control via a stack buffer overflow.
$ cat usr/crash.list
file foo ${BIG}${BIG}${BIG}${BIG}${BIG}${BIG} 0755 0 0
$ BIG=$(perl -e 'print "A" x 4096;') ./usr/gen_init_cpio usr/crash.list
*** buffer overflow detected ***: ./usr/gen_init_cpio terminated
This also replaces the space-indenting with tabs.
Patch based on existing fix extracted from grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b63f4053cc upstream.
According to xHCI spec section 4.6.1.1 and section 4.6.1.2,
after aborting a command on the command ring, xHC will
generate a command completion event with its completion
code set to Command Ring Stopped at least. If a command is
currently executing at the time of aborting a command, xHC
also generate a command completion event with its completion
code set to Command Abort. When the command ring is stopped,
software may remove, add, or rearrage Command Descriptors.
To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci. When the command ring is stopped,
software will find the command trbs described by command
descriptors in cancel_cmd_list and modify it to No Op
command. If software can't find the matched trbs, we can
think it had been finished.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Note from Sarah: The TRB_TYPE_LINK_LE32 macro is not in the 3.0 stable
kernel, so I added it to this patch.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6e4468b9a0 upstream.
The patch is used to cancel command when the command isn't
acknowledged and a timeout occurs.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b92cc66c04 upstream.
Software have to abort command ring and cancel command
when a command is failed or hang. Otherwise, the command
ring will hang up and can't handle the others. An example
of a command that may hang is the Address Device Command,
because waiting for a SET_ADDRESS request to be acknowledged
by a USB device is outside of the xHC's ability to control.
To cancel a command, software will initialize a command
descriptor for the cancel command, and add it into a
cancel_cmd_list of xhci.
Sarah: Fixed missing newline on "Have the command ring been stopped?"
debugging statement.
This patch should be backported to kernels as old as 3.0, that contain
the commit 7ed603ecf8 "xhci: Add an
assertion to check for virt_dev=0 bug." That commit papers over a NULL
pointer dereference, and this patch fixes the underlying issue that
caused the NULL pointer dereference.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c181bc5b5d upstream.
Adding cmd_ring_state for command ring. It helps to verify
the current command ring state for controlling the command
ring operations.
This patch should be backported to kernels as old as 3.0. The commit
7ed603ecf8 "xhci: Add an assertion to
check for virt_dev=0 bug." papers over the NULL pointer dereference that
I now believe is related to a timed out Set Address command. This (and
the four patches that follow it) contain the real fix that also allows
VIA USB 3.0 hubs to consistently re-enumerate during the plug/unplug
stress tests.
Signed-off-by: Elric Fu <elricfu1@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: Miroslav Sabljic <miroslav.sabljic@avl.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2856cc2e4d ]
On a 2-node machine with 256GB of ram we get 512 lines of
console output, which is just too much.
This mimicks Yinghai Lu's x86 commit c2b91e2eec
(x86_64/mm: check and print vmemmap allocation continuous) except that
we aren't ever going to get contiguous block pointers in between calls
so just print when the virtual address or node changes.
This decreases the output by an order of 16.
Also demote this to KERN_DEBUG.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a27032eee8 ]
There are multiple errors in how sys_sparc64_personality() handles
personality flags stored in top three bytes.
- directly comparing current->personality against PER_LINUX32 doesn't work
in cases when any of the personality flags stored in the top three bytes
are used.
- directly forcefully setting personality to PER_LINUX32 or PER_LINUX
discards any flags stored in the top three bytes
Fix the first one by properly using personality() macro to compare only
PER_MASK bytes.
Fix the second one by setting only the bits that should be set, instead of
overwriting the whole value.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e793d8c674 ]
There was a serious disconnect in the logic happening in
sparc_pmu_disable_event() vs. sparc_pmu_enable_event().
Event disable is implemented by programming a NOP event into the PCR.
However, event enable was not reversing this operation. Instead, it
was setting the User/Priv/Hypervisor trace enable bits.
That's not sparc_pmu_enable_event()'s job, that's what
sparc_pmu_enable() and sparc_pmu_disable() do .
The intent of sparc_pmu_enable_event() is clear, since it first clear
out the event type encoding field. So fix this by OR'ing in the event
encoding rather than the trace enable bits.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 08280e6c4c ]
If the MM is not active, only report the top-level PC. Do not try to
access the address space.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 55c2770e41 ]
we want syscall_trace_leave() called on exit from any syscall;
skipping its call in case we'd done force_successful_syscall_return()
is broken...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c67525849 ]
After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
sock case").. tcp resets are always lost, when routing is asymmetric.
Yes, backing out that patch will result in misrouting of resets for
dead connections which used interface binding when were alive, but we
actually cannot do anything here. What's died that's died and correct
handling normal unbound connections is obviously a priority.
Comment to comment:
> This has few benefits:
> 1. tcp_v6_send_reset already did that.
It was done to route resets for IPv6 link local addresses. It was a
mistake to do so for global addresses. The patch fixes this as well.
Actually, the problem appears to be even more serious than guaranteed
loss of resets. As reported by Sergey Soloviev <sol@eqv.ru>, those
misrouted resets create a lot of arp traffic and huge amount of
unresolved arp entires putting down to knees NAT firewalls which use
asymmetric routing.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5175a5e76b ]
This is the revised patch for fixing rds-ping spinlock recursion
according to Venkat's suggestions.
RDS ping/pong over TCP feature has been broken for years(2.6.39 to
3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
ping/pong which both need to lock "struct sock *sk". However, this
lock has already been hold before rds_tcp_data_ready() callback is
triggerred. As a result, we always facing spinlock resursion which
would resulting in system panic.
Given that RDS ping is only used to test the connectivity and not for
serious performance measurements, we can queue the pong transmit to
rds_wq as a delayed response.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
CC: David S. Miller <davem@davemloft.net>
CC: James Morris <james.l.morris@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bcf603f6d upstream.
On CougarPoint and PantherPoint PCH chips, the timing generator may fail
to start after DP training completes. This is due to a bug in the
FDI autotraining detect logic (which will stall the timing generator and
re-enable it once training completes), so disable it to avoid silent DP
mode setting failures.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Timo Aaltonen <timo.aaltonen@canonical.com>
commit a595c1ce4c upstream.
We weren't checking whether the resource was in use before calling
res_free(), so applications which called STREAMOFF on a v4l2 device that
wasn't already streaming would cause a BUG() to be hit (MythTV).
Reported-by: Larry Finger <larry.finger@lwfinger.net>
Reported-by: Jay Harbeston <jharbestonus@gmail.com>
Signed-off-by: Devin Heitmueller <dheitmueller@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
commit 168bfeef7b upstream.
If none of the elements in scrubrates[] matches, this loop will cause
__amd64_set_scrub_rate() to incorrectly use the n+1th element.
As the function is designed to use the final scrubrates[] element in the
case of no match, we can fix this bug by simply terminating the array
search at the n-1th element.
Boris: this code is fragile anyway, see here why:
http://marc.info/?l=linux-kernel&m=135102834131236&w=2
It will be rewritten more robustly soonish.
Reported-by: Denis Kirjanov <kirjanov@gmail.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f5320d597 upstream.
notify_on_release must be triggered when the last process in a cgroup is
move to another. But if the first(and only) process in a cgroup is moved to
another, notify_on_release is not triggered.
# mkdir /cgroup/cpu/SRC
# mkdir /cgroup/cpu/DST
#
# echo 1 >/cgroup/cpu/SRC/notify_on_release
# echo 1 >/cgroup/cpu/DST/notify_on_release
#
# sleep 300 &
[1] 8629
#
# echo 8629 >/cgroup/cpu/SRC/tasks
# echo 8629 >/cgroup/cpu/DST/tasks
-> notify_on_release for /SRC must be triggered at this point,
but it isn't.
This is because put_css_set() is called before setting CGRP_RELEASABLE
in cgroup_task_migrate(), and is a regression introduce by the
commit:74a1166d(cgroups: make procs file writable), which was merged
into v3.0.
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Ben Blum <bblum@andrew.cmu.edu>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 301a29da6e upstream.
The current code assumes that CSIZE is 0000060, which appears to be
wrong on some arches (such as powerpc).
Signed-off-by: Nicolas Boullis <nboullis@debian.org>
Acked-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5211187f7 upstream.
If the write endpoint is interrupt type, usb_sndintpipe() should
be passed to usb_fill_int_urb() instead of usb_sndbulkpipe().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Cc: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a349e23d1c upstream.
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal. The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).
The occurs because handle_signal() is incorrectly thinking that there
is a system call that needs to restarted so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).
If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516) then handle_signal() only corrupted %eax (by setting it to
-EINTR). This may cause the application to crash or have incorrect
behaviour.
handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax. For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.
xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.
Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().
There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1bbbbe779a upstream.
On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.
[ hpa: this should be done not just for > 4 GB but for everything above the legacy
region (1 MB), at the very least. That, however, turns out to require significant
restructuring. That work is well underway, but is not suitable for rc/stable. ]
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31fd84b95e upstream.
The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:
kernel/sys.c: In function 'override_release':
kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fdc858a466 upstream.
The sharpsl_pcmcia_ops structure gets passed into
sa11xx_drv_pcmcia_probe, where it gets accessed at run-time,
unlike all other pcmcia drivers that pass their structures
into platform_device_add_data, which makes a copy.
This means the gcc warning is valid and the structure
must not be marked as __initdata.
Without this patch, building collie_defconfig results in:
drivers/pcmcia/pxa2xx_sharpsl.c:22:31: fatal error: mach-pxa/hardware.h: No such file or directory
compilation terminated.
make[3]: *** [drivers/pcmcia/pxa2xx_sharpsl.o] Error 1
make[2]: *** [drivers/pcmcia] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pavel Machek <pavel@suse.cz>
Cc: linux-pcmcia@lists.infradead.org
Cc: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts 12d63702c5 which was commit
303a7ce920 upstream.
Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
commit cd0b16c1c3 upstream.
If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.
Failure to do so, will trigger the BUG_ON() in encode_nlm_stat...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Larry McVoy <lm@bitmover.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 627072b06c upstream.
The tile tool chain uses the .eh_frame information for backtracing.
The vmlinux build drops any .eh_frame sections at link time, but when
present in kernel modules, it causes a module load failure due to the
presence of unsupported pc-relative relocations. When compiling to
use compiler feedback support, the compiler by default omits .eh_frame
information, so we don't see this problem. But when not using feedback,
we need to explicitly suppress the .eh_frame.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 2101aa5bb0 which is
commit 60ea8226cb upstream.
To quote Ben:
This is not needed, as there is no QUEUE_FLAG_BYPASS in 3.0.y.
To quote Tejun:
I don't think it will break anything as it simply changes
assignment to |= to avoid overwriting existing flags. That
said, any patch can break anything, so if possible it would be
better to drop for 3.0.y.
So I'll revert this to be safe.
Cc: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10f571d091 upstream.
Add chip details for E-mu 1010 PCIe card. It has the same
chip as found in E-mu 1010b but it uses different PCI id.
Signed-off-by: Maxim Kachur <mcdebugger@duganet.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 68766a2edc upstream.
In case we detect a problem and bail out, we fail to set "ret" to a
nonzero value, and udf_load_logicalvol will mistakenly report success.
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abce9ac292 upstream.
tpm_write calls tpm_transmit without checking the return value and
assigns the return value unconditionally to chip->pending_data, even if
it's an error value.
This causes three bugs.
So if we write to /dev/tpm0 with a tpm_param_size bigger than
TPM_BUFSIZE=0x1000 (e.g. 0x100a)
and a bufsize also bigger than TPM_BUFSIZE (e.g. 0x100a)
tpm_transmit returns -E2BIG which is assigned to chip->pending_data as
-7, but tpm_write returns that TPM_BUFSIZE bytes have been successfully
been written to the TPM, altough this is not true (bug #1).
As we did write more than than TPM_BUFSIZE bytes but tpm_write reports
that only TPM_BUFSIZE bytes have been written the vfs tries to write
the remaining bytes (in this case 10 bytes) to the tpm device driver via
tpm_write which then blocks at
/* cannot perform a write until the read has cleared
either via tpm_read or a user_read_timer timeout */
while (atomic_read(&chip->data_pending) != 0)
msleep(TPM_TIMEOUT);
for 60 seconds, since data_pending is -7 and nobody is able to
read it (since tpm_read luckily checks if data_pending is greater than
0) (#bug 2).
After that the remaining bytes are written to the TPM which are
interpreted by the tpm as a normal command. (bug #3)
So if the last bytes of the command stream happen to be a e.g.
tpm_force_clear this gets accidentally sent to the TPM.
This patch fixes all three bugs, by propagating the error code of
tpm_write and returning -E2BIG if the input buffer is too big,
since the response from the tpm for a truncated value is bogus anyway.
Moreover it returns -EBUSY to userspace if there is a response ready to be
read.
Signed-off-by: Peter Huewe <peter.huewe@infineon.com>
Signed-off-by: Kent Yoder <key@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49d859d78c upstream.
If the CPU declares that RDRAND is available, go through a guranteed
reseed sequence, and make sure that it is actually working (producing
data.) If it does not, disable the CPU feature flag.
Allow RDRAND to be disabled on the command line (as opposed to at
compile time) for a user who has special requirements with regards to
random numbers.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628c6246d4 upstream.
Architectural inlines to get random ints and longs using the RDRAND
instruction.
Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally an high bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware. This enables RDRAND to be used directly, bypassing the
kernel random number pool.
For technical documentation, see:
http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/
In this patch, this is *only* used for the nonblocking random number
pool. RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random. The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.
Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.
The CONFIG_EXPERT compile-time option can be used to disable this use
of RDRAND.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Originally-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 09e05d4805 upstream.
ext3 users of data=journal mode with blocksize < pagesize were occasionally
hitting assertion failure in journal_commit_transaction() checking whether the
transaction has at least as many credits reserved as buffers attached. The
core of the problem is that when a file gets truncated, buffers that still need
checkpointing or that are attached to the committing transaction are left with
buffer_mapped set. When this happens to buffers beyond i_size attached to a
page stradding i_size, subsequent write extending the file will see these
buffers and as they are mapped (but underlying blocks were freed) things go
awry from here.
The assertion failure just coincidentally (and in this case luckily as we would
start corrupting filesystem) triggers due to journal_head not being properly
cleaned up as well.
Under some rare circumstances this bug could even hit data=ordered mode users.
There the assertion won't trigger and we would end up corrupting the
filesystem.
We fix the problem by unmapping buffers if possible (in lots of cases we just
need a buffer attached to a transaction as a place holder but it must not be
written out anyway). And in one case, we just have to bite the bullet and wait
for transaction commit to finish.
Reviewed-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0829184711 upstream.
radeon_i2c_fini() walks thru the list of I2C bus recs rdev->i2c_bus[]
to destroy each of them.
radeon_ext_tmds_enc_destroy() however also has code to destroy it's
associated I2C bus rec which has been obtained by radeon_i2c_lookup()
and is therefore also in the i2c_bus[] list.
This causes a double free resulting in a kernel panic when unloading
the radeon driver.
Removing destroy code from radeon_ext_tmds_enc_destroy() fixes this
problem.
agd5f: fix compiler warning
Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82e6bfe2fb upstream.
Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when
a running state is saved to userspace and then reinstated from there.
Make sure that private xt_limit area is initialized with correct values.
Otherwise, random matchings due to use of uninitialized memory.
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2614f86490 upstream.
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:
- the passed expectation is not inserted in the expectation table
and its timer was not initialized, since we have refreshed one
matching/existing expectation.
- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
timer is in some undefined state just after the allocation,
until it is appropriately initialized.
This can be a problem for the SIP helper during the expectation
addition:
...
if (nf_ct_expect_related(rtp_exp) == 0) {
if (nf_ct_expect_related(rtcp_exp) != 0)
nf_ct_unexpect_related(rtp_exp);
...
Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:
spin_lock_bh(&nf_conntrack_lock);
if (del_timer(&exp->timeout)) {
nf_ct_unlink_expect(exp);
nf_ct_expect_put(exp);
}
spin_unlock_bh(&nf_conntrack_lock);
Note that del_timer always returns false if the timer has been
initialized. However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.
To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.
Thanks to Patrick McHardy for participating in the discussion of
this patch.
I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f22eb25cf5 upstream.
Via-headers are parsed beginning at the first character after the Via-address.
When the address is translated first and its length decreases, the offset to
start parsing at is incorrect and header parameters might be missed.
Update the offset after translating the Via-address to fix this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f509c689a upstream.
We're hitting bug while trying to reinsert an already existing
expectation:
kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
[<ffffffff812d423a>] ? in4_pton+0x72/0x131
[<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
[<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
[<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
[<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
[<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]
We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.
Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 07153c6ec0 upstream.
It was reported that the Linux kernel sometimes logs:
klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392
ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.
The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.
The patch closes netfilter bugzilla id 771.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b423f6a40 upstream.
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.
This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.
Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 283283c4da upstream.
After commit 39f618b4fd (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5aa8b57200 upstream.
For IPv6, sizeof(struct ipv6hdr) = 40, thus the following
expression will result negative:
datalen = pkt_dev->cur_pkt_size - 14 -
sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
pkt_dev->pkt_overhead;
And, the check "if (datalen < sizeof(struct pktgen_hdr))" will be
passed as "datalen" is promoted to unsigned, therefore will cause
a crash later.
This is a quick fix by checking if "datalen" is negative. The following
patch will increase the default value of 'min_pkt_size' for IPv6.
This bug should exist for a long time, so Cc -stable too.
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26cff4e2aa upstream.
Adding two (or more) timers with large values for "expires" (they have
to reside within tv5 in the same list) leads to endless looping
between cascade() and internal_add_timer() in case CONFIG_BASE_SMALL
is one and jiffies are crossing the value 1 << 18. The bug was
introduced between 2.6.11 and 2.6.12 (and survived for quite some
time).
This patch ensures that when cascade() is called timers within tv5 are
not added endlessly to their own list again, instead they are added to
the next lower tv level tv4 (as expected).
Signed-off-by: Christian Hildner <christian.hildner@siemens.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Link: http://lkml.kernel.org/r/98673C87CB31274881CFFE0B65ECC87B0F5FC1963E@DEFTHW99EA4MSX.ww902.siemens.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 012a121184 upstream.
As detailed in the thread titled "viafb PLL/clock tweaking causes XO-1.5
instability," enabling or disabling the IGA1/IGA2 clocks causes occasional
stability problems during suspend/resume cycles on this platform.
This is rather odd, as the documentation suggests that clocks have two
states (on/off) and the default (stable) configuration is configured to
enable the clock only when it is needed. However, explicitly enabling *or*
disabling the clock triggers this system instability, suggesting that there
is a 3rd state at play here.
Leaving the clock enable/disable registers alone solves this problem.
This fixes spurious reboots during suspend/resume behaviour introduced by
commit b692a63a.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8c4321f3d upstream.
Line 0 and 1 were both written to line 0 (on the display) and all subsequent
lines had an offset of -1. The result was that the last line on the display
was never overwritten by writes to /dev/fbN.
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Acked-by: Bernie Thompson <bernie@plugable.com>
Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c99af3752b upstream.
Cloudlinux have a product called lve that includes a kernel module. This
was previously GPLed but is now under a proprietary license, but the
module continues to declare MODULE_LICENSE("GPL") and makes use of some
EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Alex Lyashkov <umka@cloudlinux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 49999ab27e upstream.
In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING
mount pending flag is not cleared.
One effect of this is when using the "browse" option, directory entry
attributes show up with all "?"s due to the incorrect callback and
subsequent failure return (when in fact no callback should be made).
Signed-off-by: Ian Kent <ikent@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60ea8226cb upstream.
A queue newly allocated with blk_alloc_queue_node() has only
QUEUE_FLAG_BYPASS set. For request-based drivers,
blk_init_allocated_queue() is called and q->queue_flags is overwritten
with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
initial bypass is still in effect.
In blk_init_allocated_queue(), or QUEUE_FLAG_DEFAULT to q->queue_flags
instead of overwriting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd0608e71e upstream.
The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a7bbda5b1 upstream.
We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.
This fixes the case when an user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:
kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270
.. snip..
Call Trace:
[<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
[<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
[<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a519fc7a70 upstream.
Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 790198f74c upstream.
Fix two bugs of the /dev/fw* character device concerning the
FW_CDEV_IOC_GET_INFO ioctl with nonzero fw_cdev_get_info.bus_reset.
(Practically all /dev/fw* clients issue this ioctl right after opening
the device.)
Both bugs are caused by sizeof(struct fw_cdev_event_bus_reset) being 36
without natural alignment and 40 with natural alignment.
1) Memory corruption, affecting i386 userland on amd64 kernel:
Userland reserves a 36 bytes large buffer, kernel writes 40 bytes.
This has been first found and reported against libraw1394 if
compiled with gcc 4.7 which happens to order libraw1394's stack such
that the bug became visible as data corruption.
2) Information leak, affecting all kernel architectures except i386:
4 bytes of random kernel stack data were leaked to userspace.
Hence limit the respective copy_to_user() to the 32-bit aligned size of
struct fw_cdev_event_bus_reset.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7253b85cc6 upstream.
arm: Add ARM ERRATA 775420 workaround
Workaround for the 775420 Cortex-A9 (r2p2, r2p6,r2p8,r2p10,r3p0) erratum.
In case a date cache maintenance operation aborts with MMU exception, it
might cause the processor to deadlock. This workaround puts DSB before
executing ISB if an abort may occur on cache maintenance.
Based on work by Kouei Abe and feedback from Catalin Marinas.
Signed-off-by: Kouei Abe <kouei.abe.cp@rms.renesas.com>
[ horms@verge.net.au: Changed to implementation
suggested by catalin.marinas@arm.com ]
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 35c2a7f490 upstream.
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():
BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
[<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
[<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
[<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.
But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0a996eeed upstream.
This fault was detected using the kgdb test suite on boot and it
crashes recursively due to the fact that CONFIG_KPROBES on mips adds
an extra die notifier in the page fault handler. The crash signature
looks like this:
kgdbts:RUN bad memory access test
KGDB: re-enter exception: ALL breakpoints killed
Call Trace:
[<807b7548>] dump_stack+0x20/0x54
[<807b7548>] dump_stack+0x20/0x54
The fix for now is to have kgdb return immediately if the fault type
is DIE_PAGE_FAULT and allow the kprobe code to decide what is supposed
to happen.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a520d52e99 upstream.
The Linux EC driver includes a mechanism to detect GPE storms,
and switch from interrupt-mode to polling mode. However, polling
mode sometimes doesn't work, so the workaround is problematic.
Also, different systems seem to need the threshold for detecting
the GPE storm at different levels.
ACPI_EC_STORM_THRESHOLD was initially 20 when it's created, and
was changed to 8 in 2.6.28 commit 06cf7d3c7 "ACPI: EC: lower interrupt storm
threshold" to fix kernel bug 11892 by forcing the laptop in that bug to
work in polling mode. However in bug 45151, it works fine in interrupt
mode if we lift the threshold back to 20.
This patch makes the threshold a module parameter so that user has a
flexible option to debug/workaround this issue.
The default is unchanged.
This is also a preparation patch to fix specific systems:
https://bugzilla.kernel.org/show_bug.cgi?id=45151
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 303a7ce920 upstream.
Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 846a136881 upstream.
Michael Olbrich reported that his test program fails when built with
-O2 -mcpu=cortex-a8 -mfpu=neon, and a kernel which supports v6 and v7
CPUs:
volatile int x = 2;
volatile int64_t y = 2;
int main() {
volatile int a = 0;
volatile int64_t b = 0;
while (1) {
a = (a + x) % (1 << 30);
b = (b + y) % (1 << 30);
assert(a == b);
}
}
and two instances are run. When built for just v7 CPUs, this program
works fine. It uses the "vadd.i64 d19, d18, d16" VFP instruction.
It appears that we do not save the high-16 double VFP registers across
context switches when the kernel is built for v6 CPUs. Fix that.
Tested-By: Michael Olbrich <m.olbrich@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d3d688da8 upstream.
Unloading the omap2 nand driver missed to release the memory region which will
result in not being able to request it again if one want to load the driver
later on.
This patch fixes following error when loading omap2 module after unloading:
---8<---
~ $ rmmod omap2
~ $ modprobe omap2
[ 37.420928] omap2-nand: probe of omap2-nand.0 failed with error -16
~ $
--->8---
This error was introduced in 67ce04bf27 which
was the first commit of this driver.
Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bb0a13a134 upstream.
If override size is too big, the module was actually loaded instead of
failing, because retval was not set.
This lead to memory corruption with the use of the freed structs nandsim
and nand_chip.
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1f55c680e upstream.
Update driver autcpu12-nvram.c so it compiles; map_read32/map_write32
no longer exist in the kernel so the driver is totally broken.
Additionally, map_info name passed to simple_map_init is incorrect.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d35be8bab9 upstream.
In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.
However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.
In order to achieve that, do the following:
1. Don't modify cpusets during suspend/resume. At all.
In particular, don't move the tasks from one cpuset to another, and
don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
during the CPU hotplug operations that are carried out in the
suspend/resume path.
2. However, cpusets and sched domains are related. We just want to avoid
altering cpusets alone. So, to keep the sched domains updated, build
a single sched domain (containing all active cpus) during each of the
CPU hotplug operations carried out in s/r path, effectively ignoring
the cpusets' cpus_allowed masks.
(Since userspace is frozen while doing all this, it will go unnoticed.)
3. During the last CPU online operation during resume, build the sched
domains by looking up the (unaltered) cpusets' cpus_allowed masks.
That will bring back the system to the same original state as it was in
before suspend.
Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 00442ad04a upstream.
Commit cc9a6c8776 ("cpuset: mm: reduce large amounts of memory barrier
related damage v3") introduced a potential memory corruption.
shmem_alloc_page() uses a pseudo vma and it has one significant unique
combination, vma->vm_ops=NULL and vma->policy->flags & MPOL_F_SHARED.
get_vma_policy() does NOT increase a policy ref when vma->vm_ops=NULL
and mpol_cond_put() DOES decrease a policy ref when a policy has
MPOL_F_SHARED. Therefore, when a cpuset update race occurs,
alloc_pages_vma() falls in 'goto retry_cpuset' path, decrements the
reference count and frees the policy prematurely.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63f74ca21f upstream.
When shared_policy_replace() fails to allocate new->policy is not freed
correctly by mpol_set_shared_policy(). The problem is that shared
mempolicy code directly call kmem_cache_free() in multiple places where
it is easy to make a mistake.
This patch creates an sp_free wrapper function and uses it. The bug was
introduced pre-git age (IOW, before 2.6.12-rc2).
[mgorman@suse.de: Editted changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b22d127a39 upstream.
shared_policy_replace() use of sp_alloc() is unsafe. 1) sp_node cannot
be dereferenced if sp->lock is not held and 2) another thread can modify
sp_node between spin_unlock for allocating a new sp node and next
spin_lock. The bug was introduced before 2.6.12-rc2.
Kosaki's original patch for this problem was to allocate an sp node and
policy within shared_policy_replace and initialise it when the lock is
reacquired. I was not keen on this approach because it partially
duplicates sp_alloc(). As the paths were sp->lock is taken are not that
performance critical this patch converts sp->lock to sp->mutex so it can
sleep when calling sp_alloc().
[kosaki.motohiro@jp.fujitsu.com: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 869833f2c5 upstream.
Dave Jones' system call fuzz testing tool "trinity" triggered the
following bug error with slab debugging enabled
=============================================================================
BUG numa_policy (Not tainted): Poison overwritten
-----------------------------------------------------------------------------
INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
__slab_alloc+0x3d3/0x445
kmem_cache_alloc+0x29d/0x2b0
mpol_new+0xa3/0x140
sys_mbind+0x142/0x620
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
__slab_free+0x2e/0x1de
kmem_cache_free+0x25a/0x260
__mpol_put+0x27/0x30
remove_vma+0x68/0x90
exit_mmap+0x118/0x140
mmput+0x73/0x110
exit_mm+0x108/0x130
do_exit+0x162/0xb90
do_group_exit+0x4f/0xc0
sys_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x (null) flags=0x20000000004080
INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
The problem is that the structure is being prematurely freed due to a
reference count imbalance. In the following case mbind(addr, len) should
replace the memory policies of both vma1 and vma2 and thus they will
become to share the same mempolicy and the new mempolicy will have the
MPOL_F_SHARED flag.
+-------------------+-------------------+
| vma1 | vma2(shmem) |
+-------------------+-------------------+
| |
addr addr+len
alloc_pages_vma() uses get_vma_policy() and mpol_cond_put() pair for
maintaining the mempolicy reference count. The current rule is that
get_vma_policy() only increments refcount for shmem VMA and
mpol_conf_put() only decrements refcount if the policy has
MPOL_F_SHARED.
In above case, vma1 is not shmem vma and vma->policy has MPOL_F_SHARED!
The reference count will be decreased even though was not increased
whenever alloc_page_vma() is called. This has been broken since commit
[52cd3b07: mempolicy: rework mempolicy Reference Counting] in 2008.
There is another serious bug with the sharing of memory policies.
Currently, mempolicy rebind logic (it is called from cpuset rebinding)
ignores a refcount of mempolicy and override it forcibly. Thus, any
mempolicy sharing may cause mempolicy corruption. The bug was
introduced by commit [68860ec1: cpusets: automatic numa mempolicy
rebinding].
Ideally, the shared policy handling would be rewritten to either
properly handle COW of the policy structures or at least reference count
MPOL_F_SHARED based exclusively on information within the policy.
However, this patch takes the easier approach of disabling any policy
sharing between VMAs. Each new range allocated with sp_alloc will
allocate a new policy, set the reference count to 1 and drop the
reference count of the old policy. This increases the memory footprint
but is not expected to be a major problem as mbind() is unlikely to be
used for fine-grained ranges. It is also inefficient because it means
we allocate a new policy even in cases where mbind_range() could use the
new_policy passed to it. However, it is more straight-forward and the
change should be invisible to the user.
[mgorman@suse.de: Edited changelog]
Reported-by: Dave Jones <davej@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d34694c1a upstream.
Commit 05f144a0d5 ("mm: mempolicy: Let vma_merge and vma_split handle
vma->vm_policy linkages") removed vma->vm_policy updates code but it is
the purpose of mbind_range(). Now, mbind_range() is virtually a no-op
and while it does not allow memory corruption it is not the right fix.
This patch is a revert.
[mgorman@suse.de: Edited changelog]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad1be8d345 upstream.
When register_netdev fails, the init'ed NAPIs by netif_napi_add must be
deleted with netif_napi_del, and also when driver unloads, it should
delete the NAPI before unregistering netdevice using unregister_netdev.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 477206a018 upstream.
The r8169 may get stuck or show bad behaviour after activating TSO :
the net_device is not stopped when it has no more TX descriptors.
This problem comes from TX_BUFS_AVAIL which may reach -1 when all
transmit descriptors are in use. The patch simply tries to keep positive
values.
Tested with 8111d(onboard) on a D510MO, and with 8111e(onboard) on a
Zotac 890GXITX.
Signed-off-by: Julien Ducourthial <jducourt@free.fr>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a15cd2ff4 upstream.
With runtime PM, if the ethernet cable is disconnected, the device is
transitioned to D3 state to conserve energy. If the system is shutdown
in this state, any register accesses in rtl_shutdown are dropped on
the floor. As the device was programmed by .runtime_suspend() to wake
on link changes, it is thus brought back up as soon as the link recovers.
Resuming every suspended device through the driver core would slow things
down and it is not clear how many devices really need it now.
Original report and D0 transition patch by Sameer Nanda. Patch has been
changed to comply with advices by Rafael J. Wysocki and the PM folks.
Reported-by: Sameer Nanda <snanda@chromium.org>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 811fd3010c upstream.
Realtek has specified that the post 8168c gigabit chips and the post
8105e fast ethernet chips recover automatically from a Rx FIFO overflow.
The driver does not need to clear the RxFIFOOver bit of IntrStatus and
it should rather avoid messing it.
The implementation deserves some explanation:
1. events outside of the intr_event bit mask are now ignored. It enforces
a no-processing policy for the events that either should not be there
or should be ignored.
2. RxFIFOOver was already ignored in rtl_cfg_infos[RTL_CFG_1] for the
whole 8168 line of chips with two exceptions:
- RTL_GIGA_MAC_VER_22 since b5ba6d12bd
("use RxFIFO overflow workaround for 8168c chipset.").
This one should now be correctly handled.
- RTL_GIGA_MAC_VER_11 (8168b) which requires a different Rx FIFO
overflow processing.
Though it does not conform to Realtek suggestion above, the updated
driver includes no change for RTL_GIGA_MAC_VER_12 and RTL_GIGA_MAC_VER_17.
Both are 8168b. RTL_GIGA_MAC_VER_12 is common and a bit old so I'd rather
wait for experimental evidence that the change suggested by Realtek really
helps or does not hurt in unexpected ways.
Removed case statements in rtl8169_interrupt are only 8168 relevant.
3. RxFIFOOver is masked for post 8105e 810x chips, namely the sole 8105e
(RTL_GIGA_MAC_VER_30) itself.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: hayeswang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10953db8e1 upstream
The link down would occur when reseting PHY. And it would take about 2 ~ 5
seconds from link down to link up. If the delay of pm_schedule_suspend is
not long enough, the device would enter runtime_suspend before link up.
After link up, the device would wake up and reset PHY again. Then, you
would find the driver keep in a loop of runtime_suspend and rumtime_resume.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit deb9d93c89 upstream.
8168d and above allow jumbo frames beyond 8k. Bump the received
packet length check before enabling jumbo frames on these chipsets.
Frame length indication covers bits 0..13 of the first Rx descriptor
32 bits for the 8169 and 8168. I only have authoritative documentation
for the allowed use of the extra (13) bit with the 8169 and 8168c.
Realtek's drivers use the same mask for the 816x and the fast ethernet
only 810x.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d58d46b5d8 upstream.
- fix features : jumbo frames and checksumming can not be used at the
same time.
- introduce hw_jumbo_{enable / disable} helpers. Their content has been
creatively extracted from Realtek's own drivers. As an illustration,
it would be nice to know how/if the MaxTxPacketSize register operates
when the device can work with a 9k jumbo frame as its documentation
(8168c) can not be applied beyond ~7k.
- rtl_tx_performance_tweak is moved forward. No change.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a10d206ef1 upstream.
Each grace period is supposed to have at least one callback waiting
for that grace period to complete. However, if CONFIG_NO_HZ=n, an
extra callback-free grace period is no big problem -- it will chew up
a tiny bit of CPU time, but it will complete normally. In contrast,
CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
sleep indefinitely, in turn indefinitely delaying completion of the
callback-free grace period. Given that nothing is waiting on this grace
period, this is also not a problem.
That is, unless RCU CPU stall warnings are also enabled, as they are
in recent kernels. In this case, if a CPU wakes up after at least one
minute of inactivity, an RCU CPU stall warning will result. The reason
that no one noticed until quite recently is that most systems have enough
OS noise that they will never remain absolutely idle for a full minute.
But there are some embedded systems with cut-down userspace configurations
that consistently get into this situation.
All this begs the question of exactly how a callback-free grace period
gets started in the first place. This can happen due to the fact that
CPUs do not necessarily agree on which grace period is in progress.
If a CPU still believes that the grace period that just completed is
still ongoing, it will believe that it has callbacks that need to wait for
another grace period, never mind the fact that the grace period that they
were waiting for just completed. This CPU can therefore erroneously
decide to start a new grace period. Note that this can happen in
TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU system: Deadlock
considerations mean that the CPU that detected the end of the grace
period is not necessarily officially informed of this fact for some time.
Once this CPU notices that the earlier grace period completed, it will
invoke its callbacks. It then won't have any callbacks left. If no
other CPU has any callbacks, we now have a callback-free grace period.
This commit therefore makes CPUs check more carefully before starting a
new grace period. This new check relies on an array of tail pointers
into each CPU's list of callbacks. If the CPU is up to date on which
grace periods have completed, it checks to see if any callbacks follow
the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
follow the RCU_WAIT_TAIL segment. The reason that this works is that
the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
as soon as the CPU is officially notified that the old grace period
has ended.
This change is to cpu_needs_another_gp(), which is called in a number
of places. The only one that really matters is in rcu_start_gp(), where
the root rcu_node structure's ->lock is held, which prevents any
other CPU from starting or completing a grace period, so that the
comparison that determines whether the CPU is missing the completion
of a grace period is stable.
Reported-by: Becky Bruce <bgillbruce@gmail.com>
Reported-by: Subodh Nijsure <snijsure@grid-net.com>
Reported-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e3b3b105a upstream.
SI asics store voltage information differently so we
don't have a way to deal with it properly yet.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c10514394e upstream.
While going through Ubuntu bugs, I discovered this patch being
posted and a confirmation that the patch works as expected.
Finding out how the hw volume really works would be preferrable
to just disabling the broken one, but this would be better than
nothing.
Credit: sndfnsdfin (qawsnews)
BugLink: https://bugs.launchpad.net/bugs/559939
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4f1e48bd1 upstream.
When the loopback timer handler is running, calling del_timer() (for STOP
trigger) will not wait for the handler to complete before deactivating the
timer. The timer gets rescheduled in the handler as usual. Then a subsequent
START trigger will try to start the timer using add_timer() with a timer pending
leading to a kernel panic.
Serialize the calls to add_timer() and del_timer() using a spin lock to avoid
this.
Signed-off-by: Omair Mohammed Abdullah <omair.m.abdullah@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 027ef6c878 upstream.
In many places !pmd_present has been converted to pmd_none. For pmds
that's equivalent and pmd_none is quicker so using pmd_none is better.
However (unless we delete pmd_present) we should provide an accurate
pmd_present too. This will avoid the risk of code thinking the pmd is non
present because it's under __split_huge_page_map, see the pmd_mknotpresent
there and the comment above it.
If the page has been mprotected as PROT_NONE, it would also lead to a
pmd_present false negative in the same way as the race with
split_huge_page.
Because the PSE bit stays on at all times (both during split_huge_page and
when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
but checking the PROTNONE bit too is still good to remember pmd_present
must always keep PROT_NONE into account.
This explains a not reproducible BUG_ON that was seldom reported on the
lists.
The same issue is in pmd_large, it would go wrong with both PROT_NONE and
if it races with split_huge_page.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec4d9f626d upstream.
In fuzzing with trinity, lockdep protested "possible irq lock inversion
dependency detected" when isolate_lru_page() reenabled interrupts while
still holding the supposedly irq-safe tree_lock:
invalidate_inode_pages2
invalidate_complete_page2
spin_lock_irq(&mapping->tree_lock)
clear_page_mlock
isolate_lru_page
spin_unlock_irq(&zone->lru_lock)
isolate_lru_page() is correct to enable interrupts unconditionally:
invalidate_complete_page2() is incorrect to call clear_page_mlock() while
holding tree_lock, which is supposed to nest inside lru_lock.
Both truncate_complete_page() and invalidate_complete_page() call
clear_page_mlock() before taking tree_lock to remove page from radix_tree.
I guess invalidate_complete_page2() preferred to test PageDirty (again)
under tree_lock before committing to the munlock; but since the page has
already been unmapped, its state is already somewhat inconsistent, and no
worse if clear_page_mlock() moved up.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Deciphered-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b71fc079b5 upstream.
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a08f447fa upstream.
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f066055a34 upstream.
Proper block swap for inodes with full journaling enabled is
truly non obvious task. In order to be on a safe side let's
explicitly disable it for now.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1965f66e7d upstream.
For bridges with "secondary > subordinate", i.e., invalid bus number
apertures, we don't enumerate anything behind the bridge unless the
user specified "pci=assign-busses".
This patch makes us automatically try to reassign the downstream bus
numbers in this case (just for that bridge, not for all bridges as
"pci=assign-busses" does).
We don't discover all the devices on the Intel DP43BF motherboard
without this change (or "pci=assign-busses") because its BIOS configures
a bridge as:
pci 0000:00:1e.0: PCI bridge to [bus 20-08] (subtractive decode)
[bhelgaas: changelog, change message to dev_info]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18412
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=625754
Reported-by: Brian C. Huffman <bhuffman@graze.net>
Reported-by: VL <vl.homutov@gmail.com>
Tested-by: VL <vl.homutov@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
commit d436de8ce2 upstream.
__scsi_remove_device (e.g. due to dev_loss_tmo) calls
zfcp_scsi_slave_destroy which in turn sends a close LUN FSF request to
the adapter. After 30 seconds without response,
zfcp_erp_timeout_handler kicks the ERP thread failing the close LUN
ERP action. zfcp_erp_wait in zfcp_erp_lun_shutdown_wait and thus
zfcp_scsi_slave_destroy returns and then scsi_device is no longer
valid. Sometime later the response to the close LUN FSF request may
finally come in. However, commit
b62a8d9b45
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit"
introduced a number of attempts to unconditionally access struct
zfcp_scsi_dev through struct scsi_device causing a use-after-free.
This leads to an Oops due to kernel page fault in one of:
zfcp_fsf_abort_fcp_command_handler, zfcp_fsf_open_lun_handler,
zfcp_fsf_close_lun_handler, zfcp_fsf_req_trace,
zfcp_fsf_fcp_handler_common.
Move dereferencing of zfcp private data zfcp_scsi_dev allocated in
scsi_device via scsi_transport_reserve_device after the check for
potentially aborted FSF request and thus no longer valid scsi_device.
Only then assign sdev_to_zfcp(sdev) to the local auto variable struct
zfcp_scsi_dev *zfcp_sdev.
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d99b601b63 upstream.
Upstream commit f3450c7b91
"[SCSI] zfcp: Replace local reference counting with common kref"
accidentally dropped a reference count check before tearing down
zfcp_ports that are potentially in use by zfcp_units.
Even remote ports in use can be removed causing
unreachable garbage objects zfcp_ports with zfcp_units.
Thus units won't come back even after a manual port_rescan.
The kref of zfcp_port->dev.kobj is already used by the driver core.
We cannot re-use it to track the number of zfcp_units.
Re-introduce our own counter for units per port
and check on port_remove.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca579c9f13 upstream.
If list_for_each_entry, etc complete a traversal of the list, the iterator
variable ends up pointing to an address at an offset from the list head,
and not a meaningful structure. Thus this value should not be used after
the end of the iterator. Replace port->adapter->scsi_host by
adapter->scsi_host.
This problem was found using Coccinelle (http://coccinelle.lip6.fr/).
Oversight in upsteam commit of v2.6.37
a1ca48319a
"[SCSI] zfcp: Move ACL/CFDC code to zfcp_cfdc.c"
which merged the content of zfcp_erp_port_access_changed().
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb45214960 upstream.
If the mapping of FCP device bus ID and corresponding subchannel
is modified while the Linux image is suspended, the resume of FCP
devices can fail. During resume, zfcp gets callbacks from cio regarding
the modified subchannels but they can be arbitrarily mixed with the
restore/resume callback. Since the cio callbacks would trigger
adapter recovery, zfcp could wakeup before the resume callback.
Therefore, ignore the cio callbacks regarding subchannels while
being suspended. We can safely do so, since zfcp does not deal itself
with subchannels. For problem determination purposes, we still trace the
ignored callback events.
The following kernel messages could be seen on resume:
kernel: <WWPN>: parent <FCP device bus ID> should not be sleeping
As part of adapter reopen recovery, zfcp performs auto port scanning
which can erroneously try to register new remote ports with
scsi_transport_fc and the device core code complains about the parent
(adapter) still sleeping.
kernel: zfcp.3dff9c: <FCP device bus ID>:\
Setting up the QDIO connection to the FCP adapter failed
<last kernel message repeated 3 more times>
kernel: zfcp.574d43: <FCP device bus ID>:\
ERP cannot recover an error on the FCP device
In such cases, the adapter gave up recovery and remained blocked along
with its child objects: remote ports and LUNs/scsi devices. Even the
adapter shutdown as part of giving up recovery failed because the ccw
device state remained disconnected. Later, the corresponding remote
ports ran into dev_loss_tmo. As a result, the LUNs were erroneously
not available again after resume.
Even a manually triggered adapter recovery (e.g. sysfs attribute
failed, or device offline/online via sysfs) could not recover the
adapter due to the remaining disconnected state of the corresponding
ccw device.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0100998dbf upstream.
Duplicate fssrh_2 from a54ca0f62f
"[SCSI] zfcp: Redesign of the debug tracing for HBA records."
complicates distinction of generic status read response from
local link up.
Duplicate fsscth1 from 2c55b750a8
"[SCSI] zfcp: Redesign of the debug tracing for SAN records."
complicates distinction of good common transport response from
invalid port handle.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf9ecf4b63 ]
On the earliest TSO capable devices, TSO was accomplished through
firmware. The TSO cannot coexist with ASF management firmware though.
The tg3 driver determines whether or not ASF is enabled by calling
tg3_get_eeprom_hw_cfg(), which checks a particular bit of NIC memory.
Commit dabc5c670d, entitled "tg3: Move
TSO_CAPABLE assignment", accidentally moved the code that determines
TSO capabilities earlier than the call to tg3_get_eeprom_hw_cfg(). As a
consequence, the driver was attempting to determine TSO capabilities
before it had all the data it needed to make the decision.
This patch fixes the problem by revisiting and reevaluating the decision
after tg3_get_eeprom_hw_cfg() is called.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8babe8cc65 ]
In order for the network layer to see that AoE requires
no checksumming in a generic way, the packets must be
marked as requiring no checksum, so we make this requirement
explicit with the assertion.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0d680e577 ]
A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE). Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.
commit f01a5236bd
Author: Jesse Gross <jesse@nicira.com>
Date: Sun Jan 9 06:23:31 2011 +0000
net offloading: Generalize netif_get_vlan_features().
The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming. Calling it for a
protocol that requires no checksum is inappropriate.
The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets. Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:
http://www.spinics.net/lists/linux-mm/msg15184.html
The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:
https://lkml.org/lkml/2012/8/28/140
The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c0cc88a762 ]
While investigating l2tp bug, I hit a bug in eth_type_trans(),
because not enough bytes were pulled in skb head.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 96af69ea2a ]
mip6_mh_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1b05c4b50e ]
icmpv6_filter() should not modify its input, or else its caller
would need to recompute ipv6_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Also, if icmpv6 header cannot be found, do not deliver the packet,
as we do in IPv4.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ab43ed8b74 ]
icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.
Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3e10986d1d ]
Its possible to use RAW sockets to get a crash in
tcp_set_keepalive() / sk_reset_timer()
Fix is to make sure socket is a SOCK_STREAM one.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6862234238 ]
In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.
For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.
This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)
The patch is co-written with Eric Dumazet <edumazet@google.com>
Signed-off-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b018d57ff ]
When PPPOE is running over a virtual ethernet interface (e.g., a
bonding interface) and the user tries to delete the interface in case
the PPPOE state is ZOMBIE, the kernel will loop forever while
unregistering net_device for the reference count is not decreased to
zero which should have been done with dev_put().
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c3a5bdae2 ]
SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via
skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of
sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up
by __sctp_write_space().
Buffer space charged via sctp_set_owner_w() is released in sctp_wfree()
which calls __sctp_write_space() directly.
Buffer space charged via skb_set_owner_w() is released via sock_wfree()
which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set.
sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets.
Therefore if sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is
interrupted by a signal.
This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ...
Charging for the data twice does not make sense in the first place, it
leads to overcharging sndbuf by a factor 2. Therefore this patch only
charges a single byte in wmem_alloc when transmitting an SCTP packet to
ensure that the socket stays alive until the packet has been released.
This means that control chunks are no longer accounted for in wmem_alloc
which I believe is not a problem as skb->truesize will typically lead
to overcharging anyway and thus compensates for any control overhead.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 15c041759b ]
If recv() syscall is called for a TCP socket so that
- IOAT DMA is used
- MSG_WAITALL flag is used
- requested length is bigger than sk_rcvbuf
- enough data has already arrived to bring rcv_wnd to zero
then when tcp_recvmsg() gets to calling sk_wait_data(), receive
window can be still zero while sk_async_wait_queue exhausts
enough space to keep it zero. As this queue isn't cleaned until
the tcp_service_net_dma() call, sk_wait_data() cannot receive
any data and blocks forever.
If zero receive window and non-empty sk_async_wait_queue is
detected before calling sk_wait_data(), process the queue first.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6825a26c2d ]
as we hold dst_entry before we call __ip6_del_rt,
so we should alse call dst_release not only return
-ENOENT when the rt6_info is ip6_null_entry.
and we already hold the dst entry, so I think it's
safe to call dst_release out of the write-read lock.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5316cf9a51 ]
skb_reset_mac_len() relies on the value of the skb->network_header pointer,
therefore we must wait for such pointer to be recalculated before computing
the new mac_len value.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2120c52da6 ]
I discovered I couldn't get sierra_net to work on a powerpc. Turns out
the firmware attribute check assumes the system is little endian and
hence fails because the attributes is a 16 bit value.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7126195697 ]
If the old timestamps of a class, say cl, are stale when the class
becomes active, then QFQ may assign to cl a much higher start time
than the maximum value allowed. This may happen when QFQ assigns to
the start time of cl the finish time of a group whose classes are
characterized by a higher value of the ratio
max_class_pkt/weight_of_the_class with respect to that of
cl. Inserting a class with a too high start time into the bucket list
corrupts the data structure and may eventually lead to crashes.
This patch limits the maximum start time assigned to a class.
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e4d1aa40e3 ]
Add a check if pdev->bus->self == NULL (root bus). When attaching
a netxen NIC to a VM it can be on the root bus and the guest would
crash in netxen_mask_aer_correctable() because of a NULL pointer
dereference if CONFIG_PCIEAER is present.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 0b836ddde1 ]
Commit 36a1211970 (netprio_cgroup.h:
dont include module.h from other includes) made the following build
error on ixp4xx_hss pop up:
CC [M] drivers/net/wan/ixp4xx_hss.o
drivers/net/wan/ixp4xx_hss.c:1412:20: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1413:25: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1414:21: error: expected ';', ',' or ')'
before string constant
drivers/net/wan/ixp4xx_hss.c:1415:19: error: expected ';', ',' or ')'
before string constant
make[8]: *** [drivers/net/wan/ixp4xx_hss.o] Error 1
This was previously hidden because ixp4xx_hss includes linux/hdlc.h which
includes linux/netdevice.h which includes linux/netprio_cgroup.h which
used to include linux/module.h. The real issue was actually present since
the initial commit that added this driver since it uses macros from
linux/module.h without including this file.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ffb5ba9001 ]
chan->count is used by rx channel. If the desc count is not updated by
the clean up loop in cpdma_chan_stop, the value written to the rxfree
register in cpdma_chan_start will be incorrect.
Signed-off-by: Tao Hou <hotforest@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ecd7918745 ]
The current code fails to ensure that the netlink message actually
contains as many bytes as the header indicates. If a user creates a new
state or updates an existing one but does not supply the bytes for the
whole ESN replay window, the kernel copies random heap bytes into the
replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL
netlink attribute. This leads to following issues:
1. The replay window has random bits set confusing the replay handling
code later on.
2. A malicious user could use this flaw to leak up to ~3.5kB of heap
memory when she has access to the XFRM netlink interface (requires
CAP_NET_ADMIN).
Known users of the ESN replay window are strongSwan and Steffen's
iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
uses the interface with a bitmap supplied while the former does not.
strongSwan is therefore prone to run into issue 1.
To fix both issues without breaking existing userland allow using the
XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
fully specified one. For the former case we initialize the in-kernel
bitmap with zero, for the latter we copy the user supplied bitmap. For
state updates the full bitmap must be supplied.
To prevent overflows in the bitmap length calculation the maximum size
of bmp_len is limited to 128 by this patch -- resulting in a maximum
replay window of 4096 packets. This should be sufficient for all real
life scenarios (RFC 4303 recommends a default replay window size of 64).
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Martin Willi <martin@revosec.ch>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1f86840f89 ]
The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.
Initial version of the patch by Brad Spengler.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b789836f4 ]
The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f778a63671 ]
The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c87308bde ]
copy_to_user_auth() fails to initialize the remainder of alg_name and
therefore discloses up to 54 bytes of heap memory via netlink to
userland.
Use strncpy() instead of strcpy() to fill the trailing bytes of alg_name
with null bytes.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 433a195480 ]
if xfrm_policy_get_afinfo returns 0, it has already released the read
lock, xfrm_policy_put_afinfo should not be called again.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c254637225 ]
When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 864745d291 ]
When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.
This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3b59df46a4 ]
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.
If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.
We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.
To do this, we introduce a recheck function that does this
check in the ESN case.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 959d1af8cf upstream.
WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution. When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.
Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb. Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theretically possible for clear_pending to permeate
upwards and happen before work->entry update.
Add an explicit smp_wmb() before work_clear_pending().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0f6d93aa9d upstream.
The ACARD driver calls udelay() with a value > 2000, which leads to to
the following compilation error on ARM:
ERROR: "__bad_udelay" [drivers/scsi/atp870u.ko] undefined!
make[1]: *** [__modpost] Error 1
This is because udelay is defined on ARM, roughly speaking, as
#define udelay(n) ((n) > 2000 ? __bad_udelay() : \
__const_udelay((n) * ((2199023U*HZ)>>11)))
The argument to __const_udelay is the number of jiffies to wait divided
by 4, but this does not work unless the multiplication does not
overflow, and that is what the build error is designed to prevent. The
intended behavior can be achieved by using mdelay to call udelay
multiple times in a loop.
[jrnieder@gmail.com: adding context]
Signed-off-by: Martin Michlmayr <tbm@cyrius.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f96972f2dc upstream.
As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
have kernel_restart() call disable_nonboot_cpus(). Doing so can help
machines that require boot cpu be the last alive cpu during reboot to
survive with kernel restart.
This fixes one reboot issue seen on imx6q (Cortex-A9 Quad). The machine
requires that the restart routine be run on the primary cpu rather than
secondary ones. Otherwise, the secondary core running the restart
routine will fail to come to online after reboot.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dfb117b3e5 upstream.
Check whether we evaluated _ADR successfully. Previously we ignored
failure, so we would have used garbage data from the stack as the device
and function number.
We return AE_OK so that we ignore only this slot and continue looking
for other slots.
Found by Coverity (CID 113981).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[bwh: Backported to 2.6.32/3.0: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc54ab7295 upstream.
The _OSC method may exist in module level code,
so it must be called after ACPI_FULL_INITIALIZATION
On some new platforms with Zero-Power-Optical-Disk-Drive (ZPODD)
support, this fix is necessary to save power.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Tested-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b961180ef upstream.
ite_dev::rdev is currently initialised in ite_probe() after
rc_register_device() returns. If a newly registered device is opened
quickly enough, we may enable interrupts and try to use ite_dev::rdev
before it has been initialised. Move it up to the earliest point we
can, right after calling rc_allocate_device().
Reported-and-tested-by: YunQiang Su <wzssyqa@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c353acba28 upstream.
The call if_changed mechanism does not work when the command contains
backslashes. This basically is an issue with lzo and bzip2 compressed
kernels. The compressed binaries do not contain the uncompressed image
size, so these use size_append to append the size. This results in
backslashes in the executed command. With this if_changed always
detects a change in the command and rebuilds the compressed image even
if nothing has changed.
Fix this by escaping backslashes in make-cmd
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e47f8976d8 upstream.
A quote from SPC-4: "While in the unavailable primary target port
asymmetric access state, the device server shall support those of
the following commands that it supports while in the active/optimized
state: [ ... ] d) SET TARGET PORT GROUPS; [ ... ]". Hence enable
sending STPG to a target port group that is in the unavailable state.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc3f02a795 upstream.
John reports:
BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
[..]
Call Trace:
[<ffffffff8141782a>] scsi_remove_target+0xda/0x1f0
[<ffffffff81421de5>] sas_rphy_remove+0x55/0x60
[<ffffffff81421e01>] sas_rphy_delete+0x11/0x20
[<ffffffff81421e35>] sas_port_delete+0x25/0x160
[<ffffffff814549a3>] mptsas_del_end_device+0x183/0x270
...introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race".
Don't restart lookup of more stargets in the multi-target case, just
arrange to traverse the list once, on the assumption that new targets
are always added at the end. There is no guarantee that the target will
change state in scsi_target_reap() so we can end up spinning if we
restart.
Acked-by: Jack Wang <jack_wang@usish.com>
LKML-Reference: <CAEhu1-6wq1YsNiscGMwP4ud0Q+MrViRzv=kcWCQSBNc8c68N5Q@mail.gmail.com>
Reported-by: John Drescher <drescherjm@gmail.com>
Tested-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: Dan Williams <djbw@fb.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be768912a4 upstream.
git commit c8adf9a3e8
"PCI: pre-allocate additional resources to devices only after
successful allocation of essential resources."
fails to take into consideration the optional-resources needed by children
devices while calculating the optional-resource needed by the bridge.
This can be a problem on some setup. For example, if a hotplug bridge has 8
children hotplug bridges, the bridge should have enough resources to accomodate
the hotplug requirements for each of its children hotplug bridges. Currently
this is not the case.
This patch fixes the problem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andrew Worsley <amworsley@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d70a74ffd upstream.
The oem parameter image embedded in the efi variable is at an offset
from the start of the variable. However, in the failure path we try to
free the 'orom' pointer which is only valid when the paramaters are
being read from the legacy option-rom space.
Since failure to load the oem parameters is unlikely and we keep the
memory around in the success case just defer all de-allocation to devm.
Reported-by: Don Morris <don.morris@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b796d06d5 upstream.
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bea1e22df4 upstream.
Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)
Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().
In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.
Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21e89afd32 upstream.
It turns out Smart Array logical drives do not support target
reset and when the target reset fails, the logical drive will
be taken off line. Symptoms look like this:
hpsa 0000:03:00.0: Abort request on C1:B0:T0:L0
hpsa 0000:03:00.0: resetting device 1:0:0:0
hpsa 0000:03:00.0: cp ffff880037c56000 is reported invalid (probably means target device no longer present)
hpsa 0000:03:00.0: resetting device failed.
sd 1:0:0:0: Device offlined - not ready after error recovery
sd 1:0:0:0: rejecting I/O to offline device
EXT3-fs error (device sdb1): read_block_bitmap:
LUN reset is supported though, and is what we should be using.
Target reset is also disruptive in shared SAS situations,
for example, an external MSA1210m which does support target
reset attached to Smart Arrays in multiple hosts -- a target
reset from one host is disruptive to other hosts as all LUNs
on the target will be reset and will abort all outstanding i/os
back to all the attached hosts. So we should use LUN reset,
not target reset.
Tested this with Smart Array logical drives and with tape drives.
Not sure how this bug survived since 2009, except it must be very
rare for a Smart Array to require more than 30s to complete a request.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 225c56960f upstream.
The length field in the host config packet is only 16-bit long, so
passing it 0x10000 (64K which is our standard PAGE_SIZE) doesn't
work and result in an empty config from the server.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit abb3e01103 upstream.
Currently UBI fails in autoresize when it is in R/O mode (e.g., because the
underlying MTD device is R/O). This patch fixes the issue - we just skip
autoresize and print a warning.
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88ed2a6061 upstream.
Uplink (TX) network data will go through gsm_dlci_data_output_framed
there is a bug where if memory allocation fails, the skb which
has already been pulled off the list will be lost.
In addition TX skbs were being processed in LIFO order
Fixed the memory leak, and changed to FIFO order processing
Signed-off-by: Russ Gorby <russ.gorby@intel.com>
Tested-by: Kappel, LaurentX <laurentx.kappel@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6e097dfdf upstream.
The Intel XHCI specification says that after clearing the run/stop bit
the controller may take up to 16ms to halt. We've seen a device take
14ms, which with the current timeout of 10ms causes the kernel to
abort the suspend. Increasing the timeout to the recommended value
fixes the problem.
This patch should be backported to kernels as old as 2.6.37, that
contain the commit 5535b1d5f8 "USB: xHCI:
PCI power management implementation".
Signed-off-by: Michael Spang <spang@chromium.org>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f34f9d186d upstream.
In !CORE_DUMP_USE_REGSET case, if elf_note_info_init fails to allocate
memory for info->fields, it frees already allocated stuff and returns
error to its caller, fill_note_info. Which in turn returns error to its
caller, elf_core_dump. Which jumps to cleanup label and calls
free_note_info, which will happily try to free all info->fields again.
BOOM.
This is the fix.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e44708f75 upstream.
There were some locking holes in the management of the MUX's
message queue for 2 code paths:
1) gsmld_write_wakeup
2) receipt of CMD_FCON flow-control message
In both cases gsm_data_kick is called w/o locking so it can collide
with other other instances of gsm_data_kick (pulling messages tx_tail)
or potentially other instances of __gsm_data_queu (adding messages to tx_head)
Changed to take the tx_lock in these 2 cases
Signed-off-by: Russ Gorby <russ.gorby@intel.com>
Tested-by: Yin, Fengwei <fengwei.yin@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80fab3b244 upstream.
When a device with an isochronous endpoint is behind a hub plugged into
the Intel Panther Point xHCI host controller, and the driver submits
multiple frames per URB, the xHCI driver will set the Block Event
Interrupt (BEI) flag on all but the last TD for the URB. This causes
the host controller to place an event on the event ring, but not send an
interrupt. When the last TD for the URB completes, BEI is cleared, and
we get an interrupt for the whole URB.
However, under a Panther Point xHCI host controller, if the parent hub
is unplugged when one or more events from transfers with BEI set are on
the event ring, a port status change event is placed on the event ring,
but no interrupt is generated. This means URBs stop completing, and the
USB device disconnect is not noticed. Something like a USB headset will
cause mplayer to hang when the device is disconnected.
If another transfer is sent (such as running `sudo lsusb -v`), the next
transfer event seems to "unstick" the event ring, the xHCI driver gets
an interrupt, and the disconnect is reported to the USB core.
The fix is not to use the BEI flag under the Panther Point xHCI host.
This will impact power consumption and system responsiveness, because
the xHCI driver will receive an interrupt for every frame in all
isochronous URBs instead of once per URB.
Intel chipset developers confirm that this bug will be hit if the BEI
flag is used on any endpoint, not just ones that are behind a hub.
This patch should be backported to kernels as old as 3.0, that contain
the commit 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7083909023 upstream.
Some of the EFI variable attributes are missing from print out from
/sys/firmware/efi/vars/*/attributes. This patch adds those in. It also
updates code to use pre-defined constants for masking current value
of attributes.
Signed-off-by: Khalid Aziz <khalid.aziz@hp.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26e8220adb upstream.
Apparently the same card model has two IDs, so this patch
complements the commit 39aced68d6
adding the missing one.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5dd553b9f upstream.
This works around a few glitches in the ST version of the PL011
serial driver when using very high baud rates, as we do in the
Ux500: 3, 3.25, 4 and 4.05 Mbps.
Problem Observed/rootcause:
When using high baud-rates, and the baudrate*8 is getting close to
the provided clock frequency (so a division factor close to 1), when
using bursts of characters (so they are abutted), then it seems as if
there is not enough time to detect the beginning of the start-bit which
is a timing reference for the entire character, and thus the sampling
moment of character bits is moving towards the end of each bit, instead
of the middle.
Fix:
Increase slightly the RX baud rate of the UART above the theoretical
baudrate by 5%. This will definitely give more margin time to the
UART_RX to correctly sample the data at the middle of the bit period.
Also fix the ages old copy-paste error in the very stressed comment,
it's referencing the registers used in the PL010 driver rather than
the PL011 ones.
Signed-off-by: Guillaume Jaunet <guillaume.jaunet@stericsson.com>
Signed-off-by: Christophe Arnal <christophe.arnal@stericsson.com>
Signed-off-by: Matthias Locher <matthias.locher@stericsson.com>
Signed-off-by: Rajanikanth HV <rajanikanth.hv@stericsson.com>
Cc: Bibek Basu <bibek.basu@stericsson.com>
Cc: Par-Gunnar Hjalmdahl <par-gunnar.hjalmdahl@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee8b593aff upstream.
If a user provides a buffer larger than a tty->write_buf chunk and
passes '\r' at the end of the buffer, we touch an out-of-bound memory.
Add a check there to prevent this.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Samo Pogacnik <samo_pogacnik@t-2.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9490e93c1 upstream.
Change the BUG_ON to WARN_ON and return in case of tty->read_buf==NULL. We want to track a
couple of long standing reports of this but at the same time we can avoid killing the box.
Signed-off-by: Stanislav Kozina <skozina@redhat.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c8cad4c89e upstream.
When `do_cmd_ioctl()` allocates memory for the kernel copy of a channel
list, it frees any previously allocated channel list in
`async->cmd.chanlist` and replaces it with the new one. However, if the
device is ever removed (or "detached") the cleanup code in
`cleanup_device()` in "drivers.c" does not free this memory so it is
lost.
A sensible place to free the kernel copy of the channel list is in
`do_become_nonbusy()` as at that point the comedi asynchronous command
associated with the channel list is no longer valid. Free the channel
list in `do_become_nonbusy()` instead of `do_cmd_ioctl()` and clear the
pointer to prevent it being freed more than once.
Note that `cleanup_device()` could be called at an inappropriate time
while the comedi device is open, but that's a separate bug not related
to this this patch.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5d06e3df28 upstream.
`parse_insn()` is dereferencing the user-space pointer `insn->data`
directly when handling the `INSN_INTTRIG` comedi instruction. It
shouldn't be using `insn->data` at all; it should be using the separate
`data` pointer passed to the function. Fix it.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1878957b4 upstream.
Correct a direct dereference of I/O memory to use an appropriate I/O
memory access function. Note that the pointer being dereferenced is not
currently tagged with `__iomem` but I plan to correct that for 3.7.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b655c2c478 upstream.
`s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which
is a pointer to user memory. It should be dereferencing the separate
`data` parameter that points to a copy of the data in kernel memory.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40fe4f8967 upstream.
softsynth_read() reads a character at a time from the init string;
when it finds the null terminator it sets the initialized flag but
then repeats the last character.
Additionally, if the read() buffer is not big enough for the init
string, the next read() will start reading from the beginning again.
So the caller may never progress to reading anything else.
Replace the simple initialized flag with the current position in
the init string, carried over between calls. Switch to reading
real data once this reaches the null terminator.
(This assumes that the length of the init string can't change, which
seems to be the case. Really, the string and position belong together
in a per-file private struct.)
Tested-by: Samuel Thibault <sthibault@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c638eb2872 upstream.
The three Pantech devices UML190 (106c:3716), UML290 (106c:3718) and
P4200 (106c:3721) all use the same subclasses to identify vendor
specific functions. Replace the existing device specific entries
with generic vendor matching, adding support for the P4200.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba1cbad93d upstream.
The access beyond the end of device BUG_ON that was introduced to
dm_request_fn via commit 29e4013de7 ("dm: implement
REQ_FLUSH/FUA support for request-based dm") was an overly
drastic (but simple) response to this situation.
I have received a report that this BUG_ON was hit and now think
it would be better to use dm_kill_unmapped_request() to fail the clone
and original request with -EIO.
map_request() will assign the valid target returned by
dm_table_find_target to tio->ti. But when the target
isn't valid tio->ti is never assigned (because map_request isn't
called); so add a check for tio->ti != NULL to dm_done().
Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8110e16d42 upstream.
IBM reported a deadlock in select_parent(). This was found to be caused
by taking rename_lock when already locked when restarting the tree
traversal.
There are two cases when the traversal needs to be restarted:
1) concurrent d_move(); this can only happen when not already locked,
since taking rename_lock protects against concurrent d_move().
2) racing with final d_put() on child just at the moment of ascending
to parent; rename_lock doesn't protect against this rare race, so it
can happen when already locked.
Because of case 2, we need to be able to handle restarting the traversal
when rename_lock is already held. This patch fixes all three callers of
try_to_ascend().
IBM reported that the deadlock is gone with this patch.
[ I rewrote the patch to be smaller and just do the "goto again" if the
lock was already held, but credit goes to Miklos for the real work.
- Linus ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a76d7bd96d upstream.
The open-coded mutex implementation for ARMv6+ cores suffers from a
severe lack of barriers, so in the uncontended case we don't actually
protect any accesses performed during the critical section.
Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec
code but optimised to remove a branch instruction, as the mutex fastpath
was previously inlined. Now that this is executed out-of-line, we can
reuse the atomic access code for the locking (in fact, we use the xchg
code as this produces shorter critical sections).
This patch uses the generic xchg based implementation for mutexes on
ARMv6+, which introduces barriers to the lock/unlock operations and also
has the benefit of removing a fair amount of inline assembly code.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Shan Kang <kangshan0910@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0d00dc2611 upstream.
This patch (as1607) fixes a race that can occur if a USB host
controller is removed while a process is reading the
/sys/kernel/debug/usb/devices file.
The usb_device_read() routine uses the bus->root_hub pointer to
determine whether or not the root hub is registered. The is not a
valid test, because the pointer is set before the root hub gets
registered and remains set even after the root hub is unregistered and
deallocated. As a result, usb_device_read() or usb_device_dump() can
access freed memory, causing an oops.
The patch changes the test to use the hcd->rh_registered flag, which
does get set and cleared at the appropriate times. It also makes sure
to hold the usb_bus_list_lock mutex while setting the flag, so that
usb_device_read() will become aware of new root hubs as soon as they
are registered.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a129a7c845 upstream.
When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.
Fix this by always forcing vm86 to look like ring 3.
[ Backport to 3.0 notes:
Things changed there slightly:
- move mce_get_rip() up. It fills up m->cs and m->ip values which
are evaluated in mce_severity(). Therefore move it up right before
the mce_severity call. This seem to be another bug in 3.0?
- Place the backport (fix m->cs in V86 case) to where m->cs gets
filled which is mce_get_rip() in 3.0
]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b5740f4b2c upstream.
try_to_wake_up() has a problem which may change status from TASK_DEAD to
TASK_RUNNING in race condition with SMI or guest environment of virtual
machine. As a result, exited task is scheduled() again and panic occurs.
Here is the sequence how it occurs:
----------------------------------+-----------------------------
|
CPU A | CPU B
----------------------------------+-----------------------------
TASK A calls exit()....
do_exit()
exit_mm()
down_read(mm->mmap_sem);
rwsem_down_failed_common()
set TASK_UNINTERRUPTIBLE
set waiter.task <= task A
list_add to sem->wait_list
:
raw_spin_unlock_irq()
(I/O interruption occured)
__rwsem_do_wake(mmap_sem)
list_del(&waiter->list);
waiter->task = NULL
wake_up_process(task A)
try_to_wake_up()
(task is still
TASK_UNINTERRUPTIBLE)
p->on_rq is still 1.)
ttwu_do_wakeup()
(*A)
:
(I/O interruption handler finished)
if (!waiter.task)
schedule() is not called
due to waiter.task is NULL.
tsk->state = TASK_RUNNING
:
check_preempt_curr();
:
task->state = TASK_DEAD
(*B)
<--- set TASK_RUNNING (*C)
schedule()
(exit task is running again)
BUG_ON() is called!
--------------------------------------------------------
The execution time between (*A) and (*B) is usually very short,
because the interruption is disabled, and setting TASK_RUNNING at (*C)
must be executed before setting TASK_DEAD.
HOWEVER, if SMI is interrupted between (*A) and (*B),
(*C) is able to execute AFTER setting TASK_DEAD!
Then, exited task is scheduled again, and BUG_ON() is called....
If the system works on guest system of virtual machine, the time
between (*A) and (*B) may be also long due to scheduling of hypervisor,
and same phenomenon can occur.
By this patch, do_exit() waits for releasing task->pi_lock which is used
in try_to_wake_up(). It guarantees the task becomes TASK_DEAD after
waking up.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120117174031.3118.E1E9C6FF@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 067aa4815a upstream.
Commit 178db7d3, "spi: Fix device unregistration when unregistering
the bus master", changed spi device initialization of dev.parent pointer
to be the master's device pointer instead of his parent.
This introduced a bug in spi-fsl-spi, since its usage of spi device
pointer was not updated accordingly. This was later fixed by commit
5039a86, "spi/mpc83xx: fix NULL pdata dereference bug", but it missed
another spot on fsl_spi_cs_control function where we also need to update
usage of spi device pointer. This change address that.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Alfredo Capella <alfredo.capella@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5039a86973 upstream.
Commit 178db7d3, "spi: Fix device unregistration when unregistering
the bus master", changed device initialization to be children of the
bus master, not children of the bus masters parent device. The pdata
pointer used in fsl_spi_chipselect must updated to reflect the changed
initialization.
Signed-off-by: Kenth Eriksson <kenth.eriksson@transmode.com>
Acked-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Alfredo Capella <alfredo.capella@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78b495c39a upstream
UBI was mistakingly using 'kfree()' instead of 'kmem_cache_free()' when
freeing "attach eraseblock" structures in vtbl.c. Thankfully, this happened
only when we were doing auto-format, so many systems were unaffected. However,
there are still many users affected.
It is strange, but the system did not crash and nothing bad happened when
the SLUB memory allocator was used. However, in case of SLOB we observed an
crash right away.
This problem was introduced in 2.6.39 by commit
"6c1e875 UBI: add slab cache for ubi_scan_leb objects"
Reported-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4188bba0e9 upstream.
The driver should not try to switch to 1.8V when the SD 3.0 host
controller does not have any UHS capabilities bits set (SDR50, DDR50
or SDR104). See page 72 of "SD Specifications Part A2 SD Host
Controller Simplified Specification Version 3.00" under
"1.8V Signaling Enable". Instead of setting SDR12 and SDR25 in the host
capabilities data structure for all V3.0 host controllers, only set them
if SDR104, SDR50 or DDR50 is set in the host capabilities register. This
will prevent the switch to 1.8V later.
Signed-off-by: Al Cooper <acooper@gmail.com>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Acked-by: Philip Rakity <prakity@marvell.com>
Acked-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f2815f68da upstream.
Here is Essential conditions to indicate Version 3.00 Card
(SD_SPEC=2 and SD_SPEC3=1) :
(1) The card shall support CMD6
(2) The card shall support CMD8
(3) The card shall support CMD42
(4) User area capacity shall be up to 2GB (SDSC) or 32GB (SDHC)
User area capacity shall be more than or equal to 32GB and
up to 2TB (SDXC)
(5) Speed Class shall be supported (SDHC or SDXC)
So even if SD card doesn't support any of the newly defined
UHS-I bus speed mode, it can advertise itself as SD3.0 cards
as long as it supports all the essential conditions of
SD3.0 cards. Given this, these type of cards should atleast
run in High Speed mode @50MHZ if it supports HS.
But current initialization sequence for SD3.0 cards is
such that these non-UHS-I SD3.0 cards runs in Default
Speed mode @25MHz.
This patch makes sure that these non-UHS-I SD3.0 cards run
in High Speed Mode @50MHz.
Tested this patch with SanDisk Extreme SDHC 8GB Class 10 card.
Reported-by: "Hiremath, Vaibhav" <hvaibhav@ti.com>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
commit cc37f75a9f upstream.
A Squashfs filesystem containing nothing but an empty directory,
although unusual and ultimately pointless, is still valid.
The directory_table >= next_table sanity check rejects these
filesystems as invalid because the directory_table is empty and
equal to next_table.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38bd2a1ac7 upstream.
Parity Setting value is reverse.
E.G. In case of setting ODD parity, EVEN value is set.
This patch inverts "if" condition.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9539dfb7ac upstream.
Rx Error interrupt(E.G. parity error) is not enabled.
So, when parity error occurs, error interrupt is not occurred.
As a result, the received data is not dropped.
This patch adds enable/disable rx error interrupt code.
Signed-off-by: Tomoya MORINAGA <tomoya.rohm@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 720bb6436f upstream.
For some reason, when the lirc daemon learns that a usb remote control
has been unplugged, it wants to read the sysfs attributes of the
disappearing device. This is useful for uncovering transient
inconsistencies, but less so for keeping the system running when such
inconsistencies exist.
Under some circumstances (like every time I unplug my dvb stick from
my laptop), lirc catches an rc_dev whose raw event handler has been
removed (presumably by ir_raw_event_unregister), and proceeds to
interrogate the raw protocols supported by the NULL pointer.
This patch avoids the NULL dereference, and ignores the issue of how
this state of affairs came about in the first place.
Version 2 incorporates changes recommended by Mauro Carvalho Chehab
(-ENODEV instead of -EINVAL, and a signed-off-by).
Signed-off-by: Douglas Bagnall <douglas@paradise.net.nz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cee58483cf upstream
Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526c ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.
Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.
This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.
Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e8b14526c upstream
Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger then the
year 2262).
Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.
So this patch improves the sanity check timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.
Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.
Reported-by: CAI Qian <caiqian@redhat.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1344454580-17031-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bec4596b4e upstream.
drop_monitor calls several sleeping functions while in atomic context.
BUG: sleeping function called from invalid context at mm/slub.c:943
in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
Call Trace:
[<ffffffff810697ca>] __might_sleep+0xca/0xf0
[<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
[<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
[<ffffffff815343fb>] __alloc_skb+0x4b/0x230
[<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
[<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
[<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
[<ffffffff810568e0>] process_one_work+0x130/0x4c0
[<ffffffff81058249>] worker_thread+0x159/0x360
[<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
[<ffffffff8105d403>] kthread+0x93/0xa0
[<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
[<ffffffff816be6d0>] ? gs_change+0xb/0xb
Rework the logic to call the sleeping functions in right context.
Use standard timer/workqueue api to let system chose any cpu to perform
the allocation and netlink send.
Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
use mod_timer() to wait 1/10 second before next try.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4fdcfa1284 upstream.
I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error. drop monitor maintains a per cpu structure,
that gets initalized from a single cpu. Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself . Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu. If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.
This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from. In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu
Tested successfully by myself.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3885ca785a upstream.
Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections. Specifically, its possible to replace data->skb while its
being written. This patch corrects that by making data->skb an rcu protected
variable. That will prevent it from being overwritten while a tracepoint is
modifying it.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cde2e9a651 upstream.
Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
[ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[ 38.352582] Call Trace:
[ 38.352592] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352599] [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[ 38.352606] [<ffffffff81655b16>] mutex_lock+0x26/0x50
[ 38.352610] [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352616] [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[ 38.352621] [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[ 38.352625] [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[ 38.352630] [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[ 38.352636] [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[ 38.352640] [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[ 38.352645] [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[ 38.352649] [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[ 38.352653] [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[ 38.352658] [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[ 38.352663] [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[ 38.352668] [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[ 38.352673] [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[ 38.352677] [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[ 38.352682] [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[ 38.352687] [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[ 38.352693] [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[ 38.352699] [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[ 38.352703] [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[ 38.352708] [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[ 38.352713] [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b71ca6bce upstream.
For one, the driver device pointer needs to be filled in, or the lirc core
will refuse to load the driver. And we really need to wire up all the
platform_device bits. This has been tested via the lirc sourceforge tree
and verified to work, been sitting there for months, finally getting
around to sending it. :\
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8323f26ce3 upstream.
Stefan reported a crash on a kernel before a3e5d1091c ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.
Looking at all that I found a lack of serialization and plain
wrong comments.
The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. Its not pretty,
but I can't really see another way given how all the cgroup
stuff works.
Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c4686c71a9 upstream.
Commit d640113fe8 introduced a regression on SMP
systems where the processor core with ACPI id zero is disabled
(typically should be the case because of hyperthreading).
The regression got spread through stable kernels.
On 3.0.X it got introduced via 3.0.18.
Such platforms may be rare, but do exist.
Look out for a disabled processor with acpi_id 0 in dmesg:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] disabled)
This problem has been observed on a:
HP Proliant BL280c G6 blade
This patch restricts the introduced workaround to platforms
with nr_cpu_ids <= 1.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c531077f40 upstream.
When using my Seagate FreeAgent GoFlex eSATAp external disk enclosure,
interface errors are always seen until 1.5Gbps is negotiated [1]. This
occurs using any disk in the enclosure, and when the disk is connected
directly with a generic passive eSATAp cable, we see stable 3Gbps
operation as expected.
Blacklist 3Gbps mode to avoid dataloss and the ~30s delay bus reset
and renegotiation incurs.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 06b6a1cf6e upstream.
Jay Fenlason (fenlason@redhat.com) found a bug,
that recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it was called with a address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len paramater properly before
returning, but that's just a bug.
There are also a number of cases wher recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.
And I write two test programs to reproduce this bug, you will see that in
rds_server, fromAddr will be overwritten and the following sock_fd will be
destroyed.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of address to user space in
such case.
How to run the test programs ?
I test them on 32bit x86 system, 3.5.0-rc7.
1 compile
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c
2 run ./rds_server on one console
3 run ./rds_client on another console
4 you will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor
/***************** rds_client.c ********************/
int main(void)
{
int sock_fd;
struct sockaddr_in serverAddr;
struct sockaddr_in toAddr;
char recvBuffer[128] = "data from client";
struct msghdr msg;
struct iovec iov;
sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if (sock_fd < 0) {
perror("create socket error\n");
exit(1);
}
memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4001);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind() error\n");
close(sock_fd);
exit(1);
}
memset(&toAddr, 0, sizeof(toAddr));
toAddr.sin_family = AF_INET;
toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
toAddr.sin_port = htons(4000);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendto() error\n");
close(sock_fd);
exit(1);
}
printf("client send data:%s\n", recvBuffer);
memset(recvBuffer, '\0', 128);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("receive data from server:%s\n", recvBuffer);
close(sock_fd);
return 0;
}
/***************** rds_server.c ********************/
int main(void)
{
struct sockaddr_in fromAddr;
int sock_fd;
struct sockaddr_in serverAddr;
unsigned int addrLen;
char recvBuffer[128];
struct msghdr msg;
struct iovec iov;
sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if(sock_fd < 0) {
perror("create socket error\n");
exit(0);
}
memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4000);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind error\n");
close(sock_fd);
exit(1);
}
printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;
/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}
close(sock_fd);
return 0;
}
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Fixed upstream by commits 2955b47d2c and
a4683487f9 from Dan Williams, but they are much
more intrusive than this tiny fix, according to Andrew - gregkh]
This patch tries to fix a dead loop in async_synchronize_full(), which
could be seen when preemption is disabled on a single cpu machine.
void async_synchronize_full(void)
{
do {
async_synchronize_cookie(next_cookie);
} while (!list_empty(&async_running) || !
list_empty(&async_pending));
}
async_synchronize_cookie() calls async_synchronize_cookie_domain() with
&async_running as the default domain to synchronize.
However, there might be some works in the async_pending list from other
domains. On a single cpu system, without preemption, there is no chance
for the other works to finish, so async_synchronize_full() enters a dead
loop.
It seems async_synchronize_full() wants to synchronize all entries in
all running lists(domains), so maybe we could just check the entry_count
to know whether all works are finished.
Currently, async_synchronize_cookie_domain() expects a non-NULL running
list ( if NULL, there would be NULL pointer dereference ), so maybe a
NULL pointer could be used as an indication for the functions to
synchronize all works in all domains.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 734b65417b upstream.
This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.
With thanks to Eric Dumazet for catching a bug.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1fa6535faf upstream.
As pointed out by Gustavo and Marcel, all Apple-specific Broadcom
devices seen so far have the same interface class, subclass and
protocol numbers. This patch adds an entry which matches all of them,
using the new USB_VENDOR_AND_INTERFACE_INFO() macro.
In particular, this patch adds support for the MacBook Pro Retina
(05ac:8286), which is not in the present list.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Shea Levy <shea@shealevy.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92c385f46b upstream.
Many Broadcom devices has a vendor specific devices class, with this rule
we match all existent and future controllers with this behavior.
We also remove old rules to that matches product id for Broadcom devices.
Tested-by: John Hommel <john.hommel@hp.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96e65306b8 upstream.
The compiler may compile the following code into TWO write/modify
instructions.
worker->flags &= ~WORKER_UNBOUND;
worker->flags |= WORKER_REBIND;
so the other CPU may temporarily see worker->flags which doesn't have
either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
prematurely.
Fix it by using single explicit assignment via ACCESS_ONCE().
Because idle workers have another WORKER_NOT_RUNNING flag, this bug
doesn't exist for them; however, update it to use the same pattern for
consistency.
tj: Applied the change to idle workers too and updated comments and
patch description a bit.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b98b601672 upstream.
Clear Audio Enable bit to trigger unsolicated event to notify Audio
Driver part the HDMI hot plug change. The patch fixed the bug when
remove HDMI cable the bit was not cleared correctly.
In intel_hdmi_dpms(), if intel_hdmi->has_audio been true, the "Audio enable bit" will
be set to trigger unsolicated event to notify Alsa driver the change.
intel_hdmi->has_audio will be reset to false from intel_hdmi_detect() after
remove the hdmi cable, here's debug log:
[ 187.494153] [drm:output_poll_execute], [CONNECTOR:17:HDMI-A-1] status updated from 1 to 2
[ 187.525349] [drm:intel_hdmi_detect], HDMI: has_audio = 0
so when comes back to intel_hdmi_dpms(), the "Audio enable bit" will not be cleared. And this
cause the eld infomation and pin presence doesnot update accordingly in alsa driver side.
This patch will also trigger unsolicated event to alsa driver to notify the hot plug event:
[ 187.853159] ALSA sound/pci/hda/patch_hdmi.c:772 HDMI hot plug event: Codec=3 Pin=5 Presence_Detect=0 ELD_Valid=1
[ 187.853268] ALSA sound/pci/hda/patch_hdmi.c:990 HDMI status: Codec=3 Pin=5 Presence_Detect=0 ELD_Valid=0
Signed-off-by: Wang Xingchao <xingchao.wang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3766054fff upstream.
There are some new video switch keys that used by newer machines.
0xA0 - SDSP HDMI only
0xA1 - SDSP LCD + HDMI
0xA2 - SDSP CRT + HDMI
0xA3 - SDSP TV + HDMI
But in Linux, there is no suitable userspace application to handle this,
so, mapping them all to KEY_SWITCHVIDEOMODE.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52e9b39d9a upstream.
There is a more recent APU stepping with a new PCI ID
shipping in the same board by Fujitsu which needs the
same quirk to correctly mark the back plane connectors.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d8636a2717 upstream.
So we've had a fair few reports of fbcon handover breakage between
efi/vesafb and i915 surface recently, so I dedicated a couple of
days to finding the problem.
Essentially the last thing we saw was the conflicting framebuffer
message and that was all.
So after much tracing with direct netconsole writes (printks
under console_lock not so useful), I think I found the race.
Thread A (driver load) Thread B (timer thread)
unbind_con_driver -> |
bind_con_driver -> |
vc->vc_sw->con_deinit -> |
fbcon_deinit -> |
console_lock() |
| |
| fbcon_flashcursor timer fires
| console_lock() <- blocked for A
|
|
fbcon_del_cursor_timer ->
del_timer_sync
(BOOM)
Of course because all of this is under the console lock,
we never see anything, also since we also just unbound the active
console guess what we never see anything.
Hopefully this fixes the problem for anyone seeing vesafb->kms
driver handoff.
v1.1: add comment suggestion from Alan.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Tested-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7838f994b4 upstream.
On many of our larger systems, CPU 0 has had all of its IRQ resources
consumed before XPC loads. Worst cases on machines with multiple 10
GigE cards and multiple IB cards have depleted the entire first socket
of IRQs.
This patch makes selecting the node upon which IRQs are allocated (as
well as all the other GRU Message Queue structures) specifiable as a
module load param and has a default behavior of searching all nodes/cpus
for an available resources.
[akpm@linux-foundation.org: fix build: include cpu.h and module.h]
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58a34de7b1 upstream.
The power.deferred_resume can only be set if the runtime PM status
of device is RPM_SUSPENDING and it should be cleared after its
status has been changed, regardless of whether or not the runtime
suspend has been successful. However, it only is cleared on
suspend failure, while it may remain set on successful suspend and
is happily leaked to rpm_resume() executed in that case.
That shouldn't happen, so if power.deferred_resume is set in
rpm_suspend() after the status has been changed to RPM_SUSPENDED,
clear it before calling rpm_resume(). Then, it doesn't need to be
cleared before changing the status to RPM_SUSPENDING any more,
because it's always cleared after the status has been changed to
either RPM_SUSPENDED (on success) or RPM_ACTIVE (on failure).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f321c26c0 upstream.
For devices whose power.no_callbacks flag is set, rpm_resume()
should return 1 if the device's parent is already active, so that
the callers of pm_runtime_get() don't think that they have to wait
for the device to resume (asynchronously) in that case (the core
won't queue up an asynchronous resume in that case, so there's
nothing to wait for anyway).
Modify the code accordingly (and make sure that an idle notification
will be queued up on success, even if 1 is to be returned).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7dbfb315b2 upstream.
Correct the offset by subtracting 20 from tm_hour before taking the
modulo 12.
[ "Why 20?" I hear you ask. Or at least I did.
Here's the reason why: RS5C348_BIT_PM is 32, and is - stupidly -
included in the RS5C348_HOURS_MASK define. So it's really subtracting
out that bit to get "hour+12". But then because it does things modulo
12, it needs to add the 12 in again afterwards anyway.
This code is confused. It would be much clearer if RS5C348_HOURS_MASK
just didn't include the RS5C348_BIT_PM bit at all, then it wouldn't
need to do the silly subtract either.
Whatever. It's all just math, the end result is the same. - Linus ]
Reported-by: James Nute <newten82@gmail.com>
Tested-by: James Nute <newten82@gmail.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0bce9c46bf upstream.
ARM recently moved to asm-generic/mutex-xchg.h for its mutex
implementation after the previous implementation was found to be missing
some crucial memory barriers. However, this has revealed some problems
running hackbench on SMP platforms due to the way in which the
MUTEX_SPIN_ON_OWNER code operates.
The symptoms are that a bunch of hackbench tasks are left waiting on an
unlocked mutex and therefore never get woken up to claim it. This boils
down to the following sequence of events:
Task A Task B Task C Lock value
0 1
1 lock() 0
2 lock() 0
3 spin(A) 0
4 unlock() 1
5 lock() 0
6 cmpxchg(1,0) 0
7 contended() -1
8 lock() 0
9 spin(C) 0
10 unlock() 1
11 cmpxchg(1,0) 0
12 unlock() 1
At this point, the lock is unlocked, but Task B is in an uninterruptible
sleep with nobody to wake it up.
This patch fixes the problem by ensuring we put the lock into the
contended state if we fail to acquire it on the fastpath, ensuring that
any blocked waiters are woken up when the mutex is released.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 50d0206fca upstream.
This patch fixes a particularly nasty bug that was revealed by the ring
expansion patches. The bug has been present since the very beginning of
the xHCI driver history, and could have caused general protection faults
from bad memory accesses.
The first thing to note is that a Set TR Dequeue Pointer command can
move the dequeue pointer to a link TRB, if the canceled or stalled
transfer TD ended just before a link TRB. The function to increment the
dequeue pointer, inc_deq, was written before cancellation and stall
support was added. It assumed that the dequeue pointer could never
point to a link TRB. It would unconditionally increment the dequeue
pointer at the start of the function, check if the pointer was now on a
link TRB, and move it to the top of the next segment if so.
This means that if a Set TR Dequeue Point command moved the dequeue
pointer to a link TRB, a subsequent call to inc_deq() would move the
pointer off the segment and into la-la-land. It would then read from
that memory to determine if it was a link TRB. Other functions would
often call inc_deq() until the dequeue pointer matched some other
pointer, which means this function would quite happily read all of
system memory before wrapping around to the right pointer value.
Often, there would be another endpoint segment from a different ring
allocated from the same DMA pool, which would be contiguous to the
segment inc_deq just stepped off of. inc_deq would eventually find the
link TRB in that segment, and blindly move the dequeue pointer back to
the top of the correct ring segment.
The only reason the original code worked at all is because there was
only one ring segment. With the ring expansion patches, the dequeue
pointer would eventually wrap into place, but the dequeue segment would
be out-of-sync. On the second TD after the dequeue pointer was moved to
a link TRB, trb_in_td() would fail (because the dequeue pointer and
dequeue segment were out-of-sync), and this message would appear:
ERROR Transfer event TRB DMA ptr not part of current TD
This fixes bugzilla entry 4333 (option-based modem unhappy on USB 3.0
port: "Transfer event TRB DMA ptr not part of current TD", "rejecting
I/O to offline device"),
https://bugzilla.kernel.org/show_bug.cgi?id=43333
and possibly other general protection fault bugs as well.
This patch should be backported to kernels as old as 2.6.31. A separate
patch will be created for kernels older than 3.4, since inc_deq was
modified in 3.4 and this patch will not apply.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Tested-by: James Ettle <theholyettlz@googlemail.com>
Tested-by: Matthew Hall <mhall@mhcomputing.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2963657819 upstream.
For non PCI-based stacks, this function call
usb_disable_xhci_ports(to_pci_dev(hcd->self.controller));
made from xhci_shutdown is not applicable.
Ideally, we wouldn't have any PCI-specific code on
a generic driver such as the xHCI stack, but it looks
like we should just stub usb_disable_xhci_ports() out
for non-PCI devices.
[ balbi@ti.com: slight improvement to commit log ]
This patch should be backported to kernels as old as 3.0, since the
commit it fixes (e95829f474 "xhci: Switch
PPT ports to EHCI on shutdown.") was marked for stable.
Signed-off-by: Moiz Sonasath<m-sonasath@ti.com>
Signed-off-by: Ruchika Kharwar <ruchika@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29d214576f upstream.
On Intel Panther Point chipset USB 3.0 devices show up as
high-speed devices on powerup, but after an s3 cycle they are
correctly recognized as SuperSpeed. At powerup switch the port
to xHCI so that USB 3.0 devices are correctly recognized.
BugLink: http://bugs.launchpad.net/bugs/1000424
This patch should be backported to kernels as old as 3.0, that contain
commit ID 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e955a1cd08 upstream.
My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.
I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.
This should be backported to kernels as old as 2.6.31, that contain the
commit 66d4eadd8d "USB: xhci: BIOS handoff
and HW initialization."
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 052c7f9ffb upstream.
The intent was to test whether the flag was set.
This patch should be backported to stable kernels as old as 3.0, since
it fixes a bug in commit e95829f474 "xhci:
Switch PPT ports to EHCI on shutdown.", which was marked for stable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a96874a2a9 upstream.
With a previous patch to enable the EHCI/XHCI port switching, it switches
all the available ports.
The assumption is not correct because the BIOS may expect some ports
not switchable by the OS.
There are two more registers that contains the information of the switchable
and non-switchable ports.
This patch adds the checking code for the two register so that only the
switchable ports are altered.
This patch should be backported to kernels as old as 3.0, that contain
commit ID 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 92fc7a8b0f upstream.
This patch (as1604) adds a CONFIG_INTF_STRINGS quirk for the Joss
infrared touchboard device. The device doesn't like to be asked for
its interface strings.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: adam ? <adam3337@wp.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dafc4f7be1 upstream.
Commit b69cc67205 added support for the E-861. After acquiring a C-867, I
realised that every Physik Instrumente's device has a different PID. They are
listed in the Windows device driver's .inf file. So here are all PIDs for the
current (and probably future) USB devices from Physik Instrumente.
Compiled, but only actually tested on the E-861 and C-867.
Signed-off-by: Éric Piel <piel@delmic.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f08dea7348 upstream.
The Microchip vid:pid 04d8:000a is used for their CDC ACM
demo firmware application. This is a device with a single
function conforming to the CDC ACM specification and with
the intention of demonstrating CDC ACM class firmware and
driver interaction. The demo is used on a number of
development boards, and may also be used unmodified by
vendors using Microchip hardware.
Some vendors have re-used this vid:pid for other types of
firmware, emulating FTDI chips. Attempting to continue to
support such devices without breaking class based
applications that by matching on interface
class/subclass/proto being ff/ff/00. I have no information
about the actual device or interface descriptors, but this
will at least make the proper CDC ACM devices work again.
Anyone having details of the offending device's descriptors
should update this entry with the details.
Reported-by: Florian Wöhrl <fw@woehrl.biz>
Reported-by: Xiaofan Chen <xiaofanc@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Bruno Thomsen <bruno.thomsen@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d037774b4 upstream.
There is a possibility of QH overlay region having reference to a stale
qTD pointer during unlink.
Consider an endpoint having two pending qTD before unlink process begins.
The endpoint's QH queue looks like this.
qTD1 --> qTD2 --> Dummy
To unlink qTD2, QH is removed from asynchronous list and Asynchronous
Advance Doorbell is programmed. The qTD1's next qTD pointer is set to
qTD2'2 next qTD pointer and qTD2 is retired upon controller's doorbell
interrupt. If QH's current qTD pointer points to qTD1, transfer overlay
region still have reference to qTD2. But qtD2 is just unlinked and freed.
This may cause EHCI system error. Fix this by updating qTD next pointer
in QH overlay region with the qTD next pointer of the current qTD.
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 01913b49cf upstream.
If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last
decode_* call must have succeeded.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 872ece86ea upstream.
Apparently, am-utils is still using the legacy binary mountdata interface,
and is having trouble parsing /proc/mounts due to the 'port=' field being
incorrectly set.
The following patch should fix up the regression.
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3f52af3e0 upstream.
When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd764 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).
At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.
Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881
Reported-by: Andi Kleen <andi@firstfloor.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a396e10019 upstream.
We need to program the rfkill switch GPIO pin direction to input at
device initialization time, not only when the interface is brought up.
Doing this only when the interface is brought up could lead to rfkill
detecting the switch is turned on erroneously and inability to create
the interface and bringing it up.
Reported-and-tested-by: Andreas Messer <andi@bastelmap.de>
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Ivo Van Doorn <ivdoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c456797681 upstream.
Avoid the construction of a malformed DMA request sent to
the DMA controller.
Log message is for debug only because this condition is unlikely to
append and may only trigger at driver development time.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a85d0d7f34 upstream.
When call_crda() is called we kick off a witch hunt search
for the same regulatory domain on our internal regulatory
database and that work gets kicked off on a workqueue, this
is done while the cfg80211_mutex is held. If that workqueue
kicks off it will first lock reg_regdb_search_mutex and
later cfg80211_mutex but to ensure two CPUs will not contend
against cfg80211_mutex the right thing to do is to have the
reg_regdb_search() wait until the cfg80211_mutex is let go.
The lockdep report is pasted below.
cfg80211: Calling CRDA to update world regulatory domain
======================================================
[ INFO: possible circular locking dependency detected ]
3.3.8 #3 Tainted: G O
-------------------------------------------------------
kworker/0:1/235 is trying to acquire lock:
(cfg80211_mutex){+.+...}, at: [<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
but task is already holding lock:
(reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (reg_regdb_search_mutex){+.+...}:
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<81645778>] is_world_regdom+0x9f8/0xc74 [cfg80211]
-> #1 (reg_mutex#2){+.+...}:
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<8164539c>] is_world_regdom+0x61c/0xc74 [cfg80211]
-> #0 (cfg80211_mutex){+.+...}:
[<800a77b8>] __lock_acquire+0x10d4/0x17bc
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
other info that might help us debug this:
Chain exists of:
cfg80211_mutex --> reg_mutex#2 --> reg_regdb_search_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(reg_regdb_search_mutex);
lock(reg_mutex#2);
lock(reg_regdb_search_mutex);
lock(cfg80211_mutex);
*** DEADLOCK ***
3 locks held by kworker/0:1/235:
#0: (events){.+.+..}, at: [<80089a00>] process_one_work+0x230/0x460
#1: (reg_regdb_work){+.+...}, at: [<80089a00>] process_one_work+0x230/0x460
#2: (reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]
stack backtrace:
Call Trace:
[<80290fd4>] dump_stack+0x8/0x34
[<80291bc4>] print_circular_bug+0x2ac/0x2d8
[<800a77b8>] __lock_acquire+0x10d4/0x17bc
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<816468a4>] set_regdom+0x78c/0x808 [cfg80211]
Reported-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e21093ef6f upstream.
The Revision 1.0 Janz CMOD-IO Carrier Board does not have support for
the reset registers. To support older hardware, the code is changed to
use the hardware reset register on the Janz VMOD-ICAN3 hardware itself.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8669cf6793 upstream.
On Toshiba Satellite C850D, the touchpad and the keyboard might randomly
not work at boot. Preventing MUX mode activation solves this issue.
Signed-off-by: Anisse Astier <anisse@astier.eu>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1e5b7e425 upstream.
This patch zeroes the SCTLR.TRE bit prior to setting the mapping as
cacheable for ARMv7 cores in the decompressor, ensuring that the
memory region attributes are obtained from the C and B bits, not from
the page tables.
Cc: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matthew Leach <matthew.leach@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No upstream commit as this is a merge error in the 3.0 tree.
ARM_ERRATA_764369 and PL310_ERRATA_769419 do not appear in config menu in
stable 3.0.y tree.
This is because backported patch for arm/arm/Kconfig applied wrong place.
This patch solves it.
Signed-off-by: Tetsuyuki Kobayashi <koba@kmckk.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 308b135e4f upstream.
kdump can be interrupted by watchdog timer when the timer is left
activated on the crash kernel. Changed the hpwdt driver to disable
watchdog timer at boot-time. This assures that watchdog timer is
disabled until /dev/watchdog is opened, and prevents watchdog timer
to be left running on the crash kernel.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Lisa Mitchell <lisa.mitchell@hp.com>
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 256d0eaac8 upstream.
If a command status of CMD_PROTOCOL_ERR is received, this
information should be conveyed to the SCSI mid layer, not
dropped on the floor. CMD_PROTOCOL_ERR may be received
from the Smart Array for any commands destined for an external
RAID controller such as a P2000, or commands destined for tape
drives or CD/DVD-ROM drives, if for instance a cable is
disconnected. This mostly affects multipath configurations, as
disconnecting a cable on a non-multipath configuration is not
going to do anything good regardless of whether CMD_PROTOCOL_ERR
is handled correctly or not. Not handling CMD_PROTOCOL_ERR
correctly in a multipath configaration involving external RAID
controllers may cause data corruption, so this is quite a serious
bug. This bug should not normally cause a problem for direct
attached disk storage.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d653220711 upstream.
This patch fixes the following kernel panic invoked by uninitialized fields
in the chip initialization for the 1G bnx2 iSCSI offload.
One of the bits in the chip initialization is being used by the latest
firmware to control overflow packets. When this control bit gets enabled
erroneously, it would ultimately result in a bad packet placement which would
cause the bnx2 driver to dereference a NULL ptr in the placement handler.
This can happen under certain stress I/O environment under the Linux
iSCSI offload operation.
This change only affects Broadcom's 5709 chipset.
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2
RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216
RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000
RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220
RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000
R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010
R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558
FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820)
Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520
ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20
0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8
Call Trace:
<IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b
[<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231
[<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1
[<ffffffff800125a0>] __do_softirq+0x89/0x133
[<ffffffff8005e30c>] call_softirq+0x1c/0x28
[<ffffffff8006d5de>] do_softirq+0x2c/0x7d
[<ffffffff8006d46e>] do_IRQ+0xee/0xf7
[<ffffffff8005d625>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341
[<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341
[<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341
[<ffffffff80049560>] cpu_idle+0x95/0xb8
[<ffffffff80078b1c>] start_secondary+0x479/0x488
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10cce6d8b5 upstream.
This patch checks whether HBA is SAS2008 B0 controller.
if it is a SAS2008 B0 controller then it use IO-APIC interrupt instead of MSIX,
as SAS2008 B0 controller doesn't support MSIX interrupts.
[jejb: fix whitespace problems]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f0ecb907d upstream.
The quirk introduced with commit
00250ec909 (hwmon: fam15h_power: fix
bogus values with current BIOSes) is not only required during driver
load but also when system resumes from suspend. The BIOS might set the
previously recommended (but unsuitable) initilization value for the
running average range register during resume.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Hartmann <andihartmann@01019freenet.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d54db795d upstream.
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.
In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:
"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.
This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
.. snip..
[<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[<ffffffff81d23348>] sparse_init+0xe4/0x25a
[<ffffffff81d16840>] paging_init+0x13/0x22
[<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[<ffffffff81683954>] ? printk+0x3c/0x3e
[<ffffffff81d01a38>] start_kernel+0xe5/0x468
[<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"
so we just disable NUMA scanning by setting numa_off=1.
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f14851af0e upstream.
There may be a bug when registering section info. For example, on my
Itanium platform, the pfn range of node0 includes the other nodes, so
other nodes' section info will be double registered, and memmap's page
count will equal to 3.
node0: start_pfn=0x100, spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
node1: start_pfn=0x80000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x080000-0x100000
node2: start_pfn=0x100000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x100000-0x180000
node3: start_pfn=0x180000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x180000-0x200000
free_all_bootmem_node()
register_page_bootmem_info_node()
register_page_bootmem_info_section()
When hot remove memory, we can't free the memmap's page because
page_count() is 2 after put_page_bootmem().
sparse_remove_one_section()
free_section_usemap()
free_map_bootmem()
put_page_bootmem()
[akpm@linux-foundation.org: add code comment]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05cf96398e upstream.
I found following definition in include/linux/memory.h, in my IA64
platform, SECTION_SIZE_BITS is equal to 32, and MIN_MEMORY_BLOCK_SIZE
will be 0.
#define MIN_MEMORY_BLOCK_SIZE (1 << SECTION_SIZE_BITS)
Because MIN_MEMORY_BLOCK_SIZE is int type and length of 32bits,
so MIN_MEMORY_BLOCK_SIZE(1 << 32) will will equal to 0.
Actually when SECTION_SIZE_BITS >= 31, MIN_MEMORY_BLOCK_SIZE will be wrong.
This will cause wrong system memory infomation in sysfs.
I think it should be:
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
And "echo offline > memory0/state" will cause following call trace:
kernel BUG at mm/memory_hotplug.c:885!
sh[6455]: bugcheck! 0 [1]
Pid: 6455, CPU 0, comm: sh
psr : 0000101008526030 ifs : 8000000000000fa4 ip : [<a0000001008c40f0>] Not tainted (3.6.0-rc1)
ip is at offline_pages+0x210/0xee0
Call Trace:
show_stack+0x80/0xa0
show_regs+0x640/0x920
die+0x190/0x2c0
die_if_kernel+0x50/0x80
ia64_bad_break+0x3d0/0x6e0
ia64_native_leave_kernel+0x0/0x270
offline_pages+0x210/0xee0
alloc_pages_current+0x180/0x2a0
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cab32f39dc upstream.
The MCP2515 has a silicon bug causing repeated frame transmission, see section
5 of MCP2515 Rev. B Silicon Errata Revision G (March 2007).
Basically, setting TXBnCTRL.TXREQ in either SPI mode (00 or 11) will eventually
cause the bug. The workaround proposed by Microchip is to use mode 00 and send
a RTS command on the SPI bus to initiate the transmission.
Signed-off-by: Benoît Locher <Benoit.Locher@skf.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dcebaa9a0 upstream.
On some platforms, bootloaders are known to do some interesting RTC
programming. Without going into the obscurities as to why this may be
the case, suffice it to say the the driver should not make any
assumptions about the state of the RTC when the driver loads. In
particular, the driver probe should be sure that all interrupts are
disabled until otherwise programmed.
This was discovered when finding bursty I2C traffic every second on
Overo platforms. This I2C overhead was keeping the SoC from hitting
deep power states. The cause was found to be the RTC firing every
second on the I2C-connected TWL PMIC.
Special thanks to Felipe Balbi for suggesting to look for a rogue driver
as the source of the I2C traffic rather than the I2C driver itself.
Special thanks to Steve Sakoman for helping track down the source of the
continuous RTC interrups on the Overo boards.
Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Felipe Balbi <balbi@ti.com>
Tested-by: Steve Sakoman <steve@sakoman.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Tested-by: Shubhrajyoti Datta <omaplinuxkernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ba8f2d593 upstream.
The heuristic method for buddy has been introduced since commit
43506fad21 ("mm/page_alloc.c: simplify calculation of combined index
of adjacent buddy lists"). But the page address of higher page's buddy
was wrongly calculated, which will lead page_is_buddy to fail for ever.
IOW, the heuristic method would be disabled with the wrong page address
of higher page's buddy.
Calculating the page address of higher page's buddy should be based
higher_page with the offset between index of higher page and index of
higher page's buddy.
Signed-off-by: Haifeng Li <omycle@gmail.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KyongHo Cho <pullip.cho@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 667a5313ec upstream.
commit 27a7b260f7
md: Fix handling for devices from 2TB to 4TB in 0.90 metadata.
changed 0.90 metadata handling to truncated size to 4TB as that is
all that 0.90 can record.
However for RAID0 and Linear, 0.90 doesn't need to record the size, so
this truncation is not needed and causes working arrays to become too small.
So avoid the truncation for RAID0 and Linear
This bug was introduced in 3.1 and is suitable for any stable kernels
from then onwards.
As the offending commit was tagged for 'stable', any stable kernel
that it was applied to should also get this patch. That includes
at least 2.6.32, 2.6.33 and 3.0. (Thanks to Ben Hutchings for
providing that list).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 67a806d949 upstream.
The following build error occurred during an alpha build:
net/core/sock.c:274:36: error: initializer element is not constant
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 60e233a566 upstream.
Fengguang Wu <fengguang.wu@intel.com> writes:
> After the __devinit* removal series, I can still get kernel panic in
> show_uevent(). So there are more sources of bug..
>
> Debug patch:
>
> @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device
> goto out;
>
> /* copy keys to file */
> - for (i = 0; i < env->envp_idx; i++)
> + dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name);
> + for (i = 0; i < env->envp_idx; i++) {
> + printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]);
> count += sprintf(&buf[count], "%s\n", env->envp[i]);
> + }
>
> Oops message, the env[] is again not properly initilized:
>
> [ 44.068623] input input0: uevent 61 env[805306368]: input0/.../input0
> [ 44.069552] uevent 0 env[0]: (null)
This is a completely different CONFIG_HOTPLUG problem, only
demonstrating another reason why CONFIG_HOTPLUG should go away. I had a
hard time trying to disable it anyway ;-)
The problem this time is lots of code assuming that a call to
add_uevent_var() will guarantee that env->buflen > 0. This is not true
if CONFIG_HOTPLUG is unset. So things like this end up overwriting
env->envp_idx because the array index is -1:
if (add_uevent_var(env, "MODALIAS="))
return -ENOMEM;
len = input_print_modalias(&env->buf[env->buflen - 1],
sizeof(env->buf) - env->buflen,
dev, 0);
Don't know what the best action is, given that there seem to be a *lot*
of this around the kernel. This patch "fixes" the problem for me, but I
don't know if it can be considered an appropriate fix.
[ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to
always be on is the longterm fix, but it's too late for 3.6 and older
kernels to resolve this that way - gregkh ]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74f330bcea upstream.
Since commit 30832ab56 ("mmc: sdhci: Always pass clock request value
zero to set_clock host op") was merged, esdhc_set_clock starts hitting
"if (clock == 0)" where ESDHC_SYSTEM_CONTROL has been operated. This
causes SDHCI card-detection function being broken. Fix the regression
by moving "if (clock == 0)" above ESDHC_SYSTEM_CONTROL operation.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6fa941d94 upstream.
Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event. Use explicit refcount of its own
instead. Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61ed59ed09 upstream.
Don't zero out bits 15..12 of the data value in `das08jr_ao_winsn()` as
that knobbles the upper three-quarters of the output range for the
'das08jr-16-ao' board.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa209eef3c upstream.
Hi,
This patch fixes a bug with driver failing to negotiate a connection.
The bug was traced to commit
203e4615ee
staging: vt6656: removed custom definitions of Ethernet packet types
In that patch, definitions in include/linux/if_ether.h replaced ones
in tether.h which had both big and little endian definitions.
include/linux/if_ether.h only refers to big endian values, cpu_to_be16
should be used for the correct endian architectures.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ea418b8b2 upstream.
A local static variable was declared as a pointer to a string
constant. We're assigning to the underlying memory, so it
needs to be an array instead.
Signed-off-by: Christopher Brannon <chris@the-brannons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3737e2be50 upstream.
The AK4396 DAC has a linear-scale attentuator, but
sound/pci/ice1712/prodigy_hifi.c used a log scale instead, which is
not quite right. This patch restores the correct scale, borrowing
from the ak4396 code in sound/pci/oxygen/oxygen.c.
Signed-off-by: Matteo Frigo <athena@fftw.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4c054ba63a upstream.
This patch fixes a long-standing bug with SCSI overflow handling
where se_cmd->data_length was incorrectly being re-assigned to
the larger CDB extracted allocation length, resulting in a number
of fabric level errors that would end up causing a session reset
in most cases. So instead now:
- Only re-assign se_cmd->data_length durining UNDERFLOW (to use the
smaller value)
- Use existing se_cmd->data_length for OVERFLOW (to use the smaller
value)
This fix has been tested with the following CDB to generate an
SCSI overflow:
sg_raw -r512 /dev/sdc 28 0 0 0 0 0 0 0 9 0
Tested using iscsi-target, tcm_qla2xxx, loopback and tcm_vhost fabric
ports. Here is a bit more detail on each case:
- iscsi-target: Bug with open-iscsi with overflow, sg_raw returns
-3584 bytes of data.
- tcm_qla2xxx: Working as expected, returnins 512 bytes of data
- loopback: sg_raw returns CHECK_CONDITION, from overflow rejection
in transport_generic_map_mem_to_cmd()
- tcm_vhost: Same as loopback
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8335eafc28 upstream.
After calling into the lower filesystem to do a rename, the lower target
inode's attributes were not copied up to the eCryptfs target inode. This
resulted in the eCryptfs target inode staying around, rather than being
evicted, because i_nlink was not updated for the eCryptfs inode. This
also meant that eCryptfs didn't do the final iput() on the lower target
inode so it stayed around, as well. This would result in a failure to
free up space occupied by the target file in the rename() operation.
Both target inodes would eventually be evicted when the eCryptfs
filesystem was unmounted.
This patch calls fsstack_copy_attr_all() after the lower filesystem
does its ->rename() so that important inode attributes, such as i_nlink,
are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
and eCryptfs can drop its final reference on the lower inode.
http://launchpad.net/bugs/561129
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 72d3eb13b5 upstream.
This netconsole_target_put() is obviously redundant, and it
causes a kernel segfault when removing a bridge device which has
netconsole running on it.
This is caused by:
commit 8d8fc29d02
Author: Amerigo Wang <amwang@redhat.com>
Date: Thu May 19 21:39:10 2011 +0000
netpoll: disable netpoll when enslave a device
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b161dfa693 upstream.
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock. Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.
The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed. This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.
This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.
IBM reported successful test results with this patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55815f7014 upstream.
We already use them for openat() and friends, but fstat() also wants to
be able to use O_PATH file descriptors. This should make it more
directly comparable to the O_SEARCH of Solaris.
Note that you could already do the same thing with "fstatat()" and an
empty path, but just doing "fstat()" directly is simpler and faster, so
there is no reason not to just allow it directly.
See also commit 332a2e1244, which did the same thing for fchdir, for
the same reasons.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2453f5f992 upstream.
If a command completes with a status of CMD_PROTOCOL_ERR, this
information should be conveyed to the SCSI mid layer, not dropped
on the floor. Unlike a similar bug in the hpsa driver, this bug
only affects tape drives and CD and DVD ROM drives in the cciss
driver, and to induce it, you have to disconnect (or damage) a
cable, so it is not a very likely scenario (which would explain
why the bug has gone undetected for the last 10 years.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6889125b8b upstream.
powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed. The
function migrates the kworker back to the original CPU but this is
still broken. Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggerring
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.
It is unclear why this bug is being reported now. Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is an non-functional
change. Given that the reproduce case sometimes took upto days to
trigger, it's easy to be misled while bisecting. Maybe something made
contention on fidvid_mutex more likely? I don't know.
This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one. The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.
stable: ed48ece27c ("workqueue: reimplement work_on_cpu() using
system_wq") should be applied before this; otherwise, the
behavior could be horrible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Tested-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed48ece27c upstream.
The existing work_on_cpu() implementation is hugely inefficient. It
creates a new kthread, execute that single function and then let the
kthread die on each invocation.
Now that system_wq can handle concurrent executions, there's no
advantage of doing this. Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.
stable: While this isn't a fix in itself, it's needed to fix a
workqueue related bug in cpufreq/powernow-k8. AFAICS, this
shouldn't break other existing users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acbb219d5f ]
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.
Tested in Linux 3.4.8.
Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 99469c32f7 ]
Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
atomic.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 20e1db19db ]
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.
The userspace process usually verifies the legitimate origin in
two ways:
a) Socket credentials. If UID != 0, then the message comes from
some ilegitimate process and the message needs to be dropped.
b) Netlink portID. In general, portID == 0 means that the origin
of the messages comes from the kernel. Thus, discarding any
message not coming from the kernel.
However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.
Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.
This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.
Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 43da5f2e0d ]
The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d8a041b7b ]
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7b07f8eb75 ]
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3592aaeb80 ]
The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.
Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 792039c73c ]
The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9344a97296 ]
The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f9432c5ec8 ]
The RFCOMM code fails to initialize the two padding bytes of struct
rfcomm_dev_list_req inserted for alignment before copying it to
userland. Additionally there are two padding bytes in each instance of
struct rfcomm_dev_info. The ioctl() that for disclosures two bytes plus
dev_num times two bytes uninitialized kernel heap memory.
Allocate the memory using kzalloc() to fix this issue.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e15ca9a0ef ]
The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3c0c5cfdcd ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e862f1a9b7 ]
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7f5c3e3a80 ]
Here's a quote of the comment about the BUG macro from asm-generic/bug.h:
Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of. If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.
If you're tempted to BUG(), think again: is completely giving up
really the *only* solution? There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.
In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7364e445f6 ]
Do not leak memory by updating pointer with potentially NULL realloc return value.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 696ecdc106 ]
gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.
So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1485348d24 ]
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7e6d06f0de ]
Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 30b678d844 ]
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 43ca6cb28c upstream.
The old interface is bugged and reads the wrong sensor when retrieving
the reading for the chassis fan (it reads the CPU sensor); the new
interface works fine.
Reported-by: Göran Uddeborg <goeran@uddeborg.se>
Tested-by: Göran Uddeborg <goeran@uddeborg.se>
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 276bdb82de upstream.
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bba3d8c3b3 upstream.
The following build error occured during a parisc build with
swap-over-NFS patches applied.
net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9e67d4837 upstream.
In some cases fuse_retrieve() would return a short byte count if offset was
non-zero. The data returned was correct, though.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 156bddd8e5 upstream.
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.
Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9c2fc0de1a upstream.
When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).
Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.
Reported-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14216561e1 upstream.
This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
SAT-2 says (section 8.12.2)
if the device is in the stopped state as the result of
processing a START STOP UNIT command (see 9.11), then the SATL
shall terminate the TEST UNIT READY command with CHECK CONDITION
status with the sense key set to NOT READY and the additional
sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
REQUIRED;
mpt2sas internal SATL seems to implement this. The result is very confusing
standby behaviour (using hdparm -y). If you suspend a drive and then send
another command, usually it wakes up. However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
the SATL rules for this, returning NOT READY to all subsequent commands. This
means that the ordering of TEST UNIT READY is crucial: if you send TUR and
then a command, you get a NOT READY to both back. If you send a command and
then a TUR, you get GOOD status because the preceeding command woke the drive.
This bit us badly because
commit 85ef06d1d2
Author: Tejun Heo <tj@kernel.org>
Date: Fri Jul 1 16:17:47 2011 +0200
block: flush MEDIA_CHANGE from drivers on close(2)
Changed our ordering on TEST UNIT READY commands meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command)
resulting in lots of failed commands.
The standard is completely nuts forcing this inconsistent behaviour, but we
have to work around it.
The fix for this is twofold:
1. Set the allow_restart flag so we wake the drive when we see it has been
suspended
2. Return all TEST UNIT READY status directly to the mid layer without any
further error handling which prevents us causing error handling which
may offline the device just because of a media check TUR.
Reported-by: Matthias Prager <linux@matthiasprager.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 338b131a32 upstream.
If the specified max_queue_depth setting is less than the
expected number of internal commands, then driver will calculate
the queue depth size to a negitive number. This negitive number
is actually a very large number because variable is unsigned
16bit integer. So, the driver will ask for a very large amount of
memory for message frames and resulting into oops as memory
allocation routines will not able to handle such a large request.
So, in order to limit this kind of oops, The driver need to set
the max_queue_depth to a scsi mid layer's can_queue value. Then
the overall message frames required for IO is minimum of either
(max_queue_depth plus internal commands) or the IOC global
credits.
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd8d6dd43a upstream.
The following patch moves the poll_aen_lock initializer from
megasas_probe_one() to megasas_init(). This prevents a crash when a user
loads the driver and tries to issue a poll() system call on the ioctl
interface with no adapters present.
Signed-off-by: Kashyap Desai <Kashyap.Desai@lsi.com>
Signed-off-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1021cb268b upstream.
If the default DSCR is non zero we set thread.dscr_inherit in
copy_thread() meaning the new thread and all its children will ignore
future updates to the default DSCR. This is not intended and is
a change in behaviour that a number of our users have hit.
We just need to inherit thread.dscr and thread.dscr_inherit from
the parent which ends up being much simpler.
This was found with the following test case:
http://ozlabs.org/~anton/junkcode/dscr_default_test.c
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99f347caa4 upstream.
If a device specifies zero endpoints in its interface descriptor,
the kernel oopses in acm_probe(). Even though that's clearly an
invalid descriptor, we should test wether we have all endpoints.
This is especially bad as this oops can be triggered by just
plugging a USB device in.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c4167cbb upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Christian Lamparter <chunkeey@googlemail.com>
CC: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e694d51888 upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Hans de Goede <hdegoede@redhat.com>
CC: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 676ce6d5ca upstream.
Commit 91f68c89d8 ("block: fix infinite loop in __getblk_slow")
is not good: a successful call to grow_buffers() cannot guarantee
that the page won't be reclaimed before the immediate next call to
__find_get_block(), which is why there was always a loop there.
Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595:
inode #19278: block 664: comm cc1: unable to read itable block" on console,
which pointed to this commit.
I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs
sometimes fails from a missing header file, under memory pressure on
ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7
itself, despite that commit being in there: bisection pointed to an
irrelevant pinctrl merge, but hard to tell when failure takes between
18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2).
(I've since found such __ext4_get_inode_loc errors in /var/log/messages
from previous weeks: why the message never appeared on console until
yesterday morning is a mystery for another day.)
Revert 91f68c89d8, restoring __getblk_slow() to how it was (plus
a checkpatch nitfix). Simplify the interface between grow_buffers()
and grow_dev_page(), and avoid the infinite loop beyond end of device
by instead checking init_page_buffers()'s end_block there (I presume
that's more efficient than a repeated call to blkdev_max_block()),
returning -ENXIO to __getblk_slow() in that case.
And remove akpm's ten-year-old "__getblk() cannot fail ... weird"
comment, but that is worrying: are all users of __getblk() really
now prepared for a NULL bh beyond end of device, or will some oops??
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0b68c8e2c3 upstream.
Commit dbf0e4c (PCI: EHCI: fix crash during suspend on ASUS
computers) added a workaround for an ASUS suspend issue related to
USB EHCI and a bug in a number of ASUS BIOSes that attempt to shut
down the EHCI controller during system suspend if its PCI command
register doesn't contain 0 at that time.
It turns out that the same workaround is necessary in the analogous
hibernation code path, so add it.
References: https://bugzilla.kernel.org/show_bug.cgi?id=45811
Reported-and-tested-by: Oleksij Rempel <bug-track@fisher-privat.net>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1352fde56 upstream.
ath_rx_tasklet() calls ath9k_rx_skb_preprocess() and ath9k_rx_skb_postprocess()
in a loop over the received frames. The decrypt_error flag is
initialized to false
just outside ath_rx_tasklet() loop. ath9k_rx_accept(), called by
ath9k_rx_skb_preprocess(),
only sets decrypt_error to true and never to false.
Then ath_rx_tasklet() calls ath9k_rx_skb_postprocess() and passes
decrypt_error to it.
So, after a decryption error, in ath9k_rx_skb_postprocess(), we can
have a leftover value
from another processed frame. In that case, the frame will not be marked with
RX_FLAG_DECRYPTED even if it is decrypted correctly.
When using CCMP encryption this issue can lead to connection stuck
because of CCMP
PN corruption and a waste of CPU time since mac80211 tries to decrypt an already
deciphered frame with ieee80211_aes_ccm_decrypt.
Fix the issue initializing decrypt_error flag at the begging of the
ath_rx_tasklet() loop.
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f06f00a24d upstream.
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.
Symptoms were data corruption preceded by svc_tcp_sendto logging
something like
kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d10f27a750 upstream.
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.
It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.
Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available. If it finds that there is not
space, it then subtracts the estimate back out.
This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.
The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be1e44441a upstream.
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).
svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY. However,
that means the inconsistency must be repaired before dropping XPT_BUSY.
Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.
Symptoms were a BUG() in svc_tcp_clear_pages.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2140fc0cb upstream.
Refcounting of fsnotify_mark in audit tree is broken. E.g:
refcount
create_chunk
alloc_chunk 1
fsnotify_add_mark 2
untag_chunk
fsnotify_get_mark 3
fsnotify_destroy_mark
audit_tree_freeing_mark 2
fsnotify_put_mark 1
fsnotify_put_mark 0
via destroy_list
fsnotify_mark_destroy -1
This was reported by various people as triggering Oops when stopping auditd.
We could just remove the put_mark from audit_tree_freeing_mark() but that would
break freeing via inode destruction. So this patch simply omits a put_mark
after calling destroy_mark or adds a get_mark before.
The additional get_mark is necessary where there's no other put_mark after
fsnotify_destroy_mark() since it assumes that the caller is holding a reference
(or the inode is keeping the mark pinned, not the case here AFAICS).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Valentin Avram <aval13@gmail.com>
Reported-by: Peter Moody <pmoody@google.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fe33aae0e upstream.
Don't do free_chunk() after fsnotify_add_mark(). That one does a delayed unref
via the destroy list and this results in use-after-free.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47fbf7976e upstream.
Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
disconnected data server) we've been sending layoutreturn calls
while there is potentially still outstanding I/O to the data
servers. The reason we do this is to avoid races between replayed
writes to the MDS and the original writes to the DS.
When this happens, the BUG_ON() in nfs4_layoutreturn_done can
be triggered because it assumes that we would never call
layoutreturn without knowing that all I/O to the DS is
finished. The fix is to remove the BUG_ON() now that the
assumptions behind the test are obsolete.
Reported-by: Boaz Harrosh <bharrosh@panasas.com>
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0866004304 upstream.
If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.
Reported-by: Yuanming Chen <hikvision_linux@163.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb48c07146 upstream.
Each page mapped in a process's address space must be correctly
accounted for in _mapcount. Normally the rules for this are
straightforward but hugetlbfs page table sharing is different. The page
table pages at the PMD level are reference counted while the mapcount
remains the same.
If this accounting is wrong, it causes bugs like this one reported by
Larry Woodman:
kernel BUG at mm/filemap.c:135!
invalid opcode: 0000 [#1] SMP
CPU 22
Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
RIP: 0010:[<ffffffff8112cfed>] [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
Call Trace:
delete_from_page_cache+0x40/0x80
truncate_hugepages+0x115/0x1f0
hugetlbfs_evict_inode+0x18/0x30
evict+0x9f/0x1b0
iput_final+0xe3/0x1e0
iput+0x3e/0x50
d_kill+0xf8/0x110
dput+0xe2/0x1b0
__fput+0x162/0x240
During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
shared page tables with the check dst_pte == src_pte. The logic is if
the PMD page is the same, they must be shared. This assumes that the
sharing is between the parent and child. However, if the sharing is
with a different process entirely then this check fails as in this
diagram:
parent
|
------------>pmd
src_pte----------> data page
^
other--------->pmd--------------------|
^
child-----------|
dst_pte
For this situation to occur, it must be possible for Parent and Other to
have faulted and failed to share page tables with each other. This is
possible due to the following style of race.
PROC A PROC B
copy_hugetlb_page_range copy_hugetlb_page_range
src_pte == huge_pte_offset src_pte == huge_pte_offset
!src_pte so no sharing !src_pte so no sharing
(time passes)
hugetlb_fault hugetlb_fault
huge_pte_alloc huge_pte_alloc
huge_pmd_share huge_pmd_share
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
LOCK(i_mmap_mutex)
find nothing, no sharing
UNLOCK(i_mmap_mutex)
pmd_alloc pmd_alloc
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
LOCK(instantiation_mutex)
fault
UNLOCK(instantiation_mutex)
These two processes are not poing to the same data page but are not
sharing page tables because the opportunity was missed. When either
process later forks, the src_pte == dst pte is potentially insufficient.
As the check falls through, the wrong PTE information is copied in
(harmless but wrong) and the mapcount is bumped for a page mapped by a
shared page table leading to the BUG_ON.
This patch addresses the issue by moving pmd_alloc into huge_pmd_share
which guarantees that the shared pud is populated in the same critical
section as pmd. This also means that huge_pte_offset test in
huge_pmd_share is serialized correctly now which in turn means that the
success of the sharing will be higher as the racing tasks see the pud
and pmd populated together.
Race identified and changelog written mostly by Mel Gorman.
{akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
Reported-by: Larry Woodman <lwoodman@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Ken Chen <kenchen@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2fa3ccd7b upstream.
Currently we export SOCK_NONBLOCK to user space but that conflicts with
the definition from glibc leading to compilation errors in user programs
(e.g. see Debian bug #658460).
The generic socket.h restricts the definition of SOCK_NONBLOCK to the
kernel, as does the MIPS specific socket.h, so let's do the same on
Alpha.
Signed-off-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e68726ff72 upstream.
Userspace can pass weird create mode in open(2) that we canonicalize to
"(mode & S_IALLUGO) | S_IFREG" in vfs_create().
The problem is that we use the uncanonicalized mode before calling vfs_create()
with unforseen consequences.
So do the canonicalization early in build_open_flags().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ccf795847a upstream.
Currently the microphone input source is not selectable as while there is
a DAPM widget it's not connected to anything so it won't be properly
instantiated. Add something more correct for the input structure to get
things going, even though it's not hooked into the rest of the routing
map and so won't actually achieve anything except allowing the relevant
register bits to be written.
Reported-by: Christop Fritz <chf.fritz@googlemail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f637c4c940 upstream.
The i.MX cpufreq implementation uses the CPU_FREQ_TABLE helpers,
so it needs to select that code to be built. This problem has
apparently existed since the i.MX cpufreq code was first merged
in v2.6.37.
Building IMX without CPU_FREQ_TABLE results in:
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_exit':
arch/arm/plat-mxc/cpufreq.c:173: undefined reference to `cpufreq_frequency_table_put_attr'
arch/arm/plat-mxc/built-in.o: In function `mxc_set_target':
arch/arm/plat-mxc/cpufreq.c:84: undefined reference to `cpufreq_frequency_table_target'
arch/arm/plat-mxc/built-in.o: In function `mxc_verify_speed':
arch/arm/plat-mxc/cpufreq.c:65: undefined reference to `cpufreq_frequency_table_verify'
arch/arm/plat-mxc/built-in.o: In function `mxc_cpufreq_init':
arch/arm/plat-mxc/cpufreq.c:154: undefined reference to `cpufreq_frequency_table_cpuinfo'
arch/arm/plat-mxc/cpufreq.c:162: undefined reference to `cpufreq_frequency_table_get_attr'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Yong Shen <yong.shen@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b01858c780 upstream.
Commit d670ac019f (ARM: SAMSUNG: DMA Cleanup as per sparse) changed the
prototype of the s3c2410_dma_* functions to use the enum dma_ch instead
of an generic unsigned int.
In the s3c24xx dma.c s3c2410_dma_enqueue seems to have been forgotten,
the other functions there were changed correctly.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 730a8128cd upstream.
Commit 5a783cbc48 ("ARM: 7478/1: errata: extend workaround for erratum
#720789") added workarounds for erratum #720789 to the range TLB
invalidation functions with the observation that the erratum only
affects SMP platforms. However, when running an SMP_ON_UP kernel on a
uniprocessor platform we must take care to preserve the ASID as the
workaround is not required.
This patch ensures that we don't set the ASID to 0 when flushing the TLB
on such a system, preserving the original behaviour with the workaround
disabled.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f5f2025ef3 upstream.
Page migration encodes the pfn in the offset field of a swp_entry_t.
For LPAE, we support physical addresses of up to 36 bits (due to
sparsemem limitations with the size of page flags), requiring 24 bits
to represent a pfn. A further 3 bits are used to encode a swp_entry into
a pte, leaving 5 bits for the type field. Furthermore, the core code
defines MAX_SWAPFILES_SHIFT as 5, so the additional type bit does not
get used.
This patch reduces the width of the type field to 5 bits, allowing us
to create up to 31 swapfiles of 64GB each.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 47f1204329 upstream.
Swap entries are encoding in ptes such that !pte_present(pte) and
pte_file(pte). The remaining bits of the descriptor are used to identify
the swapfile and offset within it to the swap entry.
When writing such a pte for a user virtual address, set_pte_at
unconditionally sets the nG bit, which (in the case of LPAE) will
corrupt the swapfile offset and lead to a BUG:
[ 140.494067] swap_free: Unused swap offset entry 000763b4
[ 140.509989] BUG: Bad page map in process rs:main Q:Reg pte:0ec76800 pmd:8f92e003
This patch fixes the problem by only setting the nG bit for user
mappings that are actually present.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 83957df21d upstream.
This structure needs to always stick around, even if CONFIG_HOTPLUG
is disabled, otherwise we can oops when trying to probe a device that
was added after the structure is thrown away.
Thanks to Fengguang Wu and Bjørn Mork for tracking this issue down.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Bjørn Mork <bjorn@mork.no>
CC: Paul Gortmaker <paul.gortmaker@windriver.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c263b92f8 upstream.
* Use the buffer content length as opposed to the total buffer size. This can
be a real problem when using the mos7840 as a usb serial-console as all
kernel output is truncated during boot.
Signed-off-by: Mark Ferrell <mferrell@uplogix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1b5c997e6 upstream.
The ZTE (Vodafone) K5006-Z use the following
interface layout:
00 DIAG
01 secondary
02 modem
03 networkcard
04 storage
Ignoring interface #3 which is handled by the qmi_wwan
driver.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee6f827df9 upstream.
In this patch, we add new declarations into option.c to support the new
interfaces of Huawei Data Card devices. And at the same time, remove the
redundant declarations from option.c.
Signed-off-by: fangxiaozhi <huananhu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d81a5d1956 upstream.
A lot of Broadcom Bluetooth devices provides vendor specific interface
class and we are getting flooded by patches adding new device support.
This change will help us enable support for any other Broadcom with vendor
specific device that arrives in the future.
Only the product id changes for those devices, so this macro would be
perfect for us:
{ USB_VENDOR_AND_INTERFACE_INFO(0x0a5c, 0xff, 0x01, 0x01) }
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Henrik Rydberg <rydberg@bitmath.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e95829f474 upstream.
The Intel desktop boards DH77EB and DH77DF have a hardware issue that
can be worked around by BIOS. If the USB ports are switched to xHCI on
shutdown, the xHCI host will send a spurious interrupt, which will wake
the system. Some BIOS will work around this, but not all.
The bug can be avoided if the USB ports are switched back to EHCI on
shutdown. The Intel Windows driver switches the ports back to EHCI, so
change the Linux xHCI driver to do the same.
Unfortunately, we can't tell the two effected boards apart from other
working motherboards, because the vendors will change the DMI strings
for the DH77EB and DH77DF boards to their own custom names. One example
is Compulab's mini-desktop, the Intense-PC. Instead, key off the
Panther Point xHCI host PCI vendor and device ID, and switch the ports
over for all PPT xHCI hosts.
The only impact this will have on non-effected boards is to add a couple
hundred milliseconds delay on boot when the BIOS has to switch the ports
over from EHCI to xHCI.
This patch should be backported to kernels as old as 3.0, that contain
the commit 69e848c209 "Intel xhci: Support
EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Denis Turischev <denis@compulab.co.il>
Tested-by: Denis Turischev <denis@compulab.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 22ceac1912 upstream.
The NEC/Renesas 720201 xHCI host controller does not complete its reset
within 250 milliseconds. In fact, it takes about 9 seconds to reset the
host controller, and 1 second for the host to be ready for doorbell
rings. Extend the reset and CNR polling timeout to 10 seconds each.
This patch should be backported to kernels as old as 2.6.31, that
contain the commit 66d4eadd8d "USB: xhci:
BIOS handoff and HW initialization."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Edwin Klein Mentink <e.kleinmentink@zonnet.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cb7df2b2d upstream.
Gary reports that with recent kernels, he notices more xHCI driver
warnings:
xhci_hcd 0000:03:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk?
We think his Etron xHCI host controller may have the same buggy behavior
as the Fresco Logic xHCI host. When a short transfer is received, the
host will mark the transfer as successfully completed when it should be
marking it with a short completion.
Fix this by turning on the XHCI_TRUST_TX_LENGTH quirk when the Etron
host is discovered. Note that Gary has revision 1, but if Etron fixes
this bug in future revisions, the quirk will have no effect.
This patch should be backported to kernels as old as 2.6.36, that
contain a backported version of commit
1530bbc627 "xhci: Add new short TX quirk
for Fresco Logic host."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Gary E. Miller <gem@rellim.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7e731bc9a1 upstream.
Commit 03179fe923 introduced a kmemcheck complaint in
ext4_da_get_block_prep() because we save and restore
ei->i_da_metadata_calc_last_lblock even though it is left
uninitialized in the case where i_da_metadata_calc_len is zero.
This doesn't hurt anything, but silencing the kmemcheck complaint
makes it easier for people to find real bugs.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
(which is marked as a regression).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81ee8fb6b5 upstream.
It seems we can not update the crtc scanout address. After disabling
crtc, update to base address do not take effect after crtc being
reenable leading to at least frame being scanout from the old crtc
base address. Disabling crtc display request lead to same behavior.
So after changing the vram address if we don't keep crtc disabled
we will have the GPU trying to read some random system memory address
with some iommu this will broke the crtc engine and will lead to
broken display and iommu error message.
So to avoid this, disable crtc. For flicker less boot we will need
to avoid moving the vram start address.
This patch should also fix :
https://bugs.freedesktop.org/show_bug.cgi?id=42373
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e0d95c04 upstream.
When the frontend and the backend reside on the same domain, even if we
add pages to the m2p_override, these pages will never be returned by
mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
always fail, so the pfn of the frontend will be returned instead
(resulting in a deadlock because the frontend pages are already locked).
INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-system-i38 D ffff8800cfc137c0 0 1085 1 0x00000000
ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
Call Trace:
[<ffffffff81101ee0>] ? __lock_page+0x70/0x70
[<ffffffff81a0fdd9>] schedule+0x29/0x70
[<ffffffff81a0fe80>] io_schedule+0x60/0x80
[<ffffffff81101eee>] sleep_on_page+0xe/0x20
[<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81101ed7>] __lock_page+0x67/0x70
[<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
[<ffffffff811867e6>] ? bio_add_page+0x36/0x40
[<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
[<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
[<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
[<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
[<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
[<ffffffff81104027>] generic_file_aio_read+0x737/0x780
[<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
[<ffffffff811038f0>] ? find_get_pages+0x150/0x150
[<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
[<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
[<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
[<ffffffff811998b8>] do_io_submit+0x708/0xb90
[<ffffffff81199d50>] sys_io_submit+0x10/0x20
[<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b
The explanation is in the comment within the code:
We need to do this because the pages shared by the frontend
(xen-blkfront) can be already locked (lock_page, called by
do_read_cache_page); when the userspace backend tries to use them
with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
do_blockdev_direct_IO is going to try to lock the same pages
again resulting in a deadlock.
A simplified call graph looks like this:
pygrub QEMU
-----------------------------------------------
do_read_cache_page io_submit
| |
lock_page ext3_direct_IO
|
bio_add_page
|
lock_page
Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
a 'struct page' to have a different MFN (so that it can point to another
guest). It also can easily find out whether another pfn corresponding
to the mfn exists in the m2p, and can set the FOREIGN bit
in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.
This allows the backend to perform direct_IO on these pages, but as a
side effect prevents the frontend from using get_user_pages_fast on
them while they are being shared with the backend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb6ccff667 upstream.
Commit 7572777eef attempted to verify that
the total iovec from the client doesn't overflow iov_length() but it
only checked the first element. The iovec could still overflow by
starting with a small element. The obvious fix is to check all the
elements.
The overflow case doesn't look dangerous to the kernel as the copy is
limited by the length after the overflow. This fix restores the
intention of returning an error instead of successfully copying less
than the iovec represented.
I found this by code inspection. I built it but don't have a test case.
I'm cc:ing stable because the initial commit did as well.
Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e858712185 upstream.
The native 31 bit and the compat behaviour for the mmap system calls differ:
In native 31 bit mode the passed in address for the mmap system call will be
unmodified passed to sys_mmap_pgoff().
In compat mode however the passed in address will be modified with
compat_ptr() which masks out the most significant bit.
The result is that in native 31 bit mode each mmap request (with MAP_FIXED)
will fail where the most significat bit is set, while in compat mode it
may succeed.
This odd behaviour was introduced with d3815898 "[S390] mmap: add missing
compat_ptr conversion to both mmap compat syscalls".
To restore a consistent behaviour accross native and compat mode this
patch functionally reverts the above mentioned commit.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6dc463511d upstream.
Bamboo One's with ID of 0x6a and 0x6b were added with correct
indication of 1024 pressure levels but the Graphire packet routine
was only looking at 9 bits. Increased to 10 bits.
This bug caused these devices to roll over to zero pressure at half
way mark.
The other devices using this routine only support 256 or 512 range
and look to fix unused bits at zero.
Signed-off-by: Chris Bagwell <chris@cnpbagwell.com>
Reported-by: Tushant Mirchandani <tushantin@gmail.com>
Reviewed-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
partial of commit 8e8b41f9d8 upstream.
As part of commit 463454b5db ("cfg80211: fix interface
combinations check"), this extra check was introduced:
if ((all_iftypes & used_iftypes) != used_iftypes)
goto cont;
However, most wireless NIC drivers did not advertise ADHOC in
wiphy.iface_combinations[i].limits[] and hence we'll get -EBUSY
when we bring up a ADHOC wlan with commands similar to:
# iwconfig wlan0 mode ad-hoc && ifconfig wlan0 up
In commit 8e8b41f9d8 ("cfg80211: enforce lack of interface
combinations"), the change below fixes the issue:
if (total == 1)
return 0;
But it also introduces other dependencies for stable. For example,
a full cherry pick of 8e8b41f9d8 would introduce additional
regressions unless we also start cherry picking driver specific
fixes like the following:
9b4760e ath5k: add possible wiphy interface combinations
1ae2fc2 mac80211_hwsim: advertise interface combinations
20c8e8d ath9k: add possible wiphy interface combinations
And the purpose of the 'if (total == 1)' is to cover the specific
use case (IBSS, adhoc) that was mentioned above. So we just pick
the specific part out from 8e8b41f9d8 here.
Doing so gives stable kernels a way to fix the change introduced
by 463454b5db, without having to make cherry picks specific to
various NIC drivers.
Signed-off-by: Liang Li <liang.li@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f6fc43e62 upstream.
libertas currently calls cfg80211_disconnected() when it is being
brought down. This causes an event to be allocated, but since the
wdev is already removed from the rdev by the time that the event
processing work executes, the event is never processed or freed.
http://article.gmane.org/gmane.linux.kernel.wireless.general/95666
Fix this leak, and other possible situations, by processing the event
queue when a device is being unregistered. Thanks to Johannes Berg for
the suggestion.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 59ee93a528 upstream.
The irq_to_gpio function was removed from the pxa platform
in linux-3.2, and this driver has been broken since.
There is actually no in-tree user of this driver that adds
this platform device, but the driver can and does get enabled
on some platforms.
Without this patch, building ezx_defconfig results in:
drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work':
drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Daniel Ribeiro <drwyrm@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bed491c8d upstream.
The CONFIG_DEFAULT_MMAP_MIN_ADDR was set to 65536 in mxs_defconfig,
this caused severe breakage of userland applications since the upper
limit for ARM is 32768. By default CONFIG_DEFAULT_MMAP_MIN_ADDR is
set to 4096 and can also be changed via /proc/sys/vm/mmap_min_addr
if needed.
Quoting Russell King [1]:
"4096 is also fine for ARM too. There's not much point in having
defconfigs change it - that would just be pure noise in the config
files."
the CONFIG_DEFAULT_MMAP_MIN_ADDR can be removed from the defconfig
altogether.
This problem was introduced by commit cde7c41 (ARM: configs: add
defconfig for mach-mxs).
[1] http://marc.info/?l=linux-arm-kernel&m=134401593807820&w=2
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Wolfgang Denk <wd@denx.de>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d833352a43 upstream.
If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted. Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log. The final teardown will trigger a
BUG_ON in mm/filemap.c.
This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37. It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages
look like this;
[ ..........] Lots of bad pmd messages followed by this
[ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[ 127.186778] ------------[ cut here ]------------
[ 127.186781] kernel BUG at mm/filemap.c:134!
[ 127.186782] invalid opcode: 0000 [#1] SMP
[ 127.186783] CPU 7
[ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[ 127.186801]
[ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002
[ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[ 127.186821] Stack:
[ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[ 127.186827] Call Trace:
[ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0
[ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50
[ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130
[ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0
[ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230
[ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30
[ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80
[ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360
[ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186870] RSP <ffff8804144b5c08>
[ 127.186871] ---[ end trace 7cbac5d1db69f426 ]---
The bug is a race and not always easy to reproduce. To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.
$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done
On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted. The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON. shutdown will also hang and needs a
hard reset or a sysrq-b.
The basic problem is a race between page table sharing and teardown. For
the most part page table sharing depends on i_mmap_mutex. In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.
Unfortunately it appears to be also insufficient. Consider the following
situation
Process A Process B
--------- ---------
hugetlb_fault shmdt
LockWrite(mmap_sem)
do_munmap
unmap_region
unmap_vmas
unmap_single_vma
unmap_hugepage_range
Lock(i_mmap_mutex)
Lock(mm->page_table_lock)
huge_pmd_unshare/unmap tables <--- (1)
Unlock(mm->page_table_lock)
Unlock(i_mmap_mutex)
huge_pte_alloc ...
Lock(i_mmap_mutex) ...
vma_prio_walk, find svma, spte ...
Lock(mm->page_table_lock) ...
share spte ...
Unlock(mm->page_table_lock) ...
Unlock(i_mmap_mutex) ...
hugetlb_no_page <--- (2)
free_pgtables
unlink_file_vma
hugetlb_free_pgd_range
remove_vma_list
In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down. The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables. At (1) above,
the page tables are not shared yet so it unmaps the PMDs. Process A sets
up page table sharing and at (2) faults a new entry. Process B then trips
up on it in free_pgtables.
This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed. This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA
ineligible for sharing and avoids the race. Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).
This should be treated as a -stable candidate if it is merged.
Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.
==== CUT HERE ====
static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;
unsigned int get_random(unsigned int max)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srandom(tv.tv_usec);
return random() % max;
}
static void play(void *addr, size_t size)
{
unsigned char *start = addr,
*end = start + size,
*a;
start += get_random(size/2);
/* we could itterate on huge pages but let's give it more time. */
for (a = start; a < end; a += 4096)
*a = 0;
}
int main(int argc, char **argv)
{
key_t key = IPC_PRIVATE;
size_t sizeA = nr_huge_page_A * huge_page_size;
size_t sizeB = nr_huge_page_B * huge_page_size;
int shmidA, shmidB;
void *addrA = NULL, *addrB = NULL;
int nr_children = 300, n = 0;
if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
fork_child:
switch(fork()) {
case 0:
switch (n%3) {
case 0:
play(addrA, sizeA);
break;
case 1:
play(addrB, sizeB);
break;
case 2:
break;
}
break;
case -1:
perror("fork:");
break;
default:
if (++n < nr_children)
goto fork_child;
play(addrA, sizeA);
break;
}
shmdt(addrA);
shmdt(addrB);
do {
wait(NULL);
} while (--n > 0);
shmctl(shmidA, IPC_RMID, NULL);
shmctl(shmidB, IPC_RMID, NULL);
return 0;
}
[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9fc3f778a upstream.
Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such interface with which
we can end up with different microcode versions applied on different
cores of an otherwise homogeneous wrt (family,model,stepping) system.
So turn off the possibility of doing that per core and allow it only
system-wide.
This is a minimal fix which we'd like to see in stable too thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:
$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...
and disable the interface on the other cores:
$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument
Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.
A more generic fix will follow.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2e7c96af1 upstream.
Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf(). This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.
[ Commit description modified by tytso to remove an extended
advertisement for the RDRAND instruction. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: DJ Johnston <dj.johnston@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cbc96b7594 upstream.
Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.
However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().
Add a big fat comment to rand_initialize() spelling out
this requirement.
Suggested-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c5857ccf29 upstream.
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 27130f0cc3 upstream.
wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9dccf55f4c upstream.
The tamper evident features of the RTC include the "write counter" which
is a pseudo-random number regenerated whenever we set the RTC. Since this
value is unpredictable it should provide some useful seeding to the random
number generator.
Only do this on boot since the goal is to seed the pool rather than add
useful entropy.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 330e0a01d5 upstream.
Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2557a303a upstream.
Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present. Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.
The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door. (For example, an
increasing counter encrypted by an AES key known to the NSA.)
It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
--- especially since Bull Mountain is documented to use AES as a
whitener. Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel. Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.
Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.
This way we get almost of the benefits of the HW RNG without any
potential liabilities. The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.
For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6d4947b12 upstream.
If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.
Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2080a67ab upstream.
Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot). This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).
[ Modified by tytso to mix in a timestamp, since there may be some
variability caused by the time needed to detect/configure the hardware
in question. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 902c098a36 upstream.
The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 775f4b297b upstream.
We've been moving away from add_interrupt_randomness() for various
reasons: it's too expensive to do on every interrupt, and flooding the
CPU with interrupts could theoretically cause bogus floods of entropy
from a somewhat externally controllable source.
This solves both problems by limiting the actual randomness addition
to just once a second or after 64 interrupts, whicever comes first.
During that time, the interrupt cycle data is buffered up in a per-cpu
pool. Also, we make sure the the nonblocking pool used by urandom is
initialized before we start feeding the normal input pool. This
assures that /dev/urandom is returning unpredictable data as soon as
possible.
(Based on an original patch by Linus, but significantly modified by
tytso.)
Tested-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Eric Wustrow <ewust@umich.edu>
Reported-by: Nadia Heninger <nadiah@cs.ucsd.edu>
Reported-by: Zakir Durumeric <zakir@umich.edu>
Reported-by: J. Alex Halderman <jhalderm@umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44e4360fa3 upstream.
/proc/sys/kernel/random/boot_id can be read concurrently by userspace
processes. If two (or more) user-space processes concurrently read
boot_id when sysctl_bootid is not yet assigned, a race can occur making
boot_id differ between the reads. Because the whole point of the boot id
is to be unique across a kernel execution, fix this by protecting this
operation with a spinlock.
Given that this operation is not frequently used, hitting the spinlock
on each call should not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3e88bdff1c upstream.
If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores. Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-git-send-email-tytso@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf833d0b99 upstream.
We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.
Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".
And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.
So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.
Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGbK18zwKDwvwqnravA84A@mail.gmail.com
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd29e568a4 upstream.
If there is an architecture-specific random number generator we use it
to acquire randomness one "long" at a time. We should put these random
words into consecutive words in the result buffer - not just overwrite
the first word again and again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63d7717326 upstream.
Add support for architecture-specific hooks into the kernel-directed
random number generator interfaces. This patchset does not use the
architecture random number generator interfaces for the
userspace-directed interfaces (/dev/random and /dev/urandom), thus
eliminating the need to distinguish between them based on a pool
pointer.
Changes in version 3:
- Moved the hooks from extract_entropy() to get_random_bytes().
- Changes the hooks to inlines.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd4c9260e7 upstream.
The mesh path timer needs to be canceled when
leaving the mesh as otherwise it could fire
after the interface has been removed already.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ad3d901bb upstream.
mmu_notifier_release() is called when the process is exiting. It will
delete all the mmu notifiers. But at this time the page belonging to the
process is still present in page tables and is present on the LRU list, so
this race will happen:
CPU 0 CPU 1
mmu_notifier_release: try_to_unmap:
hlist_del_init_rcu(&mn->hlist);
ptep_clear_flush_notify:
mmu nofifler not found
free page !!!!!!
/*
* At the point, the page has been
* freed, but it is still mapped in
* the secondary MMU.
*/
mn->ops->release(mn, mm);
Then the box is not stable and sometimes we can get this bug:
[ 738.075923] BUG: Bad page state in process migrate-perf pfn:03bec
[ 738.075931] page:ffffea00000efb00 count:0 mapcount:0 mapping: (null) index:0x8076
[ 738.075936] page flags: 0x20000000000014(referenced|dirty)
The same issue is present in mmu_notifier_unregister().
We can call ->release before deleting the notifier to ensure the page has
been unmapped from the secondary MMU before it is freed.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b74253f784 upstream.
The vivt_flush_cache_{range,page} functions check that the mm_struct
of the VMA being flushed has been active on the current CPU before
performing the cache maintenance.
The gate_vma has a NULL mm_struct pointer and, as such, will cause a
kernel fault if we try to flush it with the above operations. This
happens during ELF core dumps, which include the gate_vma as it may be
useful for debugging purposes.
This patch adds checks to the VIVT cache flushing functions so that VMAs
with a NULL mm_struct are flushed unconditionally (the vectors page may
be dirty if we use it to store the current TLS pointer).
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Tested-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5a783cbc48 upstream.
Commit cdf357f1 ("ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS
operations can broadcast a faulty ASID") replaced by-ASID TLB flushing
operations with all-ASID variants to workaround A9 erratum #720789.
This patch extends the workaround to include the tlb_range operations,
which were overlooked by the original patch.
Tested-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc32f63453 upstream.
Commit a6bc32b899 ("mm: compaction: introduce sync-light migration for
use by compaction") changed the declaration of migrate_pages() and
migrate_huge_pages().
But it missed changing the argument of migrate_huge_pages() in
soft_offline_huge_page(). In this case, we should call
migrate_huge_pages() with MIGRATE_SYNC.
Additionally, there is a mismatch between type the of argument and the
function declaration for migrate_pages().
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4088ac3a upstream.
efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
EFI system table and setup an early console for printk output. The
routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
information.
The call to ioremap is happening early in the boot process which leads
to a panic on x86_64 systems:
panic+0x01ca
do_exit+0x043c
oops_end+0x00a7
no_context+0x0119
__bad_area_nosemaphore+0x0138
bad_area_nosemaphore+0x000e
do_page_fault+0x0321
page_fault+0x0020
reserve_memtype+0x02a1
__ioremap_caller+0x0123
ioremap_nocache+0x0012
efi_setup_pcdp_console+0x002b
setup_arch+0x03a9
start_kernel+0x00d4
x86_64_start_reservations+0x012c
x86_64_start_kernel+0x00fe
This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
with calls to early_ioremap/early_iounmap which can be called during
early boot.
This patch was tested on an x86_64 prototype system which uses the
HCDP/PCDP table for early console setup.
Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Khalid Aziz <khalid.aziz@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 572d8b3945 upstream.
An fs-thaw ioctl causes deadlock with a chcp or mkcp -s command:
chcp D ffff88013870f3d0 0 1325 1324 0x00000004
...
Call Trace:
nilfs_transaction_begin+0x11c/0x1a0 [nilfs2]
wake_up_bit+0x20/0x20
copy_from_user+0x18/0x30 [nilfs2]
nilfs_ioctl_change_cpmode+0x7d/0xcf [nilfs2]
nilfs_ioctl+0x252/0x61a [nilfs2]
do_page_fault+0x311/0x34c
get_unmapped_area+0x132/0x14e
do_vfs_ioctl+0x44b/0x490
__set_task_blocked+0x5a/0x61
vm_mmap_pgoff+0x76/0x87
__set_current_blocked+0x30/0x4a
sys_ioctl+0x4b/0x6f
system_call_fastpath+0x16/0x1b
thaw D ffff88013870d890 0 1352 1351 0x00000004
...
Call Trace:
rwsem_down_failed_common+0xdb/0x10f
call_rwsem_down_write_failed+0x13/0x20
down_write+0x25/0x27
thaw_super+0x13/0x9e
do_vfs_ioctl+0x1f5/0x490
vm_mmap_pgoff+0x76/0x87
sys_ioctl+0x4b/0x6f
filp_close+0x64/0x6c
system_call_fastpath+0x16/0x1b
where the thaw ioctl deadlocked at thaw_super() when called while chcp was
waiting at nilfs_transaction_begin() called from
nilfs_ioctl_change_cpmode(). This deadlock is 100% reproducible.
This is because nilfs_ioctl_change_cpmode() first locks sb->s_umount in
read mode and then waits for unfreezing in nilfs_transaction_begin(),
whereas thaw_super() locks sb->s_umount in write mode. The locking of
sb->s_umount here was intended to make snapshot mounts and the downgrade
of snapshots to checkpoints exclusive.
This fixes the deadlock issue by replacing the sb->s_umount usage in
nilfs_ioctl_change_cpmode() with a dedicated mutex which protects snapshot
mounts.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caea33da89 upstream.
Without this patch kernel will panic on LockD start, because lockd_up() checks
lockd_up_net() result for negative value.
From my pow it's better to return negative value from rpcbind routines instead
of replacing all such checks like in lockd_up().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a119365586 upstream.
The following build error occured during a ia64 build with
swap-over-NFS patches applied.
net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant
This is identical to a parisc build error. Fengguang Wu, Mel Gorman
and James Bottomley did all the legwork to track the root cause of
the problem. This fix and entire commit log is shamelessly copied
from them with one extra detail to change a dubious runtime use of
ATOMIC_INIT() to atomic_set() in drivers/char/mspec.c
Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });
The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.
The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 141168c36c and
commit 3f806e5098 upstream.
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space. However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files. The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:
text data bss dec hex filename
4737168 506459 972040 6215667 5ed7f3 vmlinux.o.before
4737444 506459 972040 6215943 5ed907 vmlinux.o.after
for a difference of 276 bytes for an example UP config.
If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c663600584 upstream.
Booting a 3.2, 3.3, or 3.4-rc4 kernel on an Atari using the
`nfeth' ethernet device triggers a WARN_ONCE() in generic irq
handling code on the first irq for that device:
WARNING: at kernel/irq/handle.c:146 handle_irq_event_percpu+0x134/0x142()
irq 3 handler nfeth_interrupt+0x0/0x194 enabled interrupts
Modules linked in:
Call Trace: [<000299b2>] warn_slowpath_common+0x48/0x6a
[<000299c0>] warn_slowpath_common+0x56/0x6a
[<00029a4c>] warn_slowpath_fmt+0x2a/0x32
[<0005b34c>] handle_irq_event_percpu+0x134/0x142
[<0005b34c>] handle_irq_event_percpu+0x134/0x142
[<0000a584>] nfeth_interrupt+0x0/0x194
[<001ba0a8>] schedule_preempt_disabled+0x0/0xc
[<0005b37a>] handle_irq_event+0x20/0x2c
[<0005add4>] generic_handle_irq+0x2c/0x3a
[<00002ab6>] do_IRQ+0x20/0x32
[<0000289e>] auto_irqhandler_fixup+0x4/0x6
[<00003144>] cpu_idle+0x22/0x2e
[<001b8a78>] printk+0x0/0x18
[<0024d112>] start_kernel+0x37a/0x386
[<0003021d>] __do_proc_dointvec+0xb1/0x366
[<0003021d>] __do_proc_dointvec+0xb1/0x366
[<0024c31e>] _sinittext+0x31e/0x9c0
After invoking the irq's handler the kernel sees !irqs_disabled()
and concludes that the handler erroneously enabled interrupts.
However, debugging shows that !irqs_disabled() is true even before
the handler is invoked, which indicates a problem in the platform
code rather than the specific driver.
The warning does not occur in 3.1 or older kernels.
It turns out that the ALLOWINT definition for Atari is incorrect.
The Atari definition of ALLOWINT is ~0x400, the stated purpose of
that is to avoid taking HSYNC interrupts. irqs_disabled() returns
true if the 3-bit ipl & 4 is non-zero. The nfeth interrupt runs at
ipl 3 (it's autovector 3), but 3 & 4 is zero so irqs_disabled() is
false, and the warning above is generated.
When interrupts are explicitly disabled, ipl is set to 7. When they
are enabled, ipl is masked with ALLOWINT. On Atari this will result
in ipl = 3, which blocks interrupts at ipl 3 and below. So how come
nfeth interrupts at ipl 3 are received at all? That's because ipl
is reset to 2 by Atari-specific code in default_idle(), again with
the stated purpose of blocking HSYNC interrupts. This discrepancy
means that ipl 3 can remain blocked for longer than intended.
Both default_idle() and falcon_hblhandler() identify HSYNC with
ipl 2, and the "Atari ST/.../F030 Hardware Register Listing" agrees,
but ALLOWINT is defined as if HSYNC was ipl 3.
[As an experiment I modified default_idle() to reset ipl to 3, and
as expected that resulted in all nfeth interrupts being blocked.]
The fix is simple: define ALLOWINT as ~0x500 instead. This makes
arch_local_irq_enable() consistent with default_idle(), and prevents
the !irqs_disabled() problems for ipl 3 interrupts.
Tested on Atari running in an Aranym VM.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Michael Schmitz <schmitzmic@googlemail.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9e2760d18b upstream.
User space access must always go through uaccess accessors, since on
classic m68k user space and kernel space are completely separate.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b8edf3e552 upstream.
Otherwise if someone tries to use all four channels on AIF1 with the
device in master mode we won't be able to clock out all the data.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc733d4952 upstream.
The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aff252a848 upstream.
uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.
In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f96a4216e8 upstream.
The default 10 microsecond delay for the controller to come out of
halt in dbgp_ehci_startup is too short, so increase it to 1 millisecond.
This is based on emperical testing on various USB debug ports on
modern machines such as a Lenovo X220i and an Ivybridge development
platform that needed to wait ~450-950 microseconds.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commits a117dacde0
and 8bbb181308 ]
The tun module leaks up to 36 bytes of memory by not fully initializing
a structure located on the stack that gets copied to user memory by the
TUNGETIFF and SIOCGIFHWADDR ioctl()s.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59ea33a68a ]
Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.
The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.
This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.
If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.
This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.
This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b1beb681cb ]
When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
IFF_ALLMULTI bits in dev->gflags according to the passed value but
do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
from dev->flags.
This can be easily trigerred by doing:
tcpdump -i eth0 &
ip l s up eth0
ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
IFF_PROMISC in gflags.
Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e4c7f259c5 ]
The problem is that we call this with a spin lock held. The call tree
is:
kaweth_start_xmit() holds kaweth->device_lock.
-> kaweth_async_set_rx_mode()
-> kaweth_control()
-> kaweth_internal_control_msg()
The kaweth_internal_control_msg() function is only called from
kaweth_control() which used GFP_ATOMIC for its allocations.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4249357010 ]
TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
values. If a user assign -1 to it, the socket will set successfully and wait
for 4294967295 miliseconds. This patch add a negative value check to avoid
this issue.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 89d7ae34cd ]
As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).
This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface. In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.
A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>
struct local_tag {
char type;
char length;
char info[4];
};
struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};
int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};
memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));
return 0;
}
CC: Lin Ming <mlin@ss.pku.edu.cn>
Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2eebc1e188 ]
A few days ago Dave Jones reported this oops:
[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140] ffff880147c03a10
[22766.387140] ffffffffa169f2b6
[22766.387140] ffff88013ed95728
[22766.387143] 0000000000000002
[22766.387143] 0000000000000000
[22766.387143] ffff880003fad062
[22766.387144] ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <ffff880147c039b0>
[22766.387142] ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.
Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.
I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.
I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.
I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c1f5163de4 ]
In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address
when it gets to the last entry of the tx ring. We were not using
the proper macro to skip the last entry when advancing the tx index.
Reported-by: Zongyun Lai <zlai@vmware.com>
Reviewed-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97795d2a5b upstream.
If we hit a condition where we have allocated metadata blocks that
were not appropriately reserved, we risk underflow of
ei->i_reserved_meta_blocks. In turn, this can throw
sbi->s_dirtyclusters_counter significantly out of whack and undermine
the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this
occurs and set i_allocated_meta_blocks to avoid this problem.
This condition is reproduced by xfstests 270 against ext2 with
delalloc enabled:
Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28
Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost
270 ultimately fails with an inconsistent filesystem and requires an
fsck to repair. The cause of the error is an underflow in
ext4_da_update_reserve_space() due to an unreserved meta block
allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6fb99cadc upstream.
Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5cf02d09b5 upstream.
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:
PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
#0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.
Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2930d381d2 upstream.
Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ca2ccde5e2 upstream.
To have DP behave like VGA/DVI we need to retrain the link
on hotplug. For this to happen we need to force link
training to happen by setting connector dpms to off
before asking it turning it on again.
v2: agd5f
- drop the dp_get_link_status() change in atombios_dp.c
for now. We still need the dpms OFF change.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 266dcba541 upstream.
No need to retrain the link for passive adapters.
v2: agd5f
- no passive DP to VGA adapters, update comments
- assign radeon_connector_atom_dig after we are sure
we have a digital connector as analog connectors
have different private data.
- get new sink type before checking for retrain. No
need to check if it's no longer a DP connection.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8d1c702aa0 upstream.
We want to print link status query failed only if it's
an unexepected fail. If we query to see if we need
link training it might be because there is nothing
connected and thus link status query have the right
to fail in that case.
To avoid printing failure when it's expected, move the
failure message to proper place.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f60ec4c7df upstream.
This could previously fail if either of the enabled displays was using a
horizontal resolution that is a multiple of 128, and only the leftmost column
of the cursor was (supposed to be) visible at the right edge of that display.
The solution is to move the cursor one pixel to the left in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33183
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e9fbcb4220 upstream.
Each ordered operation has a free callback, and this was called with the
worker spinlock held. Josef made the free callback also call iput,
which we can't do with the spinlock.
This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list. We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f197ac13f6 upstream.
In the ac.c, power_supply_register()'s return value is not checked.
As a result, the driver's add() ops may return success
even though the device failed to initialize.
For example, some BIOS may describe two ACADs in the same DSDT.
The second ACAD device will fail to register,
but ACPI driver's add() ops returns sucessfully.
The ACPI device will receive ACPI notification and cause OOPS.
https://bugzilla.redhat.com/show_bug.cgi?id=772730
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6575820221 upstream.
Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.
Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.
However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.
While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.
Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.
Workqueue cpu hotplug operations will soon go through further cleanup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 443772d408 upstream.
If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to triple fault during resume from suspend, ultimately
ending up in a reboot instead of a resume (or a total refusal to come out
of suspended state, on some machines).
This issue was explained in more detail in commit f42ac38c59 (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e8 (tracing: allow tracing of
suspend/resume & hibernation code again). So, unfortunately since things are
not yet robust enough to allow tracing of low-level suspend/resume functions,
suspend/resume is still broken when ftrace is enabled.
So fix this by disabling function tracing during suspend/resume & hibernation.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0ec4f431eb upstream.
The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.)
are done after converting the long to an int. Thus some illegal values
may be let through and cause problems in later code.
[ They actually *don't* cause problems in mainline, as of Dave Jones's
commit 8d657eb3b4 "Remove easily user-triggerable BUG from
generic_setlease", but we should fix this anyway. And this patch will
be necessary to fix real bugs on earlier kernels. ]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 31bde1ceaa upstream.
A "usb0" interface that has never been connected to a host has an unknown
operstate, and therefore the IFF_RUNNING flag is (incorrectly) asserted
when queried by ifconfig, ifplugd, etc. This is a result of calling
netif_carrier_off() too early in the probe function; it should be called
after register_netdev().
Similar problems have been fixed in many other drivers, e.g.:
e826eafa6 (bonding: Call netif_carrier_off after register_netdevice)
0d672e9f8 (drivers/net: Call netif_carrier_off at the end of the probe)
6a3c869a6 (cxgb4: fix reported state of interfaces without link)
Fix is to move netif_carrier_off() to the end of the function.
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2102e06a5f upstream.
iso data buffers may have holes in them if some packets were short, so for
iso urbs we should always copy the entire buffer, just like the regular
processcompl does.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b110547e58 upstream.
Commit 9fa2df6b90
(ARM: OMAP2+: OPP: allow OPP enumeration to continue if device is not present)
makes the logic:
for (i = 0; i < opp_def_size; i++) {
<snip>
if (!oh || !oh->od) {
<snip>
continue;
}
<snip>
opp_def++;
}
In short, the moment we hit a "Bad OPP", we end up looping the list
comparing against the bad opp definition pointer for the rest of the
iteration count. Instead, increment opp_def in the for loop itself
and allow continue to be used in code without much thought so that
we check the next set of OPP definition pointers :)
Cc: Steve Sakoman <steve@sakoman.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 940f5d47e2 upstream.
When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device. If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.
Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
[jejb: enhance commend and add commit log for stable]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3b661a92e8 upstream.
The following crash results from cases where the end_device has been
removed before scsi_sysfs_add_sdev has had a chance to run.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
...
Call Trace:
[<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
[<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8125e641>] kobject_add_varg+0x41/0x50
[<ffffffff8125e70b>] kobject_add+0x64/0x66
[<ffffffff8131122b>] device_add+0x12d/0x63a
[<ffffffff814b65ea>] ? _raw_spin_unlock_irqrestore+0x47/0x56
[<ffffffff8107de15>] ? module_refcount+0x89/0xa0
[<ffffffff8132f348>] scsi_sysfs_add_sdev+0x4e/0x28a
[<ffffffff8132dcbb>] do_scan_async+0x9c/0x145
...teach scsi_sysfs_add_devices() to check for deleted devices() before
trying to add them, and teach scsi_remove_target() how to remove targets
that have not been added via device_add().
Reported-by: Dariusz Majchrzak <dariusz.majchrzak@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57fc2e335f upstream.
Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.
A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.
Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh. Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b17caa174a upstream.
commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues
The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports. The result is random discovery failures
depending on configuration.
Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'. Convert it to bool and stop
returning its result up the stack.
Tested-by: Dan Melnic <dan.melnic@amd.com>
Reported-by: Dan Melnic <dan.melnic@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26f2f199ff upstream.
Continue running revalidation until no more broadcast devices are
discovered. Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9f5072d4f6 upstream.
Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
numbers on the way. This causes nonsensical cpu idle/iowait values to be
displayed in /proc/stat (the only user of usecs_to_cputime so far).
This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
its value and using it directly in cputime_to_usecs instead of doing two
multiplications.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 55fc05b741 upstream.
At http://dev.laptop.org/ticket/11980 we have determined that the
Marvell CaFe SDHCI controller reports bad card presence during
resume. It reports that no card is present even when it is.
This is a regression -- resume worked back around 2.6.37.
Around 400ms after resuming, a "card inserted" interrupt is
generated, at which point it starts reporting presence.
Work around this hardware oddity by setting the
SDHCI_QUIRK_BROKEN_CARD_DETECTION flag.
Thanks to Chris Ball for helping with diagnosis.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 635697c663 upstream.
Stable note: The commit [acf92b48: vmscan: shrinker->nr updates race and
go wrong] aimed to reduce excessive reclaim of slab objects but
had bug in how it treated shrinker functions that returned -1.
A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock. For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0. Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this. So the next time around this shrinker can cause
really big pressure. Let's skip such shrinkers instead.
Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1c12cbcd0 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This
is part of a series of patches that reduce page allocator overhead.
Fix a gcc warning (and bug?) introduced in cc9a6c877 ("cpuset: mm: reduce
large amounts of memory barrier related damage v3")
Local variable "page" can be uninitialized if the nodemask from vma policy
does not intersects with nodemask from cpuset. Even if it doesn't happens
it is better to initialize this variable explicitly than to introduce
a kernel oops in a weird corner case.
mm/hugetlb.c: In function `alloc_huge_page':
mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc9a6c8776 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This
is part of a series of patches that reduce page allocator overhead.
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.
[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths. This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32. The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.
For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.
This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side. This is much cheaper on some architectures, including x86. The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.
While updating the nodemask, a check is made to see if a false failure
is a risk. If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.
In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
actual results were
3.3.0-rc3 3.3.0-rc3
rc3-vanilla nobarrier-v2r1
Clients 1 UserTime 0.07 ( 0.00%) 0.08 (-14.19%)
Clients 2 UserTime 0.07 ( 0.00%) 0.07 ( 2.72%)
Clients 4 UserTime 0.08 ( 0.00%) 0.07 ( 3.29%)
Clients 1 SysTime 0.70 ( 0.00%) 0.65 ( 6.65%)
Clients 2 SysTime 0.85 ( 0.00%) 0.82 ( 3.65%)
Clients 4 SysTime 1.41 ( 0.00%) 1.41 ( 0.32%)
Clients 1 WallTime 0.77 ( 0.00%) 0.74 ( 4.19%)
Clients 2 WallTime 0.47 ( 0.00%) 0.45 ( 3.73%)
Clients 4 WallTime 0.38 ( 0.00%) 0.37 ( 1.58%)
Clients 1 Flt/sec/cpu 497620.28 ( 0.00%) 520294.53 ( 4.56%)
Clients 2 Flt/sec/cpu 414639.05 ( 0.00%) 429882.01 ( 3.68%)
Clients 4 Flt/sec/cpu 257959.16 ( 0.00%) 258761.48 ( 0.31%)
Clients 1 Flt/sec 495161.39 ( 0.00%) 517292.87 ( 4.47%)
Clients 2 Flt/sec 820325.95 ( 0.00%) 850289.77 ( 3.65%)
Clients 4 Flt/sec 1020068.93 ( 0.00%) 1022674.06 ( 0.26%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 135.68 132.17
User+Sys Time Running Test (seconds) 164.2 160.13
Total Elapsed Time (seconds) 123.46 120.87
The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected). The
actual number of page faults is noticeably improved.
For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.
To test the actual bug the commit fixed I opened two terminals. The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data. In a second window, the nodemask of the
cpuset was continually randomised in a loop.
Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b246272ecc upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
expensive and severely impacted page allocator performance. This is
part of a series of patches that reduce page allocator overhead.
Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.
c0ff7453bb ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by
89e8a244b9 ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e8a244b9 upstream.
Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is
extremely expensive and severely impacted page allocator performance.
This is part of a series of patches that reduce page allocator
overhead.
{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.
This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself. It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example. In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!
The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes. This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.
If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes. Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b95a2f2d48 upstream - WARNING: this is a substitute patch.
Stable note: Not tracked in Bugzilla. This is a partial backport of an
upstream commit addressing a completely different issue
that accidentally contained an important fix. The workload
this patch helps was memcached when IO is started in the
background. memcached should stay resident but without this patch
it gets swapped. Sometimes this manifests as a drop in throughput
but mostly it was observed through /proc/vmstat.
Commit [246e87a9: memcg: fix get_scan_count() for small targets] was meant
to fix a problem whereby small scan targets on memcg were ignored causing
priority to raise too sharply. It forced scanning to take place if the
target was small, memcg or kswapd.
From the time it was introduced it caused excessive reclaim by kswapd
with workloads being pushed to swap that previously would have stayed
resident. This was accidentally fixed in commit [b95a2f2d: mm: vmscan:
convert global reclaim to per-memcg LRU lists] by making it harder for
kswapd to force scan small targets but that patchset is not suitable for
backporting. This was later changed again by commit [90126375: mm/vmscan:
push lruvec pointer into get_scan_count()] into a format that looks
like it would be a straight-forward backport but there is a subtle
difference due to the use of lruvecs.
The impact of the accidental fix is to make it harder for kswapd to force
scan small targets by taking zone->all_unreclaimable into account. This
patch is the closest equivalent available based on what is backported.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 043bcbe5ec upstream.
Stable note: Not tracked in Bugzilla. There were reports of shared
mapped pages being unfairly reclaimed in comparison to older kernels.
This is being addressed over time. Even though the subject
refers to lumpy reclaim, it impacts compaction as well.
Lumpy reclaim does well to stop at a PageAnon when there's no swap, but
better is to stop at any PageSwapBacked, which includes shmem/tmpfs too.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 86cfd3a450 upstream.
Stable note: Not tracked in Bugzilla. This patch reduces kswapd CPU
usage on swapless systems with high anonymous memory usage.
It's pointless to continue reclaiming when we have no swap space and lots
of anon pages in the inactive list.
Without this patch, it is possible when swap is disabled to continue
trying to reclaim when there are only anonymous pages in the system even
though that will not make any progress.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 34dbc67a64 upstream.
Stable note: Not tracked in Bugzilla. There were reports of shared
mapped pages being unfairly reclaimed in comparison to older kernels.
This is being addressed over time. The specific workload being
addressed here in described in paragraph four and while paragraph
five says it did not help performance as such, it made a difference
to major page faults. I'm aware of at least one bug for a large
vendor that was due to increased major faults.
Commit 6457474624 ("vmscan: detect mapped file pages used only once")
greatly decreases lifetime of single-used mapped file pages.
Unfortunately it also decreases life time of all shared mapped file
pages. Because after commit bf3f3bc5e7 ("mm: don't mark_page_accessed
in fault path") page-fault handler does not mark page active or even
referenced.
Thus page_check_references() activates file page only if it was used twice
while it stays in inactive list, meanwhile it activates anon pages after
first access. Inactive list can be small enough, this way reclaimer can
accidentally throw away any widely used page if it wasn't used twice in
short period.
After this patch page_check_references() also activate file mapped page at
first inactive list scan if this page is already used multiple times via
several ptes.
I found this while trying to fix degragation in rhel6 (~2.6.32) from rhel5
(~2.6.18). There a complete mess with >100 web/mail/spam/ftp containers,
they share all their files but there a lot of anonymous pages: ~500mb
shared file mapped memory and 15-20Gb non-shared anonymous memory. In
this situation major-pagefaults are very costly, because all containers
share the same page. In my load kernel created a disproportionate
pressure on the file memory, compared with the anonymous, they equaled
only if I raise swappiness up to 150 =)
These patches actually wasn't helped a lot in my problem, but I saw
noticable (10-20 times) reduce in count and average time of
major-pagefault in file-mapped areas.
Actually both patches are fixes for commit v2.6.33-5448-g6457474, because
it was aimed at one scenario (singly used pages), but it breaks the logic
in other scenarios (shared and/or executable pages)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 0cee34fd72 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time.
If compaction can proceed for a given zone, shrink_zones() does not
reclaim any more pages from it. After commit [e0c2327: vmscan: abort
reclaim/compaction if compaction can proceed], do_try_to_free_pages()
tries to finish as soon as possible once one zone can compact.
This was intended to prevent slabs being shrunk unnecessarily but there
are side-effects. One is that a small zone that is ready for compaction
will abort reclaim even if the chances of successfully allocating a THP
from that zone is small. It also means that reclaim can return too early
even though sc->nr_to_reclaim pages were not reclaimed.
This partially reverts the commit until it is proven that slabs are really
being shrunk unnecessarily but preserves the check to return 1 to avoid
OOM if reclaim was aborted prematurely.
[aarcange@redhat.com: This patch replaces a revert from Andrea]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7335084d44 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but otherwise has little to justify it. The
problem it fixes was never observed but the source of the
theoretical problem did not exist for very long.
During direct reclaim it is possible that reclaim will be aborted so that
compaction can be attempted to satisfy a high-order allocation. If this
decision is made before any pages are reclaimed, it is possible that 0 is
returned to the page allocator potentially triggering an OOM. This has
not been observed but it is a possibility so this patch addresses it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe4b1b244b upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time. This patch
addresses a problem where the fix regressed THP allocation
success rates.
In commit e0887c19 ("vmscan: limit direct reclaim for higher order
allocations"), Rik noted that reclaim was too aggressive when THP was
enabled. In his initial patch he used the number of free pages to decide
if reclaim should abort for compaction. My feedback was that reclaim and
compaction should be using the same logic when deciding if reclaim should
be aborted.
Unfortunately, this had the effect of reducing THP success rates when the
workload included something like streaming reads that continually
allocated pages. The window during which compaction could run and return
a THP was too small.
This patch combines Rik's two patches together. compaction_suitable() is
still used to decide if reclaim should be aborted to allow compaction is
used. However, it will also ensure that there is a reasonable buffer of
free pages available. This improves upon the THP allocation success rates
but bounds the number of pages that are freed for compaction.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6bc32b899 upstream.
Stable note: Not tracked in Buzilla. This was part of a series that
reduced interactivity stalls experienced when THP was enabled.
These stalls were particularly noticable when copying data
to a USB stick but the experiences for users varied a lot.
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage. Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.
This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.
[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f0dfcde099 upstream.
Stable note: Fixes https://bugzilla.redhat.com/show_bug.cgi?id=712019. This
patch reduces kswapd CPU usage.
There 2 places to read pgdat in kswapd. One is return from a successful
balance, another is waked up from kswapd sleeping. The new_order and
new_classzone_idx represent the balance input order and classzone_idx.
But current new_order and new_classzone_idx are not assigned after
kswapd_try_to_sleep(), that will cause a bug in the following scenario.
1: after a successful balance, kswapd goes to sleep, and new_order = 0;
new_classzone_idx = __MAX_NR_ZONES - 1;
2: kswapd waked up with order = 3 and classzone_idx = ZONE_NORMAL
3: in the balance_pgdat() running, a new balance wakeup happened with
order = 5, and classzone_idx = ZONE_NORMAL
4: the first wakeup(order = 3) finished successufly, return order = 3
but, the new_order is still 0, so, this balancing will be treated as a
failed balance. And then the second tighter balancing will be missed.
So, to avoid the above problem, the new_order and new_classzone_idx need
to be assigned for later successful comparison.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d2ebd0f6b8 upstream.
Stable note: Fixes https://bugzilla.redhat.com/show_bug.cgi?id=712019. This
patch reduces kswapd CPU usage.
In commit 215ddd66 ("mm: vmscan: only read new_classzone_idx from pgdat
when reclaiming successfully") , Mel Gorman said kswapd is better to sleep
after a unsuccessful balancing if there is tighter reclaim request pending
in the balancing. But in the following scenario, kswapd do something that
is not matched our expectation. The patch fixes this issue.
1, Read pgdat request A (classzone_idx, order = 3)
2, balance_pgdat()
3, During pgdat, a new pgdat request B (classzone_idx, order = 5) is placed
4, balance_pgdat() returns but failed since returned order = 0
5, pgdat of request A assigned to balance_pgdat(), and do balancing again.
While the expectation behavior of kswapd should try to sleep.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Pádraig Brady <P@draigBrady.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c824493528 upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
information by reducing LRU list churning had the side-effect of
reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
is was meaningless to pick the page and re-add it to the LRU list. This
had to be partially reverted because some dirty pages can be migrated by
compaction without blocking.
This patch updates "mm: compaction: make isolate_lru_page" by skipping
over pages that migration has no possibility of migrating to minimise LRU
disruption.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66199712e9 upstream.
Stable note: Not tracked in Buzilla. This was part of a series that
reduced interactivity stalls experienced when THP was enabled.
If compaction is deferred, direct reclaim is used to try to free enough
pages for the allocation to succeed. For small high-orders, this has a
reasonable chance of success. However, if the caller has specified
__GFP_NO_KSWAPD to limit the disruption to the system, it makes more sense
to fail the allocation rather than stall the caller in direct reclaim.
This patch skips direct reclaim if compaction is deferred and the caller
specifies __GFP_NO_KSWAPD.
Async compaction only considers a subset of pages so it is possible for
compaction to be deferred prematurely and not enter direct reclaim even in
cases where it should. To compensate for this, this patch also defers
compaction only if sync compaction failed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b969c4ab9f upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page
aging information by reducing LRU list churning had the side-effect
of reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time. Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates. Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check;
if (PageDirty(page) && !sync &&
mapping->a_ops->migratepage != migrate_page)
rc = -EBUSY;
This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking. This
patch updates the ->migratepage callback with a "sync" parameter. It is
the responsibility of the callback to fail gracefully if migration would
block.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a77ebd333c upstream.
Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging
information by reducing LRU list churning had the side-effect of
reducing THP allocation success rates. This was part of a series
to restore the success rates while preserving the reclaim fix.
Short summary: There are severe stalls when a USB stick using VFAT is
used with THP enabled that are reduced by this series. If you are
experiencing this problem, please test and report back and considering I
have seen complaints from openSUSE and Fedora users on this as well as a
few private mails, I'm guessing it's a widespread issue. This is a new
type of USB-related stall because it is due to synchronous compaction
writing where as in the past the big problem was dirty pages reaching
the end of the LRU and being written by reclaim.
Am cc'ing Andrew this time and this series would replace
mm-do-not-stall-in-synchronous-compaction-for-thp-allocations.patch.
I'm also cc'ing Dave Jones as he might have merged that patch to Fedora
for wider testing and ideally it would be reverted and replaced by this
series.
That said, the later patches could really do with some review. If this
series is not the answer then a new direction needs to be discussed
because as it is, the stalls are unacceptable as the results in this
leader show.
For testers that try backporting this to 3.1, it won't work because
there is a non-obvious dependency on not writing back pages in direct
reclaim so you need those patches too.
Changelog since V5
o Rebase to 3.2-rc5
o Tidy up the changelogs a bit
Changelog since V4
o Added reviewed-bys, credited Andrea properly for sync-light
o Allow dirty pages without mappings to be considered for migration
o Bound the number of pages freed for compaction
o Isolate PageReclaim pages on their own LRU list
This is against 3.2-rc5 and follows on from discussions on "mm: Do
not stall in synchronous compaction for THP allocations" and "[RFC
PATCH 0/5] Reduce compaction-related stalls". Initially, the proposed
patch eliminated stalls due to compaction which sometimes resulted in
user-visible interactivity problems on browsers by simply never using
sync compaction. The downside was that THP success allocation rates
were lower because dirty pages were not being migrated as reported by
Andrea. His approach at fixing this was nacked on the grounds that
it reverted fixes from Rik merged that reduced the amount of pages
reclaimed as it severely impacted his workloads performance.
This series attempts to reconcile the requirements of maximising THP
usage, without stalling in a user-visible fashion due to compaction
or cheating by reclaiming an excessive number of pages.
Patch 1 partially reverts commit 39deaf85 to allow migration to isolate
dirty pages. This is because migration can move some dirty
pages without blocking.
Patch 2 notes that the /proc/sys/vm/compact_memory handler is not using
synchronous compaction when it should be. This is unrelated
to the reported stalls but is worth fixing.
Patch 3 checks if we isolated a compound page during lumpy scan and
account for it properly. For the most part, this affects
tracing so it's unrelated to the stalls but worth fixing.
Patch 4 notes that it is possible to abort reclaim early for compaction
and return 0 to the page allocator potentially entering the
"may oom" path. This has not been observed in practice but
the rest of the series potentially makes it easier to happen.
Patch 5 adds a sync parameter to the migratepage callback and gives
the callback responsibility for migrating the page without
blocking if sync==false. For example, fallback_migrate_page
will not call writepage if sync==false. This increases the
number of pages that can be handled by asynchronous compaction
thereby reducing stalls.
Patch 6 restores filter-awareness to isolate_lru_page for migration.
In practice, it means that pages under writeback and pages
without a ->migratepage callback will not be isolated
for migration.
Patch 7 avoids calling direct reclaim if compaction is deferred but
makes sure that compaction is only deferred if sync
compaction was used.
Patch 8 introduces a sync-light migration mechanism that sync compaction
uses. The objective is to allow some stalls but to not call
->writepage which can lead to significant user-visible stalls.
Patch 9 notes that while we want to abort reclaim ASAP to allow
compation to go ahead that we leave a very small window of
opportunity for compaction to run. This patch allows more pages
to be freed by reclaim but bounds the number to a reasonable
level based on the high watermark on each zone.
Patch 10 allows slabs to be shrunk even after compaction_ready() is
true for one zone. This is to avoid a problem whereby a single
small zone can abort reclaim even though no pages have been
reclaimed and no suitably large zone is in a usable state.
Patch 11 fixes a problem with the rate of page scanning. As reclaim is
rarely stalling on pages under writeback it means that scan
rates are very high. This is particularly true for direct
reclaim which is not calling writepage. The vmstat figures
implied that much of this was busy work with PageReclaim pages
marked for immediate reclaim. This patch is a prototype that
moves these pages to their own LRU list.
This has been tested and other than 2 USB keys getting trashed,
nothing horrible fell out. That said, I am a bit unhappy with the
rescue logic in patch 11 but did not find a better way around it. It
does significantly reduce scan rates and System CPU time indicating
it is the right direction to take.
What is of critical importance is that stalls due to compaction
are massively reduced even though sync compaction was still
allowed. Testing from people complaining about stalls copying to USBs
with THP enabled are particularly welcome.
The following tests all involve THP usage and USB keys in some
way. Each test follows this type of pattern
1. Read from some fast fast storage, be it raw device or file. Each time
the copy finishes, start again until the test ends
2. Write a large file to a filesystem on a USB stick. Each time the copy
finishes, start again until the test ends
3. When memory is low, start an alloc process that creates a mapping
the size of physical memory to stress THP allocation. This is the
"real" part of the test and the part that is meant to trigger
stalls when THP is enabled. Copying continues in the background.
4. Record the CPU usage and time to execute of the alloc process
5. Record the number of THP allocs and fallbacks as well as the number of THP
pages in use a the end of the test just before alloc exited
6. Run the test 5 times to get an idea of variability
7. Between each run, sync is run and caches dropped and the test
waits until nr_dirty is a small number to avoid interference
or caching between iterations that would skew the figures.
The individual tests were then
writebackCPDeviceBasevfat
Disable THP, read from a raw device (sda), vfat on USB stick
writebackCPDeviceBaseext4
Disable THP, read from a raw device (sda), ext4 on USB stick
writebackCPDevicevfat
THP enabled, read from a raw device (sda), vfat on USB stick
writebackCPDeviceext4
THP enabled, read from a raw device (sda), ext4 on USB stick
writebackCPFilevfat
THP enabled, read from a file on fast storage and USB, both vfat
writebackCPFileext4
THP enabled, read from a file on fast storage and USB, both ext4
The kernels tested were
3.1 3.1
vanilla 3.2-rc5
freemore Patches 1-10
immediate Patches 1-11
andrea The 8 patches Andrea posted as a basis of comparison
The results are very long unfortunately. I'll start with the case
where we are not using THP at all
writebackCPDeviceBasevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.28 ( 0.00%) 54.49 (-4143.46%) 48.63 (-3687.69%) 4.69 ( -265.11%) 51.88 (-3940.81%)
+/- 0.06 ( 0.00%) 2.45 (-4305.55%) 4.75 (-8430.57%) 7.46 (-13282.76%) 4.76 (-8440.70%)
User Time 0.09 ( 0.00%) 0.05 ( 40.91%) 0.06 ( 29.55%) 0.07 ( 15.91%) 0.06 ( 27.27%)
+/- 0.02 ( 0.00%) 0.01 ( 45.39%) 0.02 ( 25.07%) 0.00 ( 77.06%) 0.01 ( 52.24%)
Elapsed Time 110.27 ( 0.00%) 56.38 ( 48.87%) 49.95 ( 54.70%) 11.77 ( 89.33%) 53.43 ( 51.54%)
+/- 7.33 ( 0.00%) 3.77 ( 48.61%) 4.94 ( 32.63%) 6.71 ( 8.50%) 4.76 ( 35.03%)
THP Active 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault Alloc 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Fault Fallback 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
+/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
The THP figures are obviously all 0 because THP was enabled. The
main thing to watch is the elapsed times and how they compare to
times when THP is enabled later. It's also important to note that
elapsed time is improved by this series as System CPu time is much
reduced.
writebackCPDevicevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.22 ( 0.00%) 13.89 (-1040.72%) 46.40 (-3709.20%) 4.44 ( -264.37%) 47.37 (-3789.33%)
+/- 0.06 ( 0.00%) 22.82 (-37635.56%) 3.84 (-6249.44%) 6.48 (-10618.92%) 6.60
(-10818.53%)
User Time 0.06 ( 0.00%) 0.06 ( -6.90%) 0.05 ( 17.24%) 0.05 ( 13.79%) 0.04 ( 31.03%)
+/- 0.01 ( 0.00%) 0.01 ( 33.33%) 0.01 ( 33.33%) 0.01 ( 39.14%) 0.01 ( 25.46%)
Elapsed Time 10445.54 ( 0.00%) 2249.92 ( 78.46%) 70.06 ( 99.33%) 16.59 ( 99.84%) 472.43 (
95.48%)
+/- 643.98 ( 0.00%) 811.62 ( -26.03%) 10.02 ( 98.44%) 7.03 ( 98.91%) 59.99 ( 90.68%)
THP Active 15.60 ( 0.00%) 35.20 ( 225.64%) 65.00 ( 416.67%) 70.80 ( 453.85%) 62.20 ( 398.72%)
+/- 18.48 ( 0.00%) 51.29 ( 277.59%) 15.99 ( 86.52%) 37.91 ( 205.18%) 22.02 ( 119.18%)
Fault Alloc 121.80 ( 0.00%) 76.60 ( 62.89%) 155.40 ( 127.59%) 181.20 ( 148.77%) 286.60 ( 235.30%)
+/- 73.51 ( 0.00%) 61.11 ( 83.12%) 34.89 ( 47.46%) 31.88 ( 43.36%) 68.13 ( 92.68%)
Fault Fallback 881.20 ( 0.00%) 926.60 ( -5.15%) 847.60 ( 3.81%) 822.00 ( 6.72%) 716.60 ( 18.68%)
+/- 73.51 ( 0.00%) 61.26 ( 16.67%) 34.89 ( 52.54%) 31.65 ( 56.94%) 67.75 ( 7.84%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 3540.88 1945.37 716.04 64.97 1937.03
Total Elapsed Time (seconds) 52417.33 11425.90 501.02 230.95 2520.28
The first thing to note is the "Elapsed Time" for the vanilla kernels
of 2249 seconds versus 56 with THP disabled which might explain the
reports of USB stalls with THP enabled. Applying the patches brings
performance in line with THP-disabled performance while isolating
pages for immediate reclaim from the LRU cuts down System CPU time.
The "Fault Alloc" success rate figures are also improved. The vanilla
kernel only managed to allocate 76.6 pages on average over the course
of 5 iterations where as applying the series allocated 181.20 on
average albeit it is well within variance. It's worth noting that
applies the series at least descreases the amount of variance which
implies an improvement.
Andrea's series had a higher success rate for THP allocations but
at a severe cost to elapsed time which is still better than vanilla
but still much worse than disabling THP altogether. One can bring my
series close to Andrea's by removing this check
/*
* If compaction is deferred for high-order allocations, it is because
* sync compaction recently failed. In this is the case and the caller
* has requested the system not be heavily disrupted, fail the
* allocation now instead of entering direct reclaim
*/
if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD))
goto nopage;
I didn't include a patch that removed the above check because hurting
overall performance to improve the THP figure is not what the average
user wants. It's something to consider though if someone really wants
to maximise THP usage no matter what it does to the workload initially.
This is summary of vmstat figures from the same test.
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
Page Ins 3257266139 1111844061 17263623 10901575 161423219
Page Outs 81054922 30364312 3626530 3657687 8753730
Swap Ins 3294 2851 6560 4964 4592
Swap Outs 390073 528094 620197 790912 698285
Direct pages scanned 1077581700 3024951463 1764930052 115140570 5901188831
Kswapd pages scanned 34826043 7112868 2131265 1686942 1893966
Kswapd pages reclaimed 28950067 4911036 1246044 966475 1497726
Direct pages reclaimed 805148398 280167837 3623473 2215044 40809360
Kswapd efficiency 83% 69% 58% 57% 79%
Kswapd velocity 664.399 622.521 4253.852 7304.360 751.490
Direct efficiency 74% 9% 0% 1% 0%
Direct velocity 20557.737 264745.137 3522673.849 498551.938 2341481.435
Percentage direct scans 96% 99% 99% 98% 99%
Page writes by reclaim 722646 529174 620319 791018 699198
Page writes file 332573 1080 122 106 913
Page writes anon 390073 528094 620197 790912 698285
Page reclaim immediate 0 2552514720 1635858848 111281140 5478375032
Page rescued immediate 0 0 0 87848 0
Slabs scanned 23552 23552 9216 8192 9216
Direct inode steals 231 0 0 0 0
Kswapd inode steals 0 0 0 0 0
Kswapd skipped wait 28076 786 0 61 6
THP fault alloc 609 383 753 906 1433
THP collapse alloc 12 6 0 0 6
THP splits 536 211 456 593 1136
THP fault fallback 4406 4633 4263 4110 3583
THP collapse fail 120 127 0 0 4
Compaction stalls 1810 728 623 779 3200
Compaction success 196 53 60 80 123
Compaction failures 1614 675 563 699 3077
Compaction pages moved 193158 53545 243185 333457 226688
Compaction move failure 9952 9396 16424 23676 45070
The main things to look at are
1. Page In/out figures are much reduced by the series.
2. Direct page scanning is incredibly high (264745.137 pages scanned
per second on the vanilla kernel) but isolating PageReclaim pages
on their own list reduces the number of pages scanned significantly.
3. The fact that "Page rescued immediate" is a positive number implies
that we sometimes race removing pages from the LRU_IMMEDIATE list
that need to be put back on a normal LRU but it happens only for
0.07% of the pages marked for immediate reclaim.
writebackCPDeviceext4
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%)
+/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%)
User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%)
+/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%)
Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%)
+/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%)
THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%)
+/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%)
Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%)
+/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%)
Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%)
+/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45
Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26
Similar test but the USB stick is using ext4 instead of vfat. As
ext4 does not use writepage for migration, the large stalls due to
compaction when THP is enabled are not observed. Still, isolating
PageReclaim pages on their own list helped completion time largely
by reducing the number of pages scanned by direct reclaim although
time spend in congestion_wait could also be a factor.
Again, Andrea's series had far higher success rates for THP allocation
at the cost of elapsed time. I didn't look too closely but a quick
look at the vmstat figures tells me kswapd reclaimed 8 times more pages
than the patch series and direct reclaim reclaimed roughly three times
as many pages. It follows that if memory is aggressively reclaimed,
there will be more available for THP.
writebackCPFilevfat
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.76 ( 0.00%) 29.10 (-1555.52%) 46.01 (-2517.18%) 4.79 ( -172.35%) 54.89 (-3022.53%)
+/- 0.14 ( 0.00%) 25.61 (-18185.17%) 2.15 (-1434.83%) 6.60 (-4610.03%) 9.75
(-6863.76%)
User Time 0.05 ( 0.00%) 0.07 ( -45.83%) 0.05 ( -4.17%) 0.06 ( -29.17%) 0.06 ( -16.67%)
+/- 0.02 ( 0.00%) 0.02 ( 20.11%) 0.02 ( -3.14%) 0.01 ( 31.58%) 0.01 ( 47.41%)
Elapsed Time 22520.79 ( 0.00%) 1082.85 ( 95.19%) 73.30 ( 99.67%) 32.43 ( 99.86%) 291.84 ( 98.70%)
+/- 7277.23 ( 0.00%) 706.29 ( 90.29%) 19.05 ( 99.74%) 17.05 ( 99.77%) 125.55 ( 98.27%)
THP Active 83.80 ( 0.00%) 12.80 ( 15.27%) 15.60 ( 18.62%) 13.00 ( 15.51%) 0.80 ( 0.95%)
+/- 66.81 ( 0.00%) 20.19 ( 30.22%) 5.92 ( 8.86%) 15.06 ( 22.54%) 1.17 ( 1.75%)
Fault Alloc 171.00 ( 0.00%) 67.80 ( 39.65%) 97.40 ( 56.96%) 125.60 ( 73.45%) 133.00 ( 77.78%)
+/- 82.91 ( 0.00%) 30.69 ( 37.02%) 53.91 ( 65.02%) 55.05 ( 66.40%) 21.19 ( 25.56%)
Fault Fallback 832.00 ( 0.00%) 935.20 ( -12.40%) 906.00 ( -8.89%) 877.40 ( -5.46%) 870.20 ( -4.59%)
+/- 82.91 ( 0.00%) 30.69 ( 62.98%) 54.01 ( 34.86%) 55.05 ( 33.60%) 20.91 ( 74.78%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 7229.81 928.42 704.52 80.68 1330.76
Total Elapsed Time (seconds) 112849.04 5618.69 571.11 360.54 1664.28
In this case, the test is reading/writing only from filesystems but as
it's vfat, it's slow due to calling writepage during compaction. Little
to observe really - the time to complete the test goes way down
with the series applied and THP allocation success rates go up in
comparison to 3.2-rc5. The success rates are lower than 3.1.0 but
the elapsed time for that kernel is abysmal so it is not really a
sensible comparison.
As before, Andrea's series allocates more THPs at the cost of overall
performance.
writebackCPFileext4
3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1
System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%)
+/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%)
User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%)
+/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%)
Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%)
+/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%)
THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%)
+/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%)
Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%)
+/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%)
Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%)
+/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45
Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26
Same type of story - elapsed times go down. In this case, allocation
success rates are roughtly the same. As before, Andrea's has higher
success rates but takes a lot longer.
Overall the series does reduce latencies and while the tests are
inherency racy as alloc competes with the cp processes, the variability
was included. The THP allocation rates are not as high as they could
be but that is because we would have to be more aggressive about
reclaim and compaction impacting overall performance.
This patch:
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
is was meaningless to pick the page and re-add it to the LRU list.
What was missed during review is that asynchronous migration moves dirty
pages if their ->migratepage callback is migrate_page() because these can
be moved without blocking. This potentially impacted hugepage allocation
success rates by a factor depending on how many dirty pages are in the
system.
This patch partially reverts 39deaf85 to allow migration to isolate dirty
pages again. This increases how much compaction disrupts the LRU but that
is addressed later in the series.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f80c067361 upstream.
Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU list
leading to poor reclaim decisions which has a variable
performance impact.
In __zone_reclaim case, we don't want to shrink mapped page. Nonetheless,
we have isolated mapped page and re-add it into LRU's head. It's
unnecessary CPU overhead and makes LRU churning.
Of course, when we isolate the page, the page might be mapped but when we
try to migrate the page, the page would be not mapped. So it could be
migrated. But race is rare and although it happens, it's no big deal.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39deaf8585 upstream.
Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU
list leading to poor reclaim decisions which has a variable
performance impact.
In async mode, compaction doesn't migrate dirty or writeback pages. So,
it's meaningless to pick the page and re-add it to lru list.
Of course, when we isolate the page in compaction, the page might be dirty
or writeback but when we try to migrate the page, the page would be not
dirty, writeback. So it could be migrated. But it's very unlikely as
isolate and migration cycle is much faster than writeout.
So, this patch helps cpu overhead and prevent unnecessary LRU churning.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4356f21d09 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but has no other impact.
Change ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.
Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags. INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."
This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9e84ac153 upstream.
Stable note: Not tracked in Bugzilla. This patch makes later patches
easier to apply but has no other impact.
acct_isolated of compaction uses page_lru_base_type which returns only
base type of LRU list so it never returns LRU_ACTIVE_ANON or
LRU_ACTIVE_FILE. In addtion, cc->nr_[anon|file] is used in only
acct_isolated so it doesn't have fields in conpact_control.
This patch removes fields from compact_control and makes clear function of
acct_issolated which counts the number of anon|file pages isolated.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0c23279c9 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time.
If compaction can proceed, shrink_zones() stops doing any work but its
callers still call shrink_slab() which raises the priority and potentially
sleeps. This is unnecessary and wasteful so this patch aborts direct
reclaim/compaction entirely if compaction can proceed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0887c19b2 upstream.
Stable note: Not tracked on Bugzilla. THP and compaction was found to
aggressively reclaim pages and stall systems under different
situations that was addressed piecemeal over time. Paragraph
3 of this changelog is the motivation for this patch.
When suffering from memory fragmentation due to unfreeable pages, THP page
faults will repeatedly try to compact memory. Due to the unfreeable
pages, compaction fails.
Needless to say, at that point page reclaim also fails to create free
contiguous 2MB areas. However, that doesn't stop the current code from
trying, over and over again, and freeing a minimum of 4MB (2UL <<
sc->order pages) at every single invocation.
This resulted in my 12GB system having 2-3GB free memory, a corresponding
amount of used swap and very sluggish response times.
This can be avoided by having the direct reclaim code not reclaim from
zones that already have plenty of free memory available for compaction.
If compaction still fails due to unmovable memory, doing additional
reclaim will only hurt the system, not help.
[jweiner@redhat.com: change comment to explain the order check]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3567b59aa8 upstream.
Stable note: Not tracked in Bugzilla. This patch reduces excessive
reclaim of slab objects reducing the amount of information that
has to be brought back in from disk. The third and fourth paragram
in the series describes the impact.
When a shrinker returns -1 to shrink_slab() to indicate it cannot do
any work given the current memory reclaim requirements, it adds the
entire total_scan count to shrinker->nr. The idea ehind this is that
whenteh shrinker is next called and can do work, it will do the work
of the previously aborted shrinker call as well.
However, if a filesystem is doing lots of allocation with GFP_NOFS
set, then we get many, many more aborts from the shrinkers than we
do successful calls. The result is that shrinker->nr winds up to
it's maximum permissible value (twice the current cache size) and
then when the next shrinker call that can do work is issued, it
has enough scan count built up to free the entire cache twice over.
This manifests itself in the cache going from full to empty in a
matter of seconds, even when only a small part of the cache is
needed to be emptied to free sufficient memory.
Under metadata intensive workloads on ext4 and XFS, I'm seeing the
VFS caches increase memory consumption up to 75% of memory (no page
cache pressure) over a period of 30-60s, and then the shrinker
empties them down to zero in the space of 2-3s. This cycle repeats
over and over again, with the shrinker completely trashing the inode
and dentry caches every minute or so the workload continues.
This behaviour was made obvious by the shrink_slab tracepoints added
earlier in the series, and made worse by the patch that corrected
the concurrent accounting of shrinker->nr.
To avoid this problem, stop repeated small increments of the total
scan value from winding shrinker->nr up to a value that can cause
the entire cache to be freed. We still need to allow it to wind up,
so use the delta as the "large scan" threshold check - if the delta
is more than a quarter of the entire cache size, then it is a large
scan and allowed to cause lots of windup because we are clearly
needing to free lots of memory.
If it isn't a large scan then limit the total scan to half the size
of the cache so that windup never increases to consume the whole
cache. Reducing the total scan limit further does not allow enough
wind-up to maintain the current levels of performance, whilst a
higher threshold does not prevent the windup from freeing the entire
cache under sustained workloads.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit acf92b485c upstream.
Stable note: Not tracked in Bugzilla. This patch reduces excessive
reclaim of slab objects reducing the amount of information
that has to be brought back in from disk.
shrink_slab() allows shrinkers to be called in parallel so the
struct shrinker can be updated concurrently. It does not provide any
exclusio for such updates, so we can get the shrinker->nr value
increasing or decreasing incorrectly.
As a result, when a shrinker repeatedly returns a value of -1 (e.g.
a VFS shrinker called w/ GFP_NOFS), the shrinker->nr goes haywire,
sometimes updating with the scan count that wasn't used, sometimes
losing it altogether. Worse is when a shrinker does work and that
update is lost due to racy updates, which means the shrinker will do
the work again!
Fix this by making the total_scan calculations independent of
shrinker->nr, and making the shrinker->nr updates atomic w.r.t. to
other updates via cmpxchg loops.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 095760730c upstream.
Stable note: This patch makes later patches easier to apply but otherwise
has little to justify it. It is a diagnostic patch that was part
of a series addressing excessive slab shrinking after GFP_NOFS
failures. There is detailed information on the series' motivation
at https://lkml.org/lkml/2011/6/2/42 .
It is impossible to understand what the shrinkers are actually doing
without instrumenting the code, so add a some tracepoints to allow
insight to be gained.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 439423f689 upstream.
Stable note: Not tracked in Bugzilla. kswapd is responsible for clearing
ZONE_CONGESTED after it balances a zone and this patch fixes a bug
where that was failing to happen. Without this patch, processes
can stall in wait_iff_congested unnecessarily. For users, this can
look like an interactivity stall but some workloads would see it
as sudden drop in throughput.
ZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any
task. It's possible ZONE_CONGESTED isn't cleared in some cases:
1. the zone is already balanced just entering balance_pgdat() for
order-0 because concurrent tasks free memory. In this case, later
check will skip the zone as it's balanced so the flag isn't cleared.
2. high order balance fallbacks to order-0. quote from Mel: At the
end of balance_pgdat(), kswapd uses the following logic;
If reclaiming at high order {
for each zone {
if all_unreclaimable
skip
if watermark is not met
order = 0
loop again
/* watermark is met */
clear congested
}
}
i.e. it clears ZONE_CONGESTED if it the zone is balanced. if not,
it restarts balancing at order-0. However, if the higher zones are
balanced for order-0, kswapd will miss clearing ZONE_CONGESTED as
that only happens after a zone is shrunk. This can mean that
wait_iff_congested() stalls unnecessarily.
This patch makes kswapd clear ZONE_CONGESTED during its initial
highmem->dma scan for zones that are already balanced.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4d3e9e763 upstream.
Stable note: Not tracked in Bugzilla. This patch augments an earlier commit
that avoids scanning priority being artificially raised. The older
fix was particularly important for small memcgs to avoid calling
wait_iff_congested() unnecessarily.
Without swap, anonymous pages are not scanned. As such, they should not
count when considering force-scanning a small target if there is no swap.
Otherwise, targets are not force-scanned even when their effective scan
number is zero and the other conditions--kswapd/memcg--apply.
This fixes 246e87a939 ("memcg: fix get_scan_count() for small
targets").
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 938929f14c upstream.
Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 .
Large machines with 1TB or more of RAM take a long time to boot
without this patch and may spew out soft lockup warnings.
When min_free_kbytes is updated, some pageblocks are marked
MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early
in boot but on large machines with 1TB of memory, this has been reported
to delay boot times, probably due to the NUMA distances involved.
The bulk of the work is due to calling calling pageblock_is_reserved() an
unnecessary amount of times and accessing far more struct page metadata
than is necessary. This patch significantly reduces the amount of work
done by setup_zone_migrate_reserve() improving boot times on 1TB machines.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2bbcb87883 upstream.
Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=721039 .
Without the patch, memory hot-add can fail for kernel configurations
that do not set CONFIG_SPARSEMEM_VMEMMAP.
(Resending as I am not seeing it in -next so maybe it got lost)
mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
It is expected that memory being brought online is PageReserved
similar to what happens when the page allocator is being brought up.
Memory is onlined in "memory blocks" which consist of one or more
sections. Unfortunately, the code that verifies PageReserved is
currently assuming that the memmap backing all these pages is virtually
contiguous which is only the case when CONFIG_SPARSEMEM_VMEMMAP is set.
As a result, memory hot-add is failing on those configurations with
the message;
kernel: section number XXX page number 256 not reserved, was it already online?
This patch updates the PageReserved check to lookup struct page once
per section to guarantee the correct struct page is being checked.
[Check pages within sections properly: rientjes@google.com]
[original patch by: nfont@linux.vnet.ibm.com]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit a1cb2c60dd upstream.
Stable note: Not tracked on Bugzilla. This patch is known to make a big
difference to tmpfs performance on larger machines.
This was found to adversely affect tmpfs I/O performance.
Tests run on a 640 cpu UV system.
With 120 threads doing parallel writes, each to different tmpfs mounts:
No patch: ~300 MB/sec
With vm_stat alignment: ~430 MB/sec
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
commit 751f188dd5 upstream.
This patch fixes a crash when a discard request is sent during mirror
recovery.
Firstly, some background. Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
simultaneously recovered regions (by default the semaphore value is 1,
so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
__rh_recovery_prepare asks the log driver for the next region to
recover. Then, it sets the region state to DM_RH_RECOVERING. If there
are no pending I/Os on this region, the region is added to
quiesced_regions list. If there are pending I/Os, the region is not
added to any list. It is added to the quiesced_regions list later (by
dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
flight on this region. The region is popped from the list in
dm_rh_recovery_start function. Then, a kcopyd job is started in the
recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
dm_rh_recovery_end. dm_rh_recovery_end adds the region to
recovered_regions or failed_recovered_regions list (depending on
whether the copy operation was successful or not).
The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.
Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.
There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.
These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb (dm raid1: support discard).
In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING). However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.
This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
Reference: https://bugzilla.redhat.com/837607
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6727932cf upstream.
UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).
The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.
There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:
UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0
The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.
The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1c7e7f6c07 upstream.
Offlining memory may block forever, waiting for kswapd() to wake up
because kswapd() does not check the event kthread->should_stop before
sleeping.
The proper pattern, from Documentation/memory-barriers.txt, is:
--- waker ---
event_indicated = 1;
wake_up_process(event_daemon);
--- sleeper ---
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (event_indicated)
break;
schedule();
}
set_current_state() may be wrapped by:
prepare_to_wait();
In the kswapd() case, event_indicated is kthread->should_stop.
=== offlining memory (waker) ===
kswapd_stop()
kthread_stop()
kthread->should_stop = 1
wake_up_process()
wait_for_completion()
=== kswapd_try_to_sleep (sleeper) ===
kswapd_try_to_sleep()
prepare_to_wait()
.
.
schedule()
.
.
finish_wait()
The schedule() needs to be protected by a test of kthread->should_stop,
which is wrapped by kthread_should_stop().
Reproducer:
Do heavy file I/O in background.
Do a memory offline/online in a tight loop
Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cd60042cc1 upstream.
When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.
Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.
Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org>
Reported-by: Bill Robertson <bill_robertson@debortoli.com.au>
Reported-by: Dion Edwards <dion_edwards@debortoli.com.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 3e997130bd
The leap second rework unearthed another issue of inconsistent data.
On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.
This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.
Add the missing update call, so all the data is consistent everywhere.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 5baefd6d84
The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.
clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.
In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.
So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.
ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().
The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.
This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of f6c06abfb3
To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 4873fa070a
The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.
The reported immediate workaround - $ data -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix
Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of f55a6faa38
clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().
For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.
Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.
[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
rid of all this ifdeffery ASAP ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of dd48d708ff
When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a backport of 6b43ae8a61
This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.
Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
write_seqlock_irq(&xtime_lock);
process_adjtimex_modes()
process_adj_status()
ntp_start_leap_timer()
hrtimer_start()
hrtimer_reprogram()
tick_program_event()
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock); [DEADLOCK]
This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.
NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.
Original commit message below:
Since commit 7dffa3c673 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.
Thomas diagnosed this as the following pattern:
CPU 0 CPU 1
do_adjtimex()
spin_lock_irq(&ntp_lock);
process_adjtimex_modes(); timer_interrupt()
process_adj_status(); do_timer()
ntp_start_leap_timer(); write_lock(&xtime_lock);
hrtimer_start(); update_wall_time();
hrtimer_reprogram(); ntp_tick_length()
tick_program_event() spin_lock(&ntp_lock);
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock);
This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.
The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ) after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).
This patch applies on top of tip/timers/core.
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Diagnoised-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f8cdddb8d6 upstream.
Don't validate interface combinations on a stopped
interface. Otherwise we might end up being able to
create a new interface with a certain type, but
won't be able to change an existing interface
into that type.
This also skips some other functions when
interface is stopped and changing interface type.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[Fixes regression introduced by cherry pick of 463454b5db]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit e76b8ee25e upstream.
I couldn't find the vendor ID in any of the online databases, but this
mat has a Pump It Up logo on the top side of the controller compartment,
and a disclaimer stating that Andamiro will not be liable on the bottom.
Signed-off-by: Yuri Khan <yurivkhan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0efa8f23a upstream.
SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit efd821182c upstream.
On rt2x00_dmastart() we increase index specified by Q_INDEX and on
rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries
between Q_INDEX_DONE and Q_INDEX are those we currently process in the
hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can
submit to the hardware.
According to that fix rt2x00usb_kick_queue(), as we need to submit RX
entries that are not processed by the hardware. It worked before only
for empty queue, otherwise was broken.
Note that for TX queues indexes ordering are ok. We need to kick entries
that have filled skb, but was not submitted to the hardware, i.e.
started from Q_INDEX_DONE and have ENTRY_DATA_PENDING bit set.
From practical standpoint this fixes RX queue stall, usually reproducible
in AP mode, like for example reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=828824
Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy>
Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com>
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 05d290d66b upstream.
If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns. In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted. Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.
The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come. (On my systems, this happens
pretty reliably within about 5 to 500 iterations. Others report that
it manages to loop ~forever sometimes; YMMV.)
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)
void handler(int signum) {}
int main()
{
struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
CHECK(sigaction(SIGCHLD, &act, NULL));
CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
for (;;) {
int fd;
pid_t pid;
putc('.', stderr);
CHECK(pid = fork());
if (pid == 0) {
CHECK(fd = open("fifo", O_RDONLY));
_exit(0);
}
CHECK(fd = open("fifo", O_WRONLY));
CHECK(close(fd));
CHECK(waitpid(pid, NULL, 0));
}
}
This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:
http://bugs.debian.org/678852
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88ca518b0b upstream.
intel_ips driver spews the warning message
"ME failed to update for more than 1s, likely hung"
at each second endlessly on HP ProBook laptops with IronLake.
As this has never worked, better to blacklist the driver for now.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 596fd46268 upstream.
We don't need to open code the divide function, just use div_u64 that
already exists and do the same job. While this is a straightforward
clean up, there is more to that, the real motivation for this.
While building on a cross compiling environment in armel, using gcc
4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), I was getting the following build
error:
ERROR: "__aeabi_uldivmod" [drivers/mtd/nand/nandsim.ko] undefined!
After investigating with objdump and hand built assembly version
generated with the compiler, I narrowed __aeabi_uldivmod as being
generated from the divide function. When nandsim.c is built with
-fno-inline-functions-called-once, that happens when
CONFIG_DEBUG_SECTION_MISMATCH is enabled, the do_div optimization in
arch/arm/include/asm/div64.h doesn't work as expected with the open
coded divide function: even if the do_div we are using doesn't have a
constant divisor, the compiler still includes the else parts of the
optimized do_div macro, and translates the divisions there to use
__aeabi_uldivmod, instead of only calling __do_div_asm -> __do_div64 and
optimizing/removing everything else out.
So to reproduce, gcc 4.6 plus CONFIG_DEBUG_SECTION_MISMATCH=y and
CONFIG_MTD_NAND_NANDSIM=m should do it, building on armel.
After this change, the compiler does the intended thing even with
-fno-inline-functions-called-once, and optimizes out as expected the
constant handling in the optimized do_div on arm. As this also avoids a
build issue, I'm marking for Stable, as I think is applicable for this
case.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 91f68c89d8 upstream.
Commit 080399aaaf ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.
The problem was initially reported by Torsten Hilbrich here:
https://lkml.org/lkml/2012/6/18/54
and also reported independently here:
http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
and then Richard W.M. Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue. This patch has been
confirmed to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=835019
The main problem is here, in __getblk_slow:
for (;;) {
struct buffer_head * bh;
int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
ret = grow_buffers(bdev, block, size);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}
__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page. I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding. However, we also continue to loop for other cases, like the
block lying beond the end of the disk. So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
[ Jens is on vacation, taking this directly - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41002f8dd5 upstream.
We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.
Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f68b4c2e1 upstream.
Current WARN msg is only for the ati_ixp4x0 board, while this function
is used by mulitple platforms. So this one board specific warning
is not appropriate any more.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae10ccdc30 upstream.
Currently when acpi_skip_timer_override is set, it only cover the
(source_irq == 0 && global_irq == 2) cases. While there is also
platform which need use this option and its global_irq is not 2.
This patch will extend acpi_skip_timer_override to cover all
timer overriding cases as long as the source irq is 0.
This is the first part of a fix to kernel bug bugzilla 40002:
"IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002
Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9ab4233dd0 upstream.
Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).
The bug was introduced by commit 90ed52ebe4 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")
[bwh: Backported to 3.2:
- Adjust context
- madvise_remove() calls vmtruncate_range(), not do_fallocate()]
[luto: Backported to 3.0: Adjust context]
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fea9f718b3 upstream.
There is a bug in the below scenario for !CONFIG_MMU:
1. create a new file
2. mmap the file and write to it
3. read the file can't get the correct value
Because
sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()
which causes the page to be zeroed.
Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() do not call simple_readpage().
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bf2bba375 upstream.
If page migration cannot charge the temporary page to the memcg,
migrate_pages() will return -ENOMEM. This isn't considered in memory
compaction however, and the loop continues to iterate over all
pageblocks trying to isolate and migrate pages. If a small number of
very large memcgs happen to be oom, however, these attempts will mostly
be futile leading to an enormous amout of cpu consumption due to the
page migration failures.
This patch will short circuit and fail memory compaction if
migrate_pages() returns -ENOMEM. COMPACT_PARTIAL is returned in case
some migrations were successful so that the page allocator will retry.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc448a18ae upstream.
If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored. We don't store data there, but when recovering the last
device in an array we retry to recover that last chunk from a
non-existent location. This results in an error, and the recovery
aborts.
When we get to that last chunk we should just stop - there is nothing
more to do anyway.
This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.
Reported-by: Christian Balzer <chibi@gol.com>
Tested-by: Christian Balzer <chibi@gol.com>
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c0544e255 upstream.
In chunk_aligned_read() we are adding data_offset before calling
is_badblock. But is_badblock also adds data_offset, so that is bad.
So move the addition of data_offset to after the call to
is_badblock.
This bug was introduced by commit 31c176ecdf
md/raid5: avoid reading from known bad blocks.
which first appeared in 3.0. So that patch is suitable for any
-stable kernel from 3.0.y onwards. However it will need minor
revision for most of those (as the comment didn't appear until
recently).
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
[bwh: Backported to 3.2: ignored missing comment]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ad3341130 upstream.
It makes sense to label "Digital Thermal Sensor" as "DTS", but
unfortunately the string "dts" was already used for "Debug Store", and
/proc/cpuinfo is a user space ABI.
Therefore, rename this to "dtherm".
This conflict went into mainline via the hwmon tree without any x86
maintainer ack, and without any kind of hint in the subject.
a4659053 x86/hwmon: fix initialization of coretemp
Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: drop the coretemp device table change]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32587371ad upstream.
Fix a regression introduced by 7eaceaccab ("block: remove per-queue
plugging"). In that patch, Jens removed the whole mm_unplug_device()
function, which used to be the trigger to make umem start to work.
We need to implement unplugging to make umem start to work, or I/O will
never be triggered.
Signed-off-by: Tao Guo <Tao.Guo@emc.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Shaohua Li <shli@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0fde0a8cfd upstream.
Fix:
BUG: sleeping function called from invalid context at kernel/workqueue.c:2547
in_atomic(): 1, irqs_disabled(): 0, pid: 629, name: wpa_supplicant
2 locks held by wpa_supplicant/629:
#0: (rtnl_mutex){+.+.+.}, at: [<c08b2b84>] rtnl_lock+0x14/0x20
#1: (&trigger->leddev_list_lock){.+.?..}, at: [<c0867f41>] led_trigger_event+0x21/0x80
Pid: 629, comm: wpa_supplicant Not tainted 3.3.0-0.rc3.git5.1.fc17.i686
Call Trace:
[<c046a9f6>] __might_sleep+0x126/0x1d0
[<c0457d6c>] wait_on_work+0x2c/0x1d0
[<c045a09a>] __cancel_work_timer+0x6a/0x120
[<c045a160>] cancel_delayed_work_sync+0x10/0x20
[<f7dd3c22>] rtl8187_led_brightness_set+0x82/0xf0 [rtl8187]
[<c0867f7c>] led_trigger_event+0x5c/0x80
[<f7ff5e6d>] ieee80211_led_radio+0x1d/0x40 [mac80211]
[<f7ff3583>] ieee80211_stop_device+0x13/0x230 [mac80211]
Removing _sync is ok, because if led_on work is currently running
it will be finished before led_off work start to perform, since
they are always queued on the same mac80211 local->workqueue.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=795176
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fab363b5ff upstream.
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but
the two bits have relationship. A delayed stripe can be moved to hold list only
when preread active stripe count is below IO_THRESHOLD. If a stripe has both
the bits set, such stripe will be in delayed list and preread count not 0,
which will make such stripe never leave delayed list.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d550dda192 upstream.
This is a tiny, but important, patch to vhost.
Vhost's worker thread only called schedule() when it had no work to do, and
it wanted to go to sleep. But if there's always work to do, e.g., the guest
is running a network-intensive program like netperf with small message sizes,
schedule() was *never* called. This had several negative implications (on
non-preemptive kernels):
1. Passing time was not properly accounted to the "vhost" process (ps and
top would wrongly show it using zero CPU time).
2. Sometimes error messages about RCU timeouts would be printed, if the
core running the vhost thread didn't schedule() for a very long time.
3. Worst of all, a vhost thread would "hog" the core. If several vhost
threads need to share the same core, typically one would get most of the
CPU time (and its associated guest most of the performance), while the
others hardly get any work done.
The trivial solution is to add
if (need_resched())
schedule();
After doing every piece of work. This will not do the heavy schedule() all
the time, just when the timer interrupt decided a reschedule is warranted
(so need_resched returns true).
Thanks to Abel Gordon for this patch.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 51c9e6c773 upstream, but modified
to get this to apply on 3.0.
If the user chooses to say "no" to CONFIG_USB_XHCI_HCD on a system
with an Intel Panther Point chipset, the PCI quirks code or the EHCI
driver will switch the ports over to the xHCI host, but the xHCI driver
will never load. The ports will be powered off and seem "dead" to the
user.
Fix this by only switching the ports over if CONFIG_USB_XHCI_HCD is
either compiled in, or compiled as a module.
This patch should be backported to the 3.0 stable kernel, since it
contains the commit 69e848c209 "Intel
xhci: Support EHCI/xHCI port switching."
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reported-by: Eric Anholt <eric.anholt@intel.com>
Reported-by: David Bein <d.bein@f5.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbf0e4c725 upstream.
Quite a few ASUS computers experience a nasty problem, related to the
EHCI controllers, when going into system suspend. It was observed
that the problem didn't occur if the controllers were not put into the
D3 power state before starting the suspend, and commit
151b612847 (USB: EHCI: fix crash during
suspend on ASUS computers) was created to do this.
It turned out this approach messed up other computers that didn't have
the problem -- it prevented USB wakeup from working. Consequently
commit c2fb8a3fa2 (USB: add
NO_D3_DURING_SLEEP flag and revert 151b612847) was merged; it
reverted the earlier commit and added a whitelist of known good board
names.
Now we know the actual cause of the problem. Thanks to AceLan Kao for
tracking it down.
According to him, an engineer at ASUS explained that some of their
BIOSes contain a bug that was added in an attempt to work around a
problem in early versions of Windows. When the computer goes into S3
suspend, the BIOS tries to verify that the EHCI controllers were first
quiesced by the OS. Nothing's wrong with this, but the BIOS does it
by checking that the PCI COMMAND registers contain 0 without checking
the controllers' power state. If the register isn't 0, the BIOS
assumes the controller needs to be quiesced and tries to do so. This
involves making various MMIO accesses to the controller, which don't
work very well if the controller is already in D3. The end result is
a system hang or memory corruption.
Since the value in the PCI COMMAND register doesn't matter once the
controller has been suspended, and since the value will be restored
anyway when the controller is resumed, we can work around the BIOS bug
simply by setting the register to 0 during system suspend. This patch
(as1590) does so and also reverts the second commit mentioned above,
which is now unnecessary.
In theory we could do this for every PCI device. However to avoid
introducing new problems, the patch restricts itself to EHCI host
controllers.
Finally the affected systems can suspend with USB wakeup working
properly.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728
Based-on-patch-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Javier Marcet <jmarcet@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Oleksij Rempel <bug-track@fisher-privat.net>
Tested-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9fe79d7600 upstream.
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.
The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8dc6780587 upstream.
File operations on /dev/ecryptfs would BUG() when the operations were
performed by processes other than the process that originally opened the
file. This could happen with open files inherited after fork() or file
descriptors passed through IPC mechanisms. Rather than calling BUG(), an
error code can be safely returned in most situations.
In ecryptfs_miscdev_release(), eCryptfs still needs to handle the
release even if the last file reference is being held by a process that
didn't originally open the file. ecryptfs_find_daemon_by_euid() will not
be successful, so a pointer to the daemon is stored in the file's
private_data. The private_data pointer is initialized when the miscdev
file is opened and only used when the file is released.
https://launchpad.net/bugs/994247
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 863555be0c upstream.
Use rcu_dereference_protected to tell rcu that the ft_lport_lock
is held during ft_lport_create. This resolved "suspicious RCU usage"
warnings when debugging options are turned on.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 48f8b64129 upstream.
The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.
Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 332a2e1244 upstream.
We already use them for openat() and friends, but fchdir() also wants to
be able to use O_PATH file descriptors. This should make it comparable
to the O_SEARCH of Solaris. In particular, O_PATH allows you to access
(not-quite-open) a directory you don't have read persmission to, only
execute permission.
Noticed during development of multithread support for ksh93.
Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 925839243d upstream.
Currently we check the sequence number of last packet received
against start_win. If a sequence hole is detected, start_win is
updated to next sequence number.
Since the rx sequence number is initialized to 0, a corner case
exists when BA setup happens immediately after association. As
0 is a valid sequence number, start_win gets increased to 1
incorrectly. This causes the first packet with sequence number 0
being dropped.
Initialize rx sequence number as 0xffff and skip adjusting
start_win if the sequence number remains 0xffff. The sequence
number will be updated once the first packet is received.
Signed-off-by: Stone Piao <piaoyun@marvell.com>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Kiran Divekar <dkiran@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4b5ebccc40 upstream.
When receiving an "individually addressed" action frame, the
receiver is required to return it to the sender. mac80211
gets this wrong as it also returns group addressed (mcast)
frames to the sender. Fix this and update the reference to
the new 802.11 standards version since things were shuffled
around significantly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e734568b67 upstream.
The OProfile perf backend uses a static array to keep track of the
perf events on the system. When compiling with CONFIG_CPUMASK_OFFSTACK=y
&& SMP, nr_cpumask_bits is not a compile-time constant and the build
will fail with:
oprofile_perf.c:28: error: variably modified 'perf_events' at file scope
This patch uses NR_CPUs instead of nr_cpumask_bits for the array
initialisation. If this causes space problems in the future, we can
always move to dynamic allocation for the events array.
Cc: Matt Fleming <matt@console-pimps.org>
Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3fcc8f9682 upstream.
This patch adds 10 device IDs for CP210x based devices from the following manufacturers:
Timewave
Clipsal
Festo
Link Instruments
Signed-off-by: Craig Shelley <craig@microtron.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb3979f64d upstream.
Distribution kernel maintainers routinely backport fixes for users that
were deemed important but not "something critical" as defined by the
rules. To users of these kernels they are very serious and failing to fix
them reduces the value of -stable.
The problem is that the patches fixing these issues are often subtle and
prone to regressions in other ways and need greater care and attention.
To combat this, these "serious" backports should have a higher barrier
to entry.
This patch relaxes the rules to allow a distribution maintainer to merge
to -stable a backported patch or small series that fixes a "serious"
user-visible performance issue. They should include additional information on
the user-visible bug affected and a link to the bugzilla entry if available.
The same rules about the patch being already in mainline still apply.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f6b54f083c upstream.
This is the 2nd part of fix for kernel bugzilla 40002:
"IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002
The root cause is the buggy FW, whose ACPI tables assign the GSI 16
to 2 irqs 0 and 16(VGA), and the VGA is the right owner of GSI 16.
So add a quirk to ignore the irq0 overriding GSI 16 for the
FUJITSU SIEMENS AMILO PRO V2030 platform will solve this issue.
Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5f16012610 upstream.
The acpi_pad driver can get stuck in destroy_power_saving_task()
waiting for kthread_stop() to stop a power_saving thread. The problem
is that the isolated_cpus_lock mutex is owned when
destroy_power_saving_task() calls kthread_stop(), which waits for a
power_saving thread to end, and the power_saving thread tries to
acquire the isolated_cpus_lock when it calls round_robin_cpu(). This
patch fixes the issue by making round_robin_cpu() use its own mutex.
https://bugzilla.kernel.org/show_bug.cgi?id=42981
Signed-off-by: Stuart Hayes <Stuart_Hayes@Dell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6db65cbb94 upstream.
This patch fixes the problem on some HP desktop machines with eDP
which give blank screens after S3 resume.
It turned out that BLC_PWM_CPU_CTL must be written after
BLC_PWM_CPU_CTL2. Otherwise it doesn't take effect on these
SNB machines.
Tested with 3.5-rc3 kernel.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49233
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9bd0c15fcf upstream.
nv_two_heads() was never meant to be used outside of pre-nv50 code. The
code checks for >= NV_10 for 2 CRTCs, then downgrades a few specific
chipsets to 1 CRTC based on (pci_device & 0x0ff0).
The breakage example seen is on GTX 560Ti, with a pciid of 0x1200, which
gets detected as an NV20 (0x020x) with 1 CRTC by nv_two_heads(), causing
memory corruption because there's actually 2 CRTCs..
This switches fbcon to use the CRTC count directly from the mode_config
structure, which will also fix the same issue on Kepler boards which have
4 CRTCs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b196a4980f upstream.
We need to initialize this to false, because the is_rb callback only
ever sets it to true.
Noticed while reading through the code.
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b6305567e7 upstream.
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.
This commit forces the delayed updates to run so the
replay code can find any modifications done. It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c9fe573a65 upstream.
In sound/soc/codecs/tlv320aic3x.c
data = snd_soc_read(codec, AIC3X_PLL_PROGA_REG);
snd_soc_write(codec, AIC3X_PLL_PROGA_REG,
data | (pll_p << PLLP_SHIFT));
In the above code, pll-p value is OR'ed with previous value without
clearing it. Bug is not seen if pll-p value doesn't change across
Sampling frequency.
However on some platforms (like AM335x EVM-SK), pll-p may have different
values across different sampling frequencies. In such case, above code
configures the pll with a wrong value.
Because of this bug, when a audio stream is played with pll value
different from previous stream, audio is heard as differently(like its
stretched).
Signed-off-by: Hebbar, Gururaja <gururaja.hebbar@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bcb7ad7bcb upstream.
steps to recreate:
load latest ath9k driver with AR9485
stop the network-manager and wpa_supplicant
bring the interface up
Call Trace:
[<ffffffffa0517490>] ? ath_hw_check+0xe0/0xe0 [ath9k]
[<ffffffff812cd1e8>] __const_udelay+0x28/0x30
[<ffffffffa03bae7a>] ar9003_get_pll_sqsum_dvc+0x4a/0x80 [ath9k_hw]
[<ffffffffa05174eb>] ath_hw_pll_work+0x5b/0xe0 [ath9k]
[<ffffffff810744fe>] process_one_work+0x11e/0x470
[<ffffffff8107530f>] worker_thread+0x15f/0x360
[<ffffffff810751b0>] ? manage_workers+0x230/0x230
[<ffffffff81079af3>] kthread+0x93/0xa0
[<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
[<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815fd3a0>] ? gs_change+0x13/0x13
ensure that the PLL-WAR for AR9485/AR9340 is executed only if the STA is
associated (or) IBSS/AP mode had started beaconing. Ideally this WAR
is needed to recover from some rare beacon stuck during stress testing.
Before the STA is associated/IBSS had started beaconing, PLL4(0x1618c)
always seem to have zero even though we had configured PLL3(0x16188) to
query about PLL's locking status. When we keep on polling infinitely PLL4's
8th bit(ie check for PLL locking measurements is done), machine hangs
due to softlockup.
fixes https://bugzilla.redhat.com/show_bug.cgi?id=811142
Reported-by: Rolf Offermanns <rolf.offermanns@gmx.net>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1df2ae31c7 upstream.
Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit adee11b208 upstream.
Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbb24a3a91 upstream.
A gc-inode is a pseudo inode used to buffer the blocks to be moved by
garbage collection.
Block caches of gc-inodes must be cleared every time a garbage collection
function (nilfs_clean_segments) completes. Otherwise, stale blocks
buffered in the caches may be wrongly reused in successive calls of the GC
function.
For user files, this is not a problem because their gc-inodes are
distinguished by a checkpoint number as well as an inode number. They
never buffer different blocks if either an inode number, a checkpoint
number, or a block offset differs.
However, gc-inodes of sufile, cpfile and DAT file can store different data
for the same block offset. Thus, the nilfs_clean_segments function can
move incorrect block for these meta-data files if an old block is cached.
I found this is really causing meta-data corruption in nilfs.
This fixes the issue by ensuring cache clear of gc-inodes and resolves
reported GC problems including checkpoint file corruption, b-tree
corruption, and the following warning during GC.
nilfs_palloc_freev: entry number 307234 already freed.
...
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac852edb47 upstream.
Key lookups may call read_smc() with a fixed-length key string,
and if the lookup fails, trailing stack content may appear in the
kernel log. Fixed with this patch.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 954fba0274 ]
Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :
"skb->len += len;" instead of "skb_put(skb, len);"
Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
only packet headers would be copied, leaving garbage in the payload.
However the skb_realloc_headroom() must be avoided as much as possible
since it requires memory and netpoll tries hard to work even if memory
is exhausted (using a pool of preallocated skbs)
It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
which happens to work for typicall drivers but not all.
Right thing is to use LL_RESERVED_SPACE(dev)
(And also add dev->needed_tailroom of tailroom)
This patch combines both fixes.
Many thanks to Bogdan for raising this issue.
Reported-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ff0feac88 ]
The newer flavors of Yukon II use a different method for receive
checksum offload. This is indicated in the driver by the SKY2_HW_NEW_LE
flag. On these newer chips, the BMU_ENA_RX_CHKSUM should not be set.
The driver would get incorrectly toggle the bit, enabling the old
checksum logic on these chips and cause a BUG_ON() assertion. If
receive checksum was toggled via ethtool.
Reported-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d189634eca ]
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.
This opens up a short time frame to access fib_table_hash with its pants
down.
Move the registration of the proc files to a later point in the init
order to avoid the race.
Tested :-)
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ee31c6898 ]
In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping. This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.
When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit. In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.
Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.
If we want to save skb->queue_mapping into skb->cb[], best place is to
add a field in struct qdisc_skb_cb, to make sure it wont conflict with
other layers (eg : Qdiscc, Infiniband...)
This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8
bytes :
netem qdisc for example assumes it can store an u64 in it, without
misalignment penalty.
Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[].
The largest user is CHOKe and it fills it.
Based on a previous patch from Tom Herbert.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Tom Herbert <therbert@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 149ddd83a9 ]
This ensures that bridges created with brctl(8) or ioctl(2) directly
also carry IFLA_LINKINFO when dumped over netlink. This also allows
to create a bridge with ioctl(2) and delete it with RTM_DELLINK.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 16b0dc29c1 ]
Trying to "modprobe dummy numdummies=30000" triggers :
INFO: rcu_sched self-detected stall on CPU { 8} (t=60000 jiffies)
After this splat, RTNL is locked and reboot is needed.
We must call cond_resched() to avoid this, even holding RTNL.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 20e2a86485 ]
When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system
receives a CIPSO tagged packet it is dropped (cipso_v4_validate()
returns non-zero). In most cases this is the correct and desired
behavior, however, in the case where we are simply forwarding the
traffic, e.g. acting as a network bridge, this becomes a problem.
This patch fixes the forwarding problem by providing the basic CIPSO
validation code directly in ip_options_compile() without the need for
the NetLabel or CIPSO code. The new validation code can not perform
any of the CIPSO option label/value verification that
cipso_v4_validate() does, but it can verify the basic CIPSO option
format.
The behavior when NetLabel is enabled is unchanged.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cc9b17ad29 ]
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7deabca0ac upstream.
We can stall RCU processing on SMP platforms if a CPU sits in its idle
loop for a long time. This happens because we don't call irq_enter()
and irq_exit() around generic_smp_call_function_interrupt() and
friends. Add the necessary calls, and remove the one from within
ipi_timer(), so that they're all in a common place.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[add irq_enter()/irq_exit() in do_local_timer]
Signed-off-by: UCHINO Satoshi <satoshi.uchino@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bc1d770291 upstream.
We have a bug report where the kernel hits a warning in the cpumask
code:
WARNING: at include/linux/cpumask.h:107
Which is:
WARN_ON_ONCE(cpu >= nr_cpumask_bits);
The backtrace is:
cpu_cmd
cmds
xmon_core
xmon
die
xmon is iterating through 0 to NR_CPUS. I'm not sure why we are still
open coding this but iterating above nr_cpu_ids is definitely a bug.
This patch iterates through all possible cpus, in case we issue a
system reset and CPUs in an offline state call in.
Perhaps the old code was trying to handle CPUs that were in the
partition but were never started (eg kexec into a kernel with an
nr_cpus= boot option). They are going to die way before we get into
xmon since we haven't set any kernel state up for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3a3dd074f upstream.
TEAC's UD-H01 (and probably other devices) have a gap in the interface
number allocation of their descriptors:
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 220
bNumInterfaces 3
[...]
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
[...]
Interface Association:
bLength 8
bDescriptorType 11
bFirstInterface 2
bInterfaceCount 2
bFunctionClass 1 Audio
bFunctionSubClass 0
bFunctionProtocol 32
iFunction 4
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 2
bAlternateSetting 0
[...]
Once a configuration is selected, usb_set_configuration() walks the
known interfaces of a given configuration and calls find_iad() on
each of them to set the interface association pointer the interface
is included in.
The problem here is that the loop variable is taken for the interface
number in the comparison logic that gathers the association. Which is
fine as long as the descriptors are sane.
In the case above, however, the logic gets out of sync and the
interface association fields of all interfaces beyond the interface
number gap are wrong.
Fix this by passing the interface's bInterfaceNumber to find_iad()
instead.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: bEN <ml_all@circa.be>
Reported-by: Ivan Perrone <ivanperrone@hotmail.com>
Tested-by: ivan perrone <ivanperrone@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 954c3f8a5f upstream.
We need to make sure that the USB serial driver we find
matches the USB driver whose probe we are currently
executing. Otherwise we will end up with USB serial
devices bound to the correct serial driver but wrong
USB driver.
An example of such cross-probing, where the usbserial_generic
USB driver has found the sierra serial driver:
May 29 18:26:15 nemi kernel: [ 4442.559246] usbserial_generic 4-4:1.0: Sierra USB modem converter detected
May 29 18:26:20 nemi kernel: [ 4447.556747] usbserial_generic 4-4:1.2: Sierra USB modem converter detected
May 29 18:26:25 nemi kernel: [ 4452.557288] usbserial_generic 4-4:1.3: Sierra USB modem converter detected
sysfs view of the same problem:
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/sierra/
total 0
--w------- 1 root root 4096 May 29 18:23 bind
lrwxrwxrwx 1 root root 0 May 29 18:23 module -> ../../../../module/sierra
-rw-r--r-- 1 root root 4096 May 29 18:23 new_id
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/ttyUSB0
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB1 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2/ttyUSB1
lrwxrwxrwx 1 root root 0 May 29 18:32 ttyUSB2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3/ttyUSB2
--w------- 1 root root 4096 May 29 18:23 uevent
--w------- 1 root root 4096 May 29 18:23 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb/drivers/usbserial_generic/
total 0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.0 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.2 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.2
lrwxrwxrwx 1 root root 0 May 29 18:33 4-4:1.3 -> ../../../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.3
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
bjorn@nemi:~$ ls -l /sys/bus/usb-serial/drivers/generic/
total 0
--w------- 1 root root 4096 May 29 18:33 bind
lrwxrwxrwx 1 root root 0 May 29 18:33 module -> ../../../../module/usbserial
-rw-r--r-- 1 root root 4096 May 29 18:33 new_id
--w------- 1 root root 4096 May 29 18:22 uevent
--w------- 1 root root 4096 May 29 18:33 unbind
So we end up with a mismatch between the USB driver and the
USB serial driver. The reason for the above is simple: The
USB driver probe will succeed if *any* registered serial
driver matches, and will use that serial driver for all
serial driver functions.
This makes ref counting go wrong. We count the USB driver
as used, but not the USB serial driver. This may result
in Oops'es as demonstrated by Johan Hovold <jhovold@gmail.com>:
[11811.646396] drivers/usb/serial/usb-serial.c: get_free_serial 1
[11811.646443] drivers/usb/serial/usb-serial.c: get_free_serial - minor base = 0
[11811.646460] drivers/usb/serial/usb-serial.c: usb_serial_probe - registering ttyUSB0
[11811.646766] usb 6-1: pl2303 converter now attached to ttyUSB0
[11812.264197] USB Serial deregistering driver FTDI USB Serial Device
[11812.264865] usbcore: deregistering interface driver ftdi_sio
[11812.282180] USB Serial deregistering driver pl2303
[11812.283141] pl2303 ttyUSB0: pl2303 converter now disconnected from ttyUSB0
[11812.283272] usbcore: deregistering interface driver pl2303
[11812.301056] USB Serial deregistering driver generic
[11812.301186] usbcore: deregistering interface driver usbserial_generic
[11812.301259] drivers/usb/serial/usb-serial.c: usb_serial_disconnect
[11812.301823] BUG: unable to handle kernel paging request at f8e7438c
[11812.301845] IP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.301871] *pde = 357ef067 *pte = 00000000
[11812.301957] Oops: 0000 [#1] PREEMPT SMP
[11812.301983] Modules linked in: usbserial(-) [last unloaded: pl2303]
[11812.302008]
[11812.302019] Pid: 1323, comm: modprobe Tainted: G W 3.4.0-rc7+ #101 Dell Inc. Vostro 1520/0T816J
[11812.302115] EIP: 0060:[<f8e38445>] EFLAGS: 00010246 CPU: 1
[11812.302130] EIP is at usb_serial_disconnect+0xb5/0x100 [usbserial]
[11812.302141] EAX: f508a180 EBX: f508a180 ECX: 00000000 EDX: f8e74300
[11812.302151] ESI: f5050800 EDI: 00000001 EBP: f5141e78 ESP: f5141e58
[11812.302160] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[11812.302170] CR0: 8005003b CR2: f8e7438c CR3: 34848000 CR4: 000007d0
[11812.302180] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[11812.302189] DR6: ffff0ff0 DR7: 00000400
[11812.302199] Process modprobe (pid: 1323, ti=f5140000 task=f61e2bc0 task.ti=f5140000)
[11812.302209] Stack:
[11812.302216] f8e3be0f f8e3b29c f8e3ae00 00000000 f513641c f5136400 f513641c f507a540
[11812.302325] f5141e98 c133d2c1 00000000 00000000 f509c400 f513641c f507a590 f5136450
[11812.302372] f5141ea8 c12f0344 f513641c f507a590 f5141ebc c12f0c67 00000000 f507a590
[11812.302419] Call Trace:
[11812.302439] [<c133d2c1>] usb_unbind_interface+0x51/0x190
[11812.302456] [<c12f0344>] __device_release_driver+0x64/0xb0
[11812.302469] [<c12f0c67>] driver_detach+0x97/0xa0
[11812.302483] [<c12f001c>] bus_remove_driver+0x6c/0xe0
[11812.302500] [<c145938d>] ? __mutex_unlock_slowpath+0xcd/0x140
[11812.302514] [<c12f0ff9>] driver_unregister+0x49/0x80
[11812.302528] [<c1457df6>] ? printk+0x1d/0x1f
[11812.302540] [<c133c50d>] usb_deregister+0x5d/0xb0
[11812.302557] [<f8e37c55>] ? usb_serial_deregister+0x45/0x50 [usbserial]
[11812.302575] [<f8e37c8d>] usb_serial_deregister_drivers+0x2d/0x40 [usbserial]
[11812.302593] [<f8e3a6e2>] usb_serial_generic_deregister+0x12/0x20 [usbserial]
[11812.302611] [<f8e3acf0>] usb_serial_exit+0x8/0x32 [usbserial]
[11812.302716] [<c1080b48>] sys_delete_module+0x158/0x260
[11812.302730] [<c110594e>] ? mntput+0x1e/0x30
[11812.302746] [<c145c3c3>] ? sysenter_exit+0xf/0x18
[11812.302746] [<c107777c>] ? trace_hardirqs_on_caller+0xec/0x170
[11812.302746] [<c145c390>] sysenter_do_call+0x12/0x36
[11812.302746] Code: 24 02 00 00 e8 dd f3 20 c8 f6 86 74 02 00 00 02 74 b4 8d 86 4c 02 00 00 47 e8 78 55 4b c8 0f b6 43 0e 39 f8 7f a9 8b 53 04 89 d8 <ff> 92 8c 00 00 00 89 d8 e8 0e ff ff ff 8b 45 f0 c7 44 24 04 2f
[11812.302746] EIP: [<f8e38445>] usb_serial_disconnect+0xb5/0x100 [usbserial] SS:ESP 0068:f5141e58
[11812.302746] CR2: 00000000f8e7438c
Fix by only evaluating serial drivers pointing back to the
USB driver we are currently probing. This still allows two
or more drivers to match the same device, running their
serial driver probes to sort out which one to use.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c4707f3f8 upstream.
Currently CDC-ACM devices stay throttled when their TTY is closed while
throttled, stalling further communication attempts after the next open.
Unthrottling during open/activate got lost starting with kernel
3.0.0 and this patch reintroduces it.
Signed-off-by: Otto Meta <otto.patches@sister-shadow.de>
Acked-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c2fb8a3fa2 upstream.
This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers. Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.
After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state. Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.
The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep. Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.
A similar patch has already been applied as commit
151b612847 (USB: EHCI: fix crash during
suspend on ASUS computers). The patch supersedes that one and reverts
it. There are two differences:
The old patch added the flag at the USB level; this patch
adds it at the PCI level.
The old patch applied to all chipsets with the same vendor,
subsystem vendor, and product IDs; this patch makes an
exception for a known-good system (based on DMI information).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Dâniel Fraga <fragabr@gmail.com>
Tested-by: Andrey Rahmatullin <wrar@wrar.name>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c41444ccfa upstream.
Some additional IDs found in the BSD/GPL licensed out-of-tree
GobiSerial driver from Sierra Wireless.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c87663ee upstream.
The __devinitconst section can't be referenced
from usb_serial_device structure. Thus removed it as
it done in other mos* device drivers.
Error itself:
WARNING: drivers/usb/serial/mos7840.o(.data+0x8): Section mismatch in reference
from the variable moschip7840_4port_device to the variable
.devinit.rodata:id_table
The variable moschip7840_4port_device references
the variable __devinitconst id_table
[v2] no attach now
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 622eb783fe upstream.
When system software decides to power down the xHC with the intent of
resuming operation at a later time, it will ask xHC to save the internal
state and restore it when resume to correctly recover from a power event.
Two bits are used to enable this operation: Save State and Restore State.
xHCI spec 4.23.2 says software should "Set the Controller Save/Restore
State flag in the USBCMD register and wait for the Save/Restore State
Status flag in the USBSTS register to transition to '0'". However, it does
not define how long software should wait for the SSS/RSS bit to transition
to 0.
Currently the timeout is set to 1ms. There is bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1002697)
indicates that the timeout is too short for ASMedia ASM1042 host controller
to save/restore the state successfully. Increase the timeout to 10ms helps to
resolve the issue.
This patch should be backported to stable kernels as old as 2.6.37, that
contain the commit 5535b1d5f8 "USB: xHCI:
PCI power management implementation"
Signed-off-by: Andiry Xu <andiry.xu@gmail.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a6dc8c0421 upstream.
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.
Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4273f9878b upstream.
Commit 8b4c6a3ab5 ("USB: option: Use generic USB wwan code")
moved option port-data allocation to usb_wwan_startup but still cast the
port data to the old struct...
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b9c3aab315 upstream.
Fix memory leak introduced by commit 383cedc3bb ("USB: serial:
full autosuspend support for the option driver") which allocates
usb-serial data but never frees it.
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 42ca7da1c2 upstream.
Later firmwares for this device now have proper subclass and
protocol info so we can identify it nicely without needing to use
the blacklist. I'm not removing the old 0xff matching as there
may be devices in the field that still need that.
Signed-off-by: Andrew Bird <ajb@spheresystems.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3b02ae586 upstream.
If the call to svc_process_common() fails, then the request
needs to be freed before we can exit bc_svc_process.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5e62625420 upstream.
Xen PV kernels allow access to the APERF/MPERF registers to read the
effective frequency. Access to the MSRs is however redirected to the
currently scheduled physical CPU, making consecutive read and
compares unreliable. In addition each rdmsr traps into the hypervisor.
So to avoid bogus readouts and expensive traps, disable the kernel
internal feature flag for APERF/MPERF if running under Xen.
This will
a) remove the aperfmperf flag from /proc/cpuinfo
b) not mislead the power scheduler (arch/x86/kernel/cpu/sched.c) to
use the feature to improve scheduling (by default disabled)
c) not mislead the cpufreq driver to use the MSRs
This does not cover userland programs which access the MSRs via the
device file interface, but this will be addressed separately.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 350ab15bb2 upstream.
The statically defined I/O memory regions for the i.MX21 on chip
peripherals and the on board I/O peripherals of the i.MX21ADS board
overlap. This results in a kernel crash during startup. This is fixed
by reducing the memory range for the on board I/O peripherals to the
actually required range.
Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22 11:34:12 -07:00
923 changed files with 10660 additions and 5083 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.