Compare commits

..

25 Commits

Author SHA1 Message Date
dd12c7c4cb Linux 3.4.81 2014-02-20 10:46:04 -08:00
478b97d4d7 nfs: tear down caches in nfs_init_writepagecache when allocation fails
commit 3dd4765fce upstream.

...and ensure that we tear down the nfs_commit_data cache too when
unloading the module.

Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[bwh: Backported to 3.2: drop the nfs_cdata_cachep cleanup; it doesn't exist]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
26fead641f lib/vsprintf.c: kptr_restrict: fix pK-error in SysRq show-all-timers(Q)
commit 3715c5309f upstream.

When using ALT+SysRq+Q all the pointers are replaced with "pK-error" like
this:

	[23153.208033]   .base:               pK-error

with echo h > /proc/sysrq-trigger it works:

	[23107.776363]   .base:       ffff88023e60d540

The intent behind this behavior was to return "pK-error" in cases where
the %pK format specifier was used in interrupt context, because the
CAP_SYSLOG check wouldn't be meaningful.  Clearly this should only apply
when kptr_restrict is actually enabled though.

Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
374d3a4168 virtio-blk: Use block layer provided spinlock
commit 2c95a32909 upstream.

Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

        if (q->queue_lock != &q->__queue_lock)
                q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
ba37c708e4 Input: synaptics - handle out of bounds values from the hardware
commit c0394506e6 upstream.

The touchpad on the Acer Aspire One D250 will report out of range values
in the extreme lower portion of the touchpad. These appear as abrupt
changes in the values reported by the hardware from very low values to
very high values, which can cause unexpected vertical jumps in the
position of the mouse pointer.

What seems to be happening is that the value is wrapping to a two's
compliment negative value of higher resolution than the 13-bit value
reported by the hardware, with the high-order bits being truncated. This
patch adds handling for these values by converting them to the
appropriate negative values.

The only tricky part about this is deciding when to treat a number as
negative. It stands to reason that if out of range values can be
reported on the low end then it could also happen on the high end, so
not all out of range values should be treated as negative. The approach
taken here is to split the difference between the maximum legitimate
value for the axis and the maximum possible value that the hardware can
report, treating values greater than this number as negative and all
other values as positive. This can be tweaked later if hardware is found
that operates outside of these parameters.

BugLink: http://bugs.launchpad.net/bugs/1001251
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
b249f99c02 PM / Hibernate: Hibernate/thaw fixes/improvements
commit 5a21d489fd upstream.

 1. Do not allocate memory for buffers from emergency pools, unless
    absolutely required. Do not warn about and do not retry non-essential
    failed allocations.

 2. Do not check the amount of free pages left on every single page
    write, but wait until one map is completely populated and then check.

 3. Set maximum number of pages for read buffering consistently, instead
    of inadvertently depending on the size of the sector type.

 4. Fix copyright line, which I missed when I submitted the hibernation
    threading patch.

 5. Dispense with bit shifting arithmetic to improve readability.

 6. Really recalculate the number of pages required to be free after all
    allocations have been done.

 7. Fix calculation of pages required for read buffering. Only count in
    pages that do not belong to high memory.

Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
ec90b61137 KVM: Fix buffer overflow in kvm_set_irq()
commit f2ebd422f7 upstream.

kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or on MSI.  However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.

Fix by ensuring that an MSI entry is added to an empty list.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
230a0c3b58 target/file: Re-enable optional fd_buffered_io=1 operation
commit b32f4c7ed8 upstream.

This patch re-adds the ability to optionally run in buffered FILEIO mode
(eg: w/o O_DSYNC) for device backends in order to once again use the
Linux buffered cache as a write-back storage mechanism.

This logic was originally dropped with mainline v3.5-rc commit:

commit a4dff3043c
Author: Nicholas Bellinger <nab@linux-iscsi.org>
Date:   Wed May 30 16:25:41 2012 -0700

    target/file: Use O_DSYNC by default for FILEIO backends

This difference with this patch is that fd_create_virtdevice() now
forces the explicit setting of emulate_write_cache=1 when buffered FILEIO
operation has been enabled.

(v2: Switch to FDBD_HAS_BUFFERED_IO_WCE + add more detailed
     comment as requested by hch)

Reported-by: Ferry <iscsitmp@bananateam.nl>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
45a0374fd5 target/file: Use O_DSYNC by default for FILEIO backends
commit a4dff3043c upstream.

Convert to use O_DSYNC for all cases at FILEIO backend creation time to
avoid the extra syncing of pure timestamp updates with legacy O_SYNC during
default operation as recommended by hch.  Continue to do this independently of
Write Cache Enable (WCE) bit, as WCE=0 is currently the default for all backend
devices and enabled by user on per device basis via attrib/emulate_write_cache.

This patch drops the now unnecessary fd_buffered_io= token usage that was
originally signalling when to explictly disable O_SYNC at backend creation
time for buffered I/O operation.  This can end up being dangerous for a number
of reasons during physical node failure, so go ahead and drop this option
for now when O_DSYNC is used as the default.

Also allow explict FUA WRITEs -> vfs_fsync_range() call to function in
fd_execute_cmd() independently of WCE bit setting.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[bwh: Backported to 3.2:
 - We have fd_do_task() and not fd_execute_cmd()
 - Various fields are in struct se_task rather than struct se_cmd
 - fd_create_virtdevice() flags initialisation hasn't been cleaned up]
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
f7441069a0 IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast()
commit 603e772992 upstream.

qib_user_sdma_queue_pkts() gets called with mmap_sem held for
writing. Except for get_user_pages() deep down in
qib_user_sdma_pin_pages() we don't seem to need mmap_sem at all.  Even
more interestingly the function qib_user_sdma_queue_pkts() (and also
qib_user_sdma_coalesce() called somewhat later) call copy_from_user()
which can hit a page fault and we deadlock on trying to get mmap_sem
when handling that fault.

So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
leave mmap_sem locking for mm.

This deadlock has actually been observed in the wild when the node
is under memory pressure.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Roland Dreier <roland@purestorage.com>
[Backported to 3.4: (Thank to Ben Hutchings)
 - Adjust context
 - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:33 -08:00
f5a4c4b79e sched/nohz: Fix rq->cpu_load calculations some more
commit 5aaa0b7a2e upstream.

Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq->cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
e2d51f27e3 sched/nohz: Fix rq->cpu_load[] calculations
commit 556061b00c upstream.

While investigating why the load-balancer did funny I found that the
rq->cpu_load[] tables were completely screwy.. a bit more digging
revealed that the updates that got through were missing ticks followed
by a catchup of 2 ticks.

The catchup assumes the cpu was idle during that time (since only nohz
can cause missed ticks and the machine is idle etc..) this means that
esp. the higher indices were significantly lower than they ought to
be.

The reason for this is that its not correct to compare against jiffies
on every jiffy on any other cpu than the cpu that updates jiffies.

This patch cludges around it by only doing the catch-up stuff from
nohz_idle_balance() and doing the regular stuff unconditionally from
the tick.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-tp4kj18xdd5aj4vvj0qg55s2@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
1c2bd0db11 ftrace: Have function graph only trace based on global_ops filters
commit 23a8e8441a upstream.

Doing some different tests, I discovered that function graph tracing, when
filtered via the set_ftrace_filter and set_ftrace_notrace files, does
not always keep with them if another function ftrace_ops is registered
to trace functions.

The reason is that function graph just happens to trace all functions
that the function tracer enables. When there was only one user of
function tracing, the function graph tracer did not need to worry about
being called by functions that it did not want to trace. But now that there
are other users, this becomes a problem.

For example, one just needs to do the following:

 # cd /sys/kernel/debug/tracing
 # echo schedule > set_ftrace_filter
 # echo function_graph > current_tracer
 # cat trace
[..]
 0)               |  schedule() {
 ------------------------------------------
 0)    <idle>-0    =>   rcu_pre-7
 ------------------------------------------

 0) ! 2980.314 us |  }
 0)               |  schedule() {
 ------------------------------------------
 0)   rcu_pre-7    =>    <idle>-0
 ------------------------------------------

 0) + 20.701 us   |  }

 # echo 1 > /proc/sys/kernel/stack_tracer_enabled
 # cat trace
[..]
 1) + 20.825 us   |      }
 1) + 21.651 us   |    }
 1) + 30.924 us   |  } /* SyS_ioctl */
 1)               |  do_page_fault() {
 1)               |    __do_page_fault() {
 1)   0.274 us    |      down_read_trylock();
 1)   0.098 us    |      find_vma();
 1)               |      handle_mm_fault() {
 1)               |        _raw_spin_lock() {
 1)   0.102 us    |          preempt_count_add();
 1)   0.097 us    |          do_raw_spin_lock();
 1)   2.173 us    |        }
 1)               |        do_wp_page() {
 1)   0.079 us    |          vm_normal_page();
 1)   0.086 us    |          reuse_swap_page();
 1)   0.076 us    |          page_move_anon_rmap();
 1)               |          unlock_page() {
 1)   0.082 us    |            page_waitqueue();
 1)   0.086 us    |            __wake_up_bit();
 1)   1.801 us    |          }
 1)   0.075 us    |          ptep_set_access_flags();
 1)               |          _raw_spin_unlock() {
 1)   0.098 us    |            do_raw_spin_unlock();
 1)   0.105 us    |            preempt_count_sub();
 1)   1.884 us    |          }
 1)   9.149 us    |        }
 1) + 13.083 us   |      }
 1)   0.146 us    |      up_read();

When the stack tracer was enabled, it enabled all functions to be traced, which
now the function graph tracer also traces. This is a side effect that should
not occur.

To fix this a test is added when the function tracing is changed, as well as when
the graph tracer is enabled, to see if anything other than the ftrace global_ops
function tracer is enabled. If so, then the graph tracer calls a test trampoline
that will look at the function that is being traced and compare it with the
filters defined by the global_ops.

As an optimization, if there's no other function tracers registered, or if
the only registered function tracers also use the global ops, the function
graph infrastructure will call the registered function graph callback directly
and not go through the test trampoline.

Fixes: d2d45c7a03 "tracing: Have stack_tracer use a separate list of functions"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
2955866584 ftrace: Fix synchronization location disabling and freeing ftrace_ops
commit a4c35ed241 upstream.

The synchronization needed after ftrace_ops are unregistered must happen
after the callback is disabled from becing called by functions.

The current location happens after the function is being removed from the
internal lists, but not after the function callbacks were disabled, leaving
the functions susceptible of being called after their callbacks are freed.

This affects perf and any externel users of function tracing (LTTng and
SystemTap).

Fixes: cdbe61bfe7 "ftrace: Allow dynamically allocated function tracers"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
95bcd16ee7 ftrace: Synchronize setting function_trace_op with ftrace_trace_function
commit 405e1d8348 upstream.

[ Partial commit backported to 3.4. The ftrace_sync() code by this is
  required for other fixes that 3.4 needs. ]

ftrace_trace_function is a variable that holds what function will be called
directly by the assembly code (mcount). If just a single function is
registered and it handles recursion itself, then the assembly will call that
function directly without any helper function. It also passes in the
ftrace_op that was registered with the callback. The ftrace_op to send is
stored in the function_trace_op variable.

The ftrace_trace_function and function_trace_op needs to be coordinated such
that the called callback wont be called with the wrong ftrace_op, otherwise
bad things can happen if it expected a different op. Luckily, there's no
callback that doesn't use the helper functions that requires this. But
there soon will be and this needs to be fixed.

Use a set_function_trace_op to store the ftrace_op to set the
function_trace_op to when it is safe to do so (during the update function
within the breakpoint or stop machine calls). Or if dynamic ftrace is not
being used (static tracing) then we have to do a bit more synchronization
when the ftrace_trace_function is set as that takes affect immediately
(as oppose to dynamic ftrace doing it with the modification of the trampoline).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
a7333f3d23 dm sysfs: fix a module unload race
commit 2995fa78e4 upstream.

This reverts commit be35f48610 ("dm: wait until embedded kobject is
released before destroying a device") and provides an improved fix.

The kobject release code that calls the completion must be placed in a
non-module file, otherwise there is a module unload race (if the process
calling dm_kobject_release is preempted and the DM module unloaded after
the completion is triggered, but before dm_kobject_release returns).

To fix this race, this patch moves the completion code to dm-builtin.c
which is always compiled directly into the kernel if BLK_DEV_DM is
selected.

The patch introduces a new dm_kobject_holder structure, its purpose is
to keep the completion and kobject in one place, so that it can be
accessed from non-module code without the need to export the layout of
struct mapped_device to that code.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
3fea8b0a9f mm: setup pageblock_order before it's used by sparsemem
commit ca57df79d4 upstream.

On architectures with CONFIG_HUGETLB_PAGE_SIZE_VARIABLE set, such as
Itanium, pageblock_order is a variable with default value of 0.  It's set
to the right value by set_pageblock_order() in function
free_area_init_core().

But pageblock_order may be used by sparse_init() before free_area_init_core()
is called along path:
sparse_init()
    ->sparse_early_usemaps_alloc_node()
	->usemap_size()
	    ->SECTION_BLOCKFLAGS_BITS
		->((1UL << (PFN_SECTION_SHIFT - pageblock_order)) *
NR_PAGEBLOCK_BITS)

The uninitialized pageblock_size will cause memory wasting because
usemap_size() returns a much bigger value then it's really needed.

For example, on an Itanium platform,
sparse_init() pageblock_order=0 usemap_size=24576
free_area_init_core() before pageblock_order=0, usemap_size=24576
free_area_init_core() after pageblock_order=12, usemap_size=8

That means 24K memory has been wasted for each section, so fix it by calling
set_pageblock_order() from sparse_init().

Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
237597d8f7 mm/page_alloc.c: remove pageblock_default_order()
commit 955c1cd740 upstream.

This has always been broken: one version takes an unsigned int and the
other version takes no arguments.  This bug was hidden because one
version of set_pageblock_order() was a macro which doesn't evaluate its
argument.

Simplify it all and remove pageblock_default_order() altogether.

Reported-by: rajman mekaco <rajman.mekaco@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
dbdd2eb440 drm/i915: kick any firmware framebuffers before claiming the gtt
commit 9f846a16d2 upstream.

Especially vesafb likes to map everything as uc- (yikes), and if that
mapping hangs around still while we try to map the gtt as wc the
kernel will downgrade our request to uc-, resulting in abyssal
performance.

Unfortunately we can't do this as early as readon does (i.e. as the
first thing we do when initializing the hw) because our fb/mmio space
region moves around on a per-gen basis. So I've had to move it below
the gtt initialization, but that seems to work, too. The important
thing is that we do this before we set up the gtt wc mapping.

Now an altogether different question is why people compile their
kernels with vesafb enabled, but I guess making things just work isn't
bad per se ...

v2:
- s/radeondrmfb/inteldrmfb/
- fix up error handling

v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky.

v4: Jani Nikula complained about the pointless bool primary
initialization.

v5: Don't oops if we can't allocate, noticed by Chris Wilson.

v6: Resolve conflicts with agp rework and fixup whitespace.

This is commit e188719a28 in drm-next.

Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub
using vesa on fedora their initrd seems to load vesafb before loading
the real kms driver. So tons more people actually experience a
dead-slow gpu. Hence also the Cc: stable.

Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
4e0bc3f3cd ext4: protect group inode free counting with group lock
commit 6f2e9f0e7d upstream.

Now when we set the group inode free count, we don't have a proper
group lock so that multiple threads may decrease the inode free
count at the same time. And e2fsck will complain something like:

Free inodes count wrong for group #1 (1, counted=0).
Fix? no

Free inodes count wrong for group #2 (3, counted=0).
Fix? no

Directories count wrong for group #2 (780, counted=779).
Fix? no

Free inodes count wrong for group #3 (2272, counted=2273).
Fix? no

So this patch try to protect it with the ext4_lock_group.

btw, it is found by xfstests test case 269 and the volume is
mkfsed with the parameter
"-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr"
and I have run it 100 times and the error in e2fsck doesn't
show up again.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
5e23efd0c1 printk: Fix scheduling-while-atomic problem in console_cpu_notify()
commit 85eae82a08 upstream.

The console_cpu_notify() function runs with interrupts disabled in the
CPU_DYING case.  It therefore cannot block, for example, as will happen
when it calls console_lock().  Therefore, remove the CPU_DYING leg of
the switch statement to avoid this problem.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guillaume Morin <guillaume@morinfr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
36f0c45db5 x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
commit 6583327c4d upstream.

Commit d61931d89b, "x86: Add optimized popcnt variants" introduced
compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
options -fprofile-arcs and -O2, this flag causes gcc to generate
broken constructor code. As a result, a 64 bit x86 kernel compiled
with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create
file" and runs into sproadic BUGs during boot.

The gcc people indicate that these kinds of problems are endemic when
using ad hoc calling conventions.  It is therefore best to treat any
file compiled with ad hoc calling conventions as an isolated
environment and avoid things like profiling or coverage analysis,
since those subsystems assume a "normal" calling conventions.

This patch avoids the bug by excluding lib/hweight.o from coverage
profiling.

Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
d89985cb8c mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
commit 227d53b397 upstream.

To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
During aio buffer migration, we have a possibility to see the following
call stack.

aio_migratepage  [disable interrupt]
  migrate_page_copy
    clear_page_dirty_for_io
      set_page_dirty
        __set_page_dirty_buffers
          __set_page_dirty
            spin_lock_irq

This mean, current aio migration is a deadlockable.  spin_lock_irqsave
is a safer alternative and we should use it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
4d4bed8141 mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
commit a85d9df1ea upstream.

During aio stress test, we observed the following lockdep warning.  This
mean AIO+numa_balancing is currently deadlockable.

The problem is, aio_migratepage disable interrupt, but
__set_page_dirty_nobuffers unintentionally enable it again.

Generally, all helper function should use spin_lock_irqsave() instead of
spin_lock_irq() because they don't know caller at all.

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&(&ctx->completion_lock)->rlock);
     <Interrupt>
       lock(&(&ctx->completion_lock)->rlock);

    *** DEADLOCK ***

      dump_stack+0x19/0x1b
      print_usage_bug+0x1f7/0x208
      mark_lock+0x21d/0x2a0
      mark_held_locks+0xb9/0x140
      trace_hardirqs_on_caller+0x105/0x1d0
      trace_hardirqs_on+0xd/0x10
      _raw_spin_unlock_irq+0x2c/0x50
      __set_page_dirty_nobuffers+0x8c/0xf0
      migrate_page_copy+0x434/0x540
      aio_migratepage+0xb1/0x140
      move_to_new_page+0x7d/0x230
      migrate_pages+0x5e5/0x700
      migrate_misplaced_page+0xbc/0xf0
      do_numa_page+0x102/0x190
      handle_pte_fault+0x241/0x970
      handle_mm_fault+0x265/0x370
      __do_page_fault+0x172/0x5a0
      do_page_fault+0x1a/0x70
      page_fault+0x28/0x30

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
a0f916d429 SELinux: Fix kernel BUG on empty security contexts.
commit 2172fa709a upstream.

Setting an empty security context (length=0) on a file will
lead to incorrectly dereferencing the type and other fields
of the security context structure, yielding a kernel BUG.
As a zero-length security context is never valid, just reject
all such security contexts whether coming from userspace
via setxattr or coming from the filesystem upon a getxattr
request by SELinux.

Setting a security context value (empty or otherwise) unknown to
SELinux in the first place is only possible for a root process
(CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
if the corresponding SELinux mac_admin permission is also granted
to the domain by policy.  In Fedora policies, this is only allowed for
specific domains such as livecd for setting down security contexts
that are not defined in the build host policy.

Reproducer:
su
setenforce 0
touch foo
setfattr -n security.selinux foo

Caveat:
Relabeling or removing foo after doing the above may not be possible
without booting with SELinux disabled.  Any subsequent access to foo
after doing the above will also trigger the BUG.

BUG output from Matthew Thode:
[  473.893141] ------------[ cut here ]------------
[  473.962110] kernel BUG at security/selinux/ss/services.c:654!
[  473.995314] invalid opcode: 0000 [#6] SMP
[  474.027196] Modules linked in:
[  474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G      D   I
3.13.0-grsec #1
[  474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
07/29/10
[  474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
ffff8805f50cd488
[  474.183707] RIP: 0010:[<ffffffff814681c7>]  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  474.219954] RSP: 0018:ffff8805c0ac3c38  EFLAGS: 00010246
[  474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
0000000000000100
[  474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
ffff8805e8aaa000
[  474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
0000000000000006
[  474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
0000000000000006
[  474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
0000000000000000
[  474.453816] FS:  00007f2e75220800(0000) GS:ffff88061fc00000(0000)
knlGS:0000000000000000
[  474.489254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
00000000000207f0
[  474.556058] Stack:
[  474.584325]  ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
ffff8805f1190a40
[  474.618913]  ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
ffff8805e8aac860
[  474.653955]  ffff8805c0ac3cb8 000700068113833a ffff880606c75060
ffff8805c0ac3d94
[  474.690461] Call Trace:
[  474.723779]  [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a
[  474.778049]  [<ffffffff81468824>] security_compute_av+0xf4/0x20b
[  474.811398]  [<ffffffff8196f419>] avc_compute_av+0x2a/0x179
[  474.843813]  [<ffffffff8145727b>] avc_has_perm+0x45/0xf4
[  474.875694]  [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31
[  474.907370]  [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e
[  474.938726]  [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22
[  474.970036]  [<ffffffff811b057d>] vfs_getattr+0x19/0x2d
[  475.000618]  [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91
[  475.030402]  [<ffffffff811b063b>] vfs_lstat+0x19/0x1b
[  475.061097]  [<ffffffff811b077e>] SyS_newlstat+0x15/0x30
[  475.094595]  [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3
[  475.148405]  [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b
[  475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
[  475.255884] RIP  [<ffffffff814681c7>]
context_struct_compute_av+0xce/0x308
[  475.296120]  RSP <ffff8805c0ac3c38>
[  475.328734] ---[ end trace f076482e9d754adc ]---

Reported-by:  Matthew Thode <mthode@mthode.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-20 10:45:32 -08:00
32 changed files with 425 additions and 177 deletions

View File

@ -1,6 +1,6 @@
VERSION = 3
PATCHLEVEL = 4
SUBLEVEL = 80
SUBLEVEL = 81
EXTRAVERSION =
NAME = Saber-toothed Squirrel

View File

@ -21,8 +21,6 @@ struct workqueue_struct *virtblk_wq;
struct virtio_blk
{
spinlock_t lock;
struct virtio_device *vdev;
struct virtqueue *vq;
@ -69,7 +67,7 @@ static void blk_done(struct virtqueue *vq)
unsigned int len;
unsigned long flags;
spin_lock_irqsave(&vblk->lock, flags);
spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) {
int error;
@ -104,7 +102,7 @@ static void blk_done(struct virtqueue *vq)
}
/* In case queue is stopped waiting for more buffers. */
blk_start_queue(vblk->disk->queue);
spin_unlock_irqrestore(&vblk->lock, flags);
spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
}
static bool do_req(struct request_queue *q, struct virtio_blk *vblk,
@ -438,7 +436,6 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
}
INIT_LIST_HEAD(&vblk->reqs);
spin_lock_init(&vblk->lock);
vblk->vdev = vdev;
vblk->sg_elems = sg_elems;
sg_init_table(vblk->sg, vblk->sg_elems);
@ -463,7 +460,7 @@ static int __devinit virtblk_probe(struct virtio_device *vdev)
goto out_mempool;
}
q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
q = vblk->disk->queue = blk_init_queue(do_virtblk_request, NULL);
if (!q) {
err = -ENOMEM;
goto out_put_disk;

View File

@ -1934,6 +1934,27 @@ ips_ping_for_i915_load(void)
}
}
static void i915_kick_out_firmware_fb(struct drm_i915_private *dev_priv)
{
struct apertures_struct *ap;
struct pci_dev *pdev = dev_priv->dev->pdev;
bool primary;
ap = alloc_apertures(1);
if (!ap)
return;
ap->ranges[0].base = dev_priv->dev->agp->base;
ap->ranges[0].size =
dev_priv->mm.gtt->gtt_mappable_entries << PAGE_SHIFT;
primary =
pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
remove_conflicting_framebuffers(ap, "inteldrmfb", primary);
kfree(ap);
}
/**
* i915_driver_load - setup chip and create an initial config
* @dev: DRM device
@ -1971,6 +1992,15 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
goto free_priv;
}
dev_priv->mm.gtt = intel_gtt_get();
if (!dev_priv->mm.gtt) {
DRM_ERROR("Failed to initialize GTT\n");
ret = -ENODEV;
goto put_bridge;
}
i915_kick_out_firmware_fb(dev_priv);
pci_set_master(dev->pdev);
/* overlay on gen2 is broken and can't address above 1G */
@ -1996,13 +2026,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
goto put_bridge;
}
dev_priv->mm.gtt = intel_gtt_get();
if (!dev_priv->mm.gtt) {
DRM_ERROR("Failed to initialize GTT\n");
ret = -ENODEV;
goto out_rmmap;
}
agp_size = dev_priv->mm.gtt->gtt_mappable_entries << PAGE_SHIFT;
dev_priv->mm.gtt_mapping =

View File

@ -284,8 +284,7 @@ static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
int j;
int ret;
ret = get_user_pages(current, current->mm, addr,
npages, 0, 1, pages, NULL);
ret = get_user_pages_fast(addr, npages, 0, pages);
if (ret != npages) {
int i;
@ -830,10 +829,7 @@ int qib_user_sdma_writev(struct qib_ctxtdata *rcd,
while (dim) {
const int mxp = 8;
down_write(&current->mm->mmap_sem);
ret = qib_user_sdma_queue_pkts(dd, pq, &list, iov, dim, mxp);
up_write(&current->mm->mmap_sem);
if (ret <= 0)
goto done_unlock;
else {

View File

@ -40,11 +40,28 @@
* Note that newer firmware allows querying device for maximum useable
* coordinates.
*/
#define XMIN 0
#define XMAX 6143
#define YMIN 0
#define YMAX 6143
#define XMIN_NOMINAL 1472
#define XMAX_NOMINAL 5472
#define YMIN_NOMINAL 1408
#define YMAX_NOMINAL 4448
/* Size in bits of absolute position values reported by the hardware */
#define ABS_POS_BITS 13
/*
* Any position values from the hardware above the following limits are
* treated as "wrapped around negative" values that have been truncated to
* the 13-bit reporting range of the hardware. These are just reasonable
* guesses and can be adjusted if hardware is found that operates outside
* of these parameters.
*/
#define X_MAX_POSITIVE (((1 << ABS_POS_BITS) + XMAX) / 2)
#define Y_MAX_POSITIVE (((1 << ABS_POS_BITS) + YMAX) / 2)
/*
* Synaptics touchpads report the y coordinate from bottom to top, which is
* opposite from what userspace expects.
@ -555,6 +572,12 @@ static int synaptics_parse_hw_state(const unsigned char buf[],
hw->right = (buf[0] & 0x02) ? 1 : 0;
}
/* Convert wrap-around values to negative */
if (hw->x > X_MAX_POSITIVE)
hw->x -= 1 << ABS_POS_BITS;
if (hw->y > Y_MAX_POSITIVE)
hw->y -= 1 << ABS_POS_BITS;
return 0;
}

View File

@ -185,8 +185,12 @@ config MD_FAULTY
In unsure, say N.
config BLK_DEV_DM_BUILTIN
boolean
config BLK_DEV_DM
tristate "Device mapper support"
select BLK_DEV_DM_BUILTIN
---help---
Device-mapper is a low level volume manager. It works by allowing
people to specify mappings for ranges of logical sectors. Various

View File

@ -28,6 +28,7 @@ obj-$(CONFIG_MD_MULTIPATH) += multipath.o
obj-$(CONFIG_MD_FAULTY) += faulty.o
obj-$(CONFIG_BLK_DEV_MD) += md-mod.o
obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o
obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
obj-$(CONFIG_DM_DELAY) += dm-delay.o

50
drivers/md/dm-builtin.c Normal file
View File

@ -0,0 +1,50 @@
#include "dm.h"
#include <linux/export.h>
/*
* The kobject release method must not be placed in the module itself,
* otherwise we are subject to module unload races.
*
* The release method is called when the last reference to the kobject is
* dropped. It may be called by any other kernel code that drops the last
* reference.
*
* The release method suffers from module unload race. We may prevent the
* module from being unloaded at the start of the release method (using
* increased module reference count or synchronizing against the release
* method), however there is no way to prevent the module from being
* unloaded at the end of the release method.
*
* If this code were placed in the dm module, the following race may
* happen:
* 1. Some other process takes a reference to dm kobject
* 2. The user issues ioctl function to unload the dm device
* 3. dm_sysfs_exit calls kobject_put, however the object is not released
* because of the other reference taken at step 1
* 4. dm_sysfs_exit waits on the completion
* 5. The other process that took the reference in step 1 drops it,
* dm_kobject_release is called from this process
* 6. dm_kobject_release calls complete()
* 7. a reschedule happens before dm_kobject_release returns
* 8. dm_sysfs_exit continues, the dm device is unloaded, module reference
* count is decremented
* 9. The user unloads the dm module
* 10. The other process that was rescheduled in step 7 continues to run,
* it is now executing code in unloaded module, so it crashes
*
* Note that if the process that takes the foreign reference to dm kobject
* has a low priority and the system is sufficiently loaded with
* higher-priority processes that prevent the low-priority process from
* being scheduled long enough, this bug may really happen.
*
* In order to fix this module unload race, we place the release method
* into a helper code that is compiled directly into the kernel.
*/
void dm_kobject_release(struct kobject *kobj)
{
complete(dm_get_completion_from_kobject(kobj));
}
EXPORT_SYMBOL(dm_kobject_release);

View File

@ -79,11 +79,6 @@ static const struct sysfs_ops dm_sysfs_ops = {
.show = dm_attr_show,
};
static void dm_kobject_release(struct kobject *kobj)
{
complete(dm_get_completion_from_kobject(kobj));
}
/*
* dm kobject is embedded in mapped_device structure
* no need to define release function here

View File

@ -191,11 +191,8 @@ struct mapped_device {
/* forced geometry settings */
struct hd_geometry geometry;
/* sysfs handle */
struct kobject kobj;
/* wait until the kobject is released */
struct completion kobj_completion;
/* kobject and completion */
struct dm_kobject_holder kobj_holder;
/* zero-length flush that will be cloned and submitted to targets */
struct bio flush_bio;
@ -1894,7 +1891,7 @@ static struct mapped_device *alloc_dev(int minor)
init_waitqueue_head(&md->wait);
INIT_WORK(&md->work, dm_wq_work);
init_waitqueue_head(&md->eventq);
init_completion(&md->kobj_completion);
init_completion(&md->kobj_holder.completion);
md->disk->major = _major;
md->disk->first_minor = minor;
@ -2686,20 +2683,14 @@ struct gendisk *dm_disk(struct mapped_device *md)
struct kobject *dm_kobject(struct mapped_device *md)
{
return &md->kobj;
return &md->kobj_holder.kobj;
}
/*
* struct mapped_device should not be exported outside of dm.c
* so use this check to verify that kobj is part of md structure
*/
struct mapped_device *dm_get_from_kobject(struct kobject *kobj)
{
struct mapped_device *md;
md = container_of(kobj, struct mapped_device, kobj);
if (&md->kobj != kobj)
return NULL;
md = container_of(kobj, struct mapped_device, kobj_holder.kobj);
if (test_bit(DMF_FREEING, &md->flags) ||
dm_deleting_md(md))
@ -2709,13 +2700,6 @@ struct mapped_device *dm_get_from_kobject(struct kobject *kobj)
return md;
}
struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
{
struct mapped_device *md = container_of(kobj, struct mapped_device, kobj);
return &md->kobj_completion;
}
int dm_suspended_md(struct mapped_device *md)
{
return test_bit(DMF_SUSPENDED, &md->flags);

View File

@ -16,6 +16,7 @@
#include <linux/blkdev.h>
#include <linux/hdreg.h>
#include <linux/completion.h>
#include <linux/kobject.h>
/*
* Suspend feature flags
@ -120,11 +121,25 @@ void dm_interface_exit(void);
/*
* sysfs interface
*/
struct dm_kobject_holder {
struct kobject kobj;
struct completion completion;
};
static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
{
return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
}
int dm_sysfs_init(struct mapped_device *md);
void dm_sysfs_exit(struct mapped_device *md);
struct kobject *dm_kobject(struct mapped_device *md);
struct mapped_device *dm_get_from_kobject(struct kobject *kobj);
struct completion *dm_get_completion_from_kobject(struct kobject *kobj);
/*
* The kobject helper
*/
void dm_kobject_release(struct kobject *kobj);
/*
* Targets for linear and striped mappings

View File

@ -133,21 +133,24 @@ static struct se_device *fd_create_virtdevice(
ret = PTR_ERR(dev_p);
goto fail;
}
#if 0
if (di->no_create_file)
flags = O_RDWR | O_LARGEFILE;
else
flags = O_RDWR | O_CREAT | O_LARGEFILE;
#else
flags = O_RDWR | O_CREAT | O_LARGEFILE;
#endif
/* flags |= O_DIRECT; */
/*
* If fd_buffered_io=1 has not been set explicitly (the default),
* use O_SYNC to force FILEIO writes to disk.
* Use O_DSYNC by default instead of O_SYNC to forgo syncing
* of pure timestamp updates.
*/
if (!(fd_dev->fbd_flags & FDBD_USE_BUFFERED_IO))
flags |= O_SYNC;
flags = O_RDWR | O_CREAT | O_LARGEFILE | O_DSYNC;
/*
* Optionally allow fd_buffered_io=1 to be enabled for people
* who want use the fs buffer cache as an WriteCache mechanism.
*
* This means that in event of a hard failure, there is a risk
* of silent data-loss if the SCSI client has *not* performed a
* forced unit access (FUA) write, or issued SYNCHRONIZE_CACHE
* to write-out the entire device cache.
*/
if (fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) {
pr_debug("FILEIO: Disabling O_DSYNC, using buffered FILEIO\n");
flags &= ~O_DSYNC;
}
file = filp_open(dev_p, flags, 0600);
if (IS_ERR(file)) {
@ -215,6 +218,12 @@ static struct se_device *fd_create_virtdevice(
if (!dev)
goto fail;
if (fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) {
pr_debug("FILEIO: Forcing setting of emulate_write_cache=1"
" with FDBD_HAS_BUFFERED_IO_WCE\n");
dev->se_sub_dev->se_dev_attrib.emulate_write_cache = 1;
}
fd_dev->fd_dev_id = fd_host->fd_host_dev_id_count++;
fd_dev->fd_queue_depth = dev->queue_depth;
@ -399,26 +408,6 @@ static void fd_emulate_sync_cache(struct se_task *task)
transport_complete_sync_cache(cmd, ret == 0);
}
/*
* WRITE Force Unit Access (FUA) emulation on a per struct se_task
* LBA range basis..
*/
static void fd_emulate_write_fua(struct se_cmd *cmd, struct se_task *task)
{
struct se_device *dev = cmd->se_dev;
struct fd_dev *fd_dev = dev->dev_ptr;
loff_t start = task->task_lba * dev->se_sub_dev->se_dev_attrib.block_size;
loff_t end = start + task->task_size;
int ret;
pr_debug("FILEIO: FUA WRITE LBA: %llu, bytes: %u\n",
task->task_lba, task->task_size);
ret = vfs_fsync_range(fd_dev->fd_file, start, end, 1);
if (ret != 0)
pr_err("FILEIO: vfs_fsync_range() failed: %d\n", ret);
}
static int fd_do_task(struct se_task *task)
{
struct se_cmd *cmd = task->task_se_cmd;
@ -433,19 +422,21 @@ static int fd_do_task(struct se_task *task)
ret = fd_do_readv(task);
} else {
ret = fd_do_writev(task);
/*
* Perform implict vfs_fsync_range() for fd_do_writev() ops
* for SCSI WRITEs with Forced Unit Access (FUA) set.
* Allow this to happen independent of WCE=0 setting.
*/
if (ret > 0 &&
dev->se_sub_dev->se_dev_attrib.emulate_write_cache > 0 &&
dev->se_sub_dev->se_dev_attrib.emulate_fua_write > 0 &&
(cmd->se_cmd_flags & SCF_FUA)) {
/*
* We might need to be a bit smarter here
* and return some sense data to let the initiator
* know the FUA WRITE cache sync failed..?
*/
fd_emulate_write_fua(cmd, task);
}
struct fd_dev *fd_dev = dev->dev_ptr;
loff_t start = task->task_lba *
dev->se_sub_dev->se_dev_attrib.block_size;
loff_t end = start + task->task_size;
vfs_fsync_range(fd_dev->fd_file, start, end, 1);
}
}
if (ret < 0) {
@ -544,7 +535,7 @@ static ssize_t fd_set_configfs_dev_params(
pr_debug("FILEIO: Using buffered I/O"
" operations for struct fd_dev\n");
fd_dev->fbd_flags |= FDBD_USE_BUFFERED_IO;
fd_dev->fbd_flags |= FDBD_HAS_BUFFERED_IO_WCE;
break;
default:
break;
@ -579,8 +570,8 @@ static ssize_t fd_show_configfs_dev_params(
bl = sprintf(b + bl, "TCM FILEIO ID: %u", fd_dev->fd_dev_id);
bl += sprintf(b + bl, " File: %s Size: %llu Mode: %s\n",
fd_dev->fd_dev_name, fd_dev->fd_dev_size,
(fd_dev->fbd_flags & FDBD_USE_BUFFERED_IO) ?
"Buffered" : "Synchronous");
(fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) ?
"Buffered-WCE" : "O_DSYNC");
return bl;
}

View File

@ -18,7 +18,7 @@ struct fd_request {
#define FBDF_HAS_PATH 0x01
#define FBDF_HAS_SIZE 0x02
#define FDBD_USE_BUFFERED_IO 0x04
#define FDBD_HAS_BUFFERED_IO_WCE 0x04
struct fd_dev {
u32 fbd_flags;

View File

@ -613,14 +613,16 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
static void __set_page_dirty(struct page *page,
struct address_space *mapping, int warn)
{
spin_lock_irq(&mapping->tree_lock);
unsigned long flags;
spin_lock_irqsave(&mapping->tree_lock, flags);
if (page->mapping) { /* Race with truncate? */
WARN_ON_ONCE(warn && !PageUptodate(page));
account_page_dirtied(page, mapping);
radix_tree_tag_set(&mapping->page_tree,
page_index(page), PAGECACHE_TAG_DIRTY);
}
spin_unlock_irq(&mapping->tree_lock);
spin_unlock_irqrestore(&mapping->tree_lock, flags);
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}

View File

@ -778,6 +778,8 @@ got:
ext4_itable_unused_set(sb, gdp,
(EXT4_INODES_PER_GROUP(sb) - ino));
up_read(&grp->alloc_sem);
} else {
ext4_lock_group(sb, group);
}
ext4_free_inodes_set(sb, gdp, ext4_free_inodes_count(sb, gdp) - 1);
if (S_ISDIR(mode)) {
@ -790,8 +792,8 @@ got:
}
if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
ext4_unlock_group(sb, group);
}
ext4_unlock_group(sb, group);
BUFFER_TRACE(group_desc_bh, "call ext4_handle_dirty_metadata");
err = ext4_handle_dirty_metadata(handle, NULL, group_desc_bh);

View File

@ -1751,12 +1751,12 @@ int __init nfs_init_writepagecache(void)
nfs_wdata_mempool = mempool_create_slab_pool(MIN_POOL_WRITE,
nfs_wdata_cachep);
if (nfs_wdata_mempool == NULL)
return -ENOMEM;
goto out_destroy_write_cache;
nfs_commit_mempool = mempool_create_slab_pool(MIN_POOL_COMMIT,
nfs_wdata_cachep);
if (nfs_commit_mempool == NULL)
return -ENOMEM;
goto out_destroy_write_mempool;
/*
* NFS congestion size, scale with available memory.
@ -1779,6 +1779,12 @@ int __init nfs_init_writepagecache(void)
nfs_congestion_kb = 256*1024;
return 0;
out_destroy_write_mempool:
mempool_destroy(nfs_wdata_mempool);
out_destroy_write_cache:
kmem_cache_destroy(nfs_wdata_cachep);
return -ENOMEM;
}
void nfs_destroy_writepagecache(void)

View File

@ -144,6 +144,7 @@ extern unsigned long this_cpu_load(void);
extern void calc_global_load(unsigned long ticks);
extern void update_cpu_load_nohz(void);
extern unsigned long get_parent_ip(unsigned long addr);

View File

@ -6,7 +6,7 @@
*
* Copyright (C) 1998,2001-2005 Pavel Machek <pavel@ucw.cz>
* Copyright (C) 2006 Rafael J. Wysocki <rjw@sisk.pl>
* Copyright (C) 2010 Bojan Smojver <bojan@rexursive.com>
* Copyright (C) 2010-2012 Bojan Smojver <bojan@rexursive.com>
*
* This file is released under the GPLv2.
*
@ -282,14 +282,17 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
return -ENOSPC;
if (bio_chain) {
src = (void *)__get_free_page(__GFP_WAIT | __GFP_HIGH);
src = (void *)__get_free_page(__GFP_WAIT | __GFP_NOWARN |
__GFP_NORETRY);
if (src) {
copy_page(src, buf);
} else {
ret = hib_wait_on_bio_chain(bio_chain); /* Free pages */
if (ret)
return ret;
src = (void *)__get_free_page(__GFP_WAIT | __GFP_HIGH);
src = (void *)__get_free_page(__GFP_WAIT |
__GFP_NOWARN |
__GFP_NORETRY);
if (src) {
copy_page(src, buf);
} else {
@ -367,12 +370,17 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
clear_page(handle->cur);
handle->cur_swap = offset;
handle->k = 0;
}
if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
error = hib_wait_on_bio_chain(bio_chain);
if (error)
goto out;
handle->reqd_free_pages = reqd_free_pages();
if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
error = hib_wait_on_bio_chain(bio_chain);
if (error)
goto out;
/*
* Recalculate the number of required free pages, to
* make sure we never take more than half.
*/
handle->reqd_free_pages = reqd_free_pages();
}
}
out:
return error;
@ -419,8 +427,9 @@ static int swap_writer_finish(struct swap_map_handle *handle,
/* Maximum number of threads for compression/decompression. */
#define LZO_THREADS 3
/* Maximum number of pages for read buffering. */
#define LZO_READ_PAGES (MAP_PAGE_ENTRIES * 8)
/* Minimum/maximum number of pages for read buffering. */
#define LZO_MIN_RD_PAGES 1024
#define LZO_MAX_RD_PAGES 8192
/**
@ -630,12 +639,6 @@ static int save_image_lzo(struct swap_map_handle *handle,
}
}
/*
* Adjust number of free pages after all allocations have been done.
* We don't want to run out of pages when writing.
*/
handle->reqd_free_pages = reqd_free_pages();
/*
* Start the CRC32 thread.
*/
@ -657,6 +660,12 @@ static int save_image_lzo(struct swap_map_handle *handle,
goto out_clean;
}
/*
* Adjust the number of required free pages after all allocations have
* been done. We don't want to run out of pages when writing.
*/
handle->reqd_free_pages = reqd_free_pages();
printk(KERN_INFO
"PM: Using %u thread(s) for compression.\n"
"PM: Compressing and saving image data (%u pages) ... ",
@ -1067,7 +1076,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
unsigned i, thr, run_threads, nr_threads;
unsigned ring = 0, pg = 0, ring_size = 0,
have = 0, want, need, asked = 0;
unsigned long read_pages;
unsigned long read_pages = 0;
unsigned char **page = NULL;
struct dec_data *data = NULL;
struct crc_data *crc = NULL;
@ -1079,7 +1088,7 @@ static int load_image_lzo(struct swap_map_handle *handle,
nr_threads = num_online_cpus() - 1;
nr_threads = clamp_val(nr_threads, 1, LZO_THREADS);
page = vmalloc(sizeof(*page) * LZO_READ_PAGES);
page = vmalloc(sizeof(*page) * LZO_MAX_RD_PAGES);
if (!page) {
printk(KERN_ERR "PM: Failed to allocate LZO page\n");
ret = -ENOMEM;
@ -1144,15 +1153,22 @@ static int load_image_lzo(struct swap_map_handle *handle,
}
/*
* Adjust number of pages for read buffering, in case we are short.
* Set the number of pages for read buffering.
* This is complete guesswork, because we'll only know the real
* picture once prepare_image() is called, which is much later on
* during the image load phase. We'll assume the worst case and
* say that none of the image pages are from high memory.
*/
read_pages = (nr_free_pages() - snapshot_get_image_size()) >> 1;
read_pages = clamp_val(read_pages, LZO_CMP_PAGES, LZO_READ_PAGES);
if (low_free_pages() > snapshot_get_image_size())
read_pages = (low_free_pages() - snapshot_get_image_size()) / 2;
read_pages = clamp_val(read_pages, LZO_MIN_RD_PAGES, LZO_MAX_RD_PAGES);
for (i = 0; i < read_pages; i++) {
page[i] = (void *)__get_free_page(i < LZO_CMP_PAGES ?
__GFP_WAIT | __GFP_HIGH :
__GFP_WAIT);
__GFP_WAIT | __GFP_NOWARN |
__GFP_NORETRY);
if (!page[i]) {
if (i < LZO_CMP_PAGES) {
ring_size = i;

View File

@ -1172,7 +1172,6 @@ static int __cpuinit console_cpu_notify(struct notifier_block *self,
switch (action) {
case CPU_ONLINE:
case CPU_DEAD:
case CPU_DYING:
case CPU_DOWN_FAILED:
case CPU_UP_CANCELED:
console_lock();

View File

@ -692,8 +692,6 @@ int tg_nop(struct task_group *tg, void *data)
}
#endif
void update_cpu_load(struct rq *this_rq);
static void set_load_weight(struct task_struct *p)
{
int prio = p->static_prio - MAX_RT_PRIO;
@ -2620,22 +2618,13 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
* scheduler tick (TICK_NSEC). With tickless idle this will not be called
* every tick. We fix it up based on jiffies.
*/
void update_cpu_load(struct rq *this_rq)
static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
unsigned long pending_updates)
{
unsigned long this_load = this_rq->load.weight;
unsigned long curr_jiffies = jiffies;
unsigned long pending_updates;
int i, scale;
this_rq->nr_load_updates++;
/* Avoid repeated calls on same jiffy, when moving in and out of idle */
if (curr_jiffies == this_rq->last_load_update_tick)
return;
pending_updates = curr_jiffies - this_rq->last_load_update_tick;
this_rq->last_load_update_tick = curr_jiffies;
/* Update our load: */
this_rq->cpu_load[0] = this_load; /* Fasttrack for idx 0 */
for (i = 1, scale = 2; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
@ -2660,9 +2649,78 @@ void update_cpu_load(struct rq *this_rq)
sched_avg_update(this_rq);
}
#ifdef CONFIG_NO_HZ
/*
* There is no sane way to deal with nohz on smp when using jiffies because the
* cpu doing the jiffies update might drift wrt the cpu doing the jiffy reading
* causing off-by-one errors in observed deltas; {0,2} instead of {1,1}.
*
* Therefore we cannot use the delta approach from the regular tick since that
* would seriously skew the load calculation. However we'll make do for those
* updates happening while idle (nohz_idle_balance) or coming out of idle
* (tick_nohz_idle_exit).
*
* This means we might still be one tick off for nohz periods.
*/
/*
* Called from nohz_idle_balance() to update the load ratings before doing the
* idle balance.
*/
void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
unsigned long load = this_rq->load.weight;
unsigned long pending_updates;
/*
* bail if there's load or we're actually up-to-date.
*/
if (load || curr_jiffies == this_rq->last_load_update_tick)
return;
pending_updates = curr_jiffies - this_rq->last_load_update_tick;
this_rq->last_load_update_tick = curr_jiffies;
__update_cpu_load(this_rq, load, pending_updates);
}
/*
* Called from tick_nohz_idle_exit() -- try and fix up the ticks we missed.
*/
void update_cpu_load_nohz(void)
{
struct rq *this_rq = this_rq();
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
unsigned long pending_updates;
if (curr_jiffies == this_rq->last_load_update_tick)
return;
raw_spin_lock(&this_rq->lock);
pending_updates = curr_jiffies - this_rq->last_load_update_tick;
if (pending_updates) {
this_rq->last_load_update_tick = curr_jiffies;
/*
* We were idle, this means load 0, the current load might be
* !0 due to remote wakeups and the sort.
*/
__update_cpu_load(this_rq, 0, pending_updates);
}
raw_spin_unlock(&this_rq->lock);
}
#endif /* CONFIG_NO_HZ */
/*
* Called from scheduler_tick()
*/
static void update_cpu_load_active(struct rq *this_rq)
{
update_cpu_load(this_rq);
/*
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
__update_cpu_load(this_rq, this_rq->load.weight, 1);
calc_load_account_active(this_rq);
}

View File

@ -5042,7 +5042,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
raw_spin_lock_irq(&this_rq->lock);
update_rq_clock(this_rq);
update_cpu_load(this_rq);
update_idle_cpu_load(this_rq);
raw_spin_unlock_irq(&this_rq->lock);
rebalance_domains(balance_cpu, CPU_IDLE);

View File

@ -873,7 +873,7 @@ extern void resched_cpu(int cpu);
extern struct rt_bandwidth def_rt_bandwidth;
extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
extern void update_cpu_load(struct rq *this_rq);
extern void update_idle_cpu_load(struct rq *this_rq);
#ifdef CONFIG_CGROUP_CPUACCT
#include <linux/cgroup.h>

View File

@ -582,6 +582,7 @@ void tick_nohz_idle_exit(void)
/* Update jiffies first */
select_nohz_load_balancer(0);
tick_do_update_jiffies64(now);
update_cpu_load_nohz();
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
/*

View File

@ -222,6 +222,29 @@ static void update_global_ops(void)
global_ops.func = func;
}
static void ftrace_sync(struct work_struct *work)
{
/*
* This function is just a stub to implement a hard force
* of synchronize_sched(). This requires synchronizing
* tasks even in userspace and idle.
*
* Yes, function tracing is rude.
*/
}
static void ftrace_sync_ipi(void *data)
{
/* Probably not needed, but do it anyway */
smp_rmb();
}
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
static void update_function_graph_func(void);
#else
static inline void update_function_graph_func(void) { }
#endif
static void update_ftrace_function(void)
{
ftrace_func_t func;
@ -240,6 +263,8 @@ static void update_ftrace_function(void)
else
func = ftrace_ops_list_func;
update_function_graph_func();
#ifdef CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST
ftrace_trace_function = func;
#else
@ -359,16 +384,6 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
} else if (ops->flags & FTRACE_OPS_FL_CONTROL) {
ret = remove_ftrace_list_ops(&ftrace_control_list,
&control_ops, ops);
if (!ret) {
/*
* The ftrace_ops is now removed from the list,
* so there'll be no new users. We must ensure
* all current users are done before we free
* the control data.
*/
synchronize_sched();
control_ops_free(ops);
}
} else
ret = remove_ftrace_ops(&ftrace_ops_list, ops);
@ -378,13 +393,6 @@ static int __unregister_ftrace_function(struct ftrace_ops *ops)
if (ftrace_enabled)
update_ftrace_function();
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
*/
if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
synchronize_sched();
return 0;
}
@ -2008,10 +2016,41 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
command |= FTRACE_UPDATE_TRACE_FUNC;
}
if (!command || !ftrace_enabled)
if (!command || !ftrace_enabled) {
/*
* If these are control ops, they still need their
* per_cpu field freed. Since, function tracing is
* not currently active, we can just free them
* without synchronizing all CPUs.
*/
if (ops->flags & FTRACE_OPS_FL_CONTROL)
control_ops_free(ops);
return 0;
}
ftrace_run_update_code(command);
/*
* Dynamic ops may be freed, we must make sure that all
* callers are done before leaving this function.
* The same goes for freeing the per_cpu data of the control
* ops.
*
* Again, normal synchronize_sched() is not good enough.
* We need to do a hard force of sched synchronization.
* This is because we use preempt_disable() to do RCU, but
* the function tracers can be called where RCU is not watching
* (like before user_exit()). We can not rely on the RCU
* infrastructure to do the synchronization, thus we must do it
* ourselves.
*/
if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_CONTROL)) {
schedule_on_each_cpu(ftrace_sync);
if (ops->flags & FTRACE_OPS_FL_CONTROL)
control_ops_free(ops);
}
return 0;
}
@ -4404,6 +4443,7 @@ int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
trace_func_graph_ret_t ftrace_graph_return =
(trace_func_graph_ret_t)ftrace_stub;
trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
@ -4544,6 +4584,30 @@ static struct ftrace_ops fgraph_ops __read_mostly = {
.flags = FTRACE_OPS_FL_GLOBAL,
};
static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
{
if (!ftrace_ops_test(&global_ops, trace->func))
return 0;
return __ftrace_graph_entry(trace);
}
/*
* The function graph tracer should only trace the functions defined
* by set_ftrace_filter and set_ftrace_notrace. If another function
* tracer ops is registered, the graph tracer requires testing the
* function against the global ops, and not just trace any function
* that any ftrace_ops registered.
*/
static void update_function_graph_func(void)
{
if (ftrace_ops_list == &ftrace_list_end ||
(ftrace_ops_list == &global_ops &&
global_ops.next == &ftrace_list_end))
ftrace_graph_entry = __ftrace_graph_entry;
else
ftrace_graph_entry = ftrace_graph_entry_test;
}
int register_ftrace_graph(trace_func_graph_ret_t retfunc,
trace_func_graph_ent_t entryfunc)
{
@ -4568,7 +4632,16 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
}
ftrace_graph_return = retfunc;
ftrace_graph_entry = entryfunc;
/*
* Update the indirect function to the entryfunc, and the
* function that gets called to the entry_test first. Then
* call the update fgraph entry function to determine if
* the entryfunc should be called directly or not.
*/
__ftrace_graph_entry = entryfunc;
ftrace_graph_entry = ftrace_graph_entry_test;
update_function_graph_func();
ret = ftrace_startup(&fgraph_ops, FTRACE_START_FUNC_RET);
@ -4587,6 +4660,7 @@ void unregister_ftrace_graph(void)
ftrace_graph_active--;
ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
ftrace_graph_entry = ftrace_graph_entry_stub;
__ftrace_graph_entry = ftrace_graph_entry_stub;
ftrace_shutdown(&fgraph_ops, FTRACE_STOP_FUNC_RET);
unregister_pm_notifier(&ftrace_suspend_notifier);
unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);

View File

@ -41,6 +41,7 @@ obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
lib-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
GCOV_PROFILE_hweight.o := n
CFLAGS_hweight.o = $(subst $(quote),,$(CONFIG_ARCH_HWEIGHT_CFLAGS))
obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o

View File

@ -926,7 +926,8 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
* %pK cannot be used in IRQ context because its test
* for CAP_SYSLOG would be meaningless.
*/
if (in_irq() || in_serving_softirq() || in_nmi()) {
if (kptr_restrict && (in_irq() || in_serving_softirq() ||
in_nmi())) {
if (spec.field_width == -1)
spec.field_width = 2 * sizeof(void *);
return string(buf, end, "pK-error", spec);

View File

@ -309,3 +309,5 @@ extern u64 hwpoison_filter_flags_mask;
extern u64 hwpoison_filter_flags_value;
extern u64 hwpoison_filter_memcg;
extern u32 hwpoison_filter_enable;
extern void set_pageblock_order(void);

View File

@ -1993,11 +1993,12 @@ int __set_page_dirty_nobuffers(struct page *page)
if (!TestSetPageDirty(page)) {
struct address_space *mapping = page_mapping(page);
struct address_space *mapping2;
unsigned long flags;
if (!mapping)
return 1;
spin_lock_irq(&mapping->tree_lock);
spin_lock_irqsave(&mapping->tree_lock, flags);
mapping2 = page_mapping(page);
if (mapping2) { /* Race with truncate? */
BUG_ON(mapping2 != mapping);
@ -2006,7 +2007,7 @@ int __set_page_dirty_nobuffers(struct page *page)
radix_tree_tag_set(&mapping->page_tree,
page_index(page), PAGECACHE_TAG_DIRTY);
}
spin_unlock_irq(&mapping->tree_lock);
spin_unlock_irqrestore(&mapping->tree_lock, flags);
if (mapping->host) {
/* !PageAnon && !swapper_space */
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);

View File

@ -4254,25 +4254,24 @@ static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone,
#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
/* Return a sensible default order for the pageblock size. */
static inline int pageblock_default_order(void)
{
if (HPAGE_SHIFT > PAGE_SHIFT)
return HUGETLB_PAGE_ORDER;
return MAX_ORDER-1;
}
/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
static inline void __init set_pageblock_order(unsigned int order)
void __init set_pageblock_order(void)
{
unsigned int order;
/* Check that pageblock_nr_pages has not already been setup */
if (pageblock_order)
return;
if (HPAGE_SHIFT > PAGE_SHIFT)
order = HUGETLB_PAGE_ORDER;
else
order = MAX_ORDER - 1;
/*
* Assume the largest contiguous order of interest is a huge page.
* This value may be variable depending on boot parameters on IA64
* This value may be variable depending on boot parameters on IA64 and
* powerpc.
*/
pageblock_order = order;
}
@ -4280,15 +4279,13 @@ static inline void __init set_pageblock_order(unsigned int order)
/*
* When CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is not set, set_pageblock_order()
* and pageblock_default_order() are unused as pageblock_order is set
* at compile-time. See include/linux/pageblock-flags.h for the values of
* pageblock_order based on the kernel config
* is unused as pageblock_order is set at compile-time. See
* include/linux/pageblock-flags.h for the values of pageblock_order based on
* the kernel config
*/
static inline int pageblock_default_order(unsigned int order)
void __init set_pageblock_order(void)
{
return MAX_ORDER-1;
}
#define set_pageblock_order(x) do {} while (0)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
@ -4376,7 +4373,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
if (!size)
continue;
set_pageblock_order(pageblock_default_order());
set_pageblock_order();
setup_usemap(pgdat, zone, zone_start_pfn, size);
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);

View File

@ -486,6 +486,9 @@ void __init sparse_init(void)
struct page **map_map;
#endif
/* Setup pageblock_order for HUGETLB_PAGE_SIZE_VARIABLE */
set_pageblock_order();
/*
* map is using big page (aka 2M in x86 64 bit)
* usemap is less one page (aka 24 bytes)

View File

@ -1229,6 +1229,10 @@ static int security_context_to_sid_core(const char *scontext, u32 scontext_len,
struct context context;
int rc = 0;
/* An empty security context is never valid. */
if (!scontext_len)
return -EINVAL;
if (!ss_initialized) {
int i;

View File

@ -318,6 +318,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
*/
hlist_for_each_entry(ei, n, &rt->map[ue->gsi], link)
if (ei->type == KVM_IRQ_ROUTING_MSI ||
ue->type == KVM_IRQ_ROUTING_MSI ||
ue->u.irqchip.irqchip == ei->irqchip.irqchip)
return r;