Compare commits

...

175 Commits

Author SHA1 Message Date
906d77a3c6 Linux 3.17.2 2014-10-30 09:43:25 -07:00
29e743171f sparc64: Implement __get_user_pages_fast().
[ Upstream commit 06090e8ed8 ]

It is not sufficient to only implement get_user_pages_fast(), you
must also implement the atomic version __get_user_pages_fast()
otherwise you end up using the weak symbol fallback implementation
which simply returns zero.

This is dangerous, because it causes the futex code to loop forever
if transparent hugepages are supported (see get_futex_key()).

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:19 -07:00
3ecc3b82d3 sparc64: Fix register corruption in top-most kernel stack frame during boot.
[ Upstream commit ef3e035c3a ]

Meelis Roos reported that kernels built with gcc-4.9 do not boot, we
eventually narrowed this down to only impacting machines using
UltraSPARC-III and derivitive cpus.

The crash happens right when the first user process is spawned:

[   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[   54.451346]
[   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab #96
[   54.666431] Call Trace:
[   54.698453]  [0000000000762f8c] panic+0xb0/0x224
[   54.759071]  [000000000045cf68] do_exit+0x948/0x960
[   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
[   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
[   54.978662] Press Stop-A (L1-A) to return to the boot prom
[   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Further investigation showed that compiling only per_cpu_patch() with
an older compiler fixes the boot.

Detailed analysis showed that the function is not being miscompiled by
gcc-4.9, but it is using a different register allocation ordering.

With the gcc-4.9 compiled function, something during the code patching
causes some of the %i* input registers to get corrupted.  Perhaps
we have a TLB miss path into the firmware that is deep enough to
cause a register window spill and subsequent restore when we get
back from the TLB miss trap.

Let's plug this up by doing two things:

1) Stop using the firmware stack for client interface calls into
   the firmware.  Just use the kernel's stack.

2) As soon as we can, call into a new function "start_early_boot()"
   to put a one-register-window buffer between the firmware's
   deepest stack frame and the top-most initial kernel one.

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
5cf02ef6d0 sparc64: Increase size of boot string to 1024 bytes
[ Upstream commit 1cef94c36b ]

This is the longest boot string that silo supports.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
5c51a8b0cd sparc64: Kill unnecessary tables and increase MAX_BANKS.
[ Upstream commit d195b71bad ]

swapper_low_pmd_dir and swapper_pud_dir are actually completely
useless and unnecessary.

We just need swapper_pg_dir[].  Naturally the other page table chunks
will be allocated on an as-needed basis.  Since the kernel actually
accesses these tables in the PAGE_OFFSET view, there is not even a TLB
locality advantage of placing them in the kernel image.

Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
naturally page aligned.

Increase MAX_BANKS to 1024 in order to handle heavily fragmented
virtual guests.

Even with this MAX_BANKS increase, the kernel is 20K+ smaller.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
82f230ed1f sparc64: sparse irq
[ Upstream commit ee6a9333fa ]

This patch attempts to do a few things. The highlights are: 1) enable
SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
ivector_table at boot time and 4) default to cookie only VIRQ mechanism
for supported firmware. The first firmware with cookie only support for
me appears on T5. You can optionally force the HV firmware to not cookie
only mode which is the sysino support.

The sysino is a deprecated HV mechanism according to the most recent
SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
cookie/sysino firmware versioning.

The history of this interface is:

1) Major version 1.0 only supported sysino based interrupt interfaces.

2) Major version 2.0 added cookie based VIRQs, however due to the fact
   that OSs were using the VIRQs without negoatiating major version
   2.0 (Linux and Solaris are both guilty), the VIRQs calls were
   allowed even with major version 1.0

   To complicate things even further, the VIRQ interfaces were only
   actually hooked up in the hypervisor for LDC interrupt sources.
   VIRQ calls on other device types would result in HV_EINVAL errors.

   So effectively, major version 2.0 is unusable.

3) Major version 3.0 was created to signal use of VIRQs and the fact
   that the hypervisor has these calls hooked up for all interrupt
   sources, not just those for LDC devices.

A new boot option is provided should cookie only HV support have issues.
hvirq - this is the version for HV_GRP_INTR. This is related to HV API
versioning.  The code attempts major=3 first by default. The option can
be used to override this default.

I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
0c64120b21 sparc64: Adjust vmalloc region size based upon available virtual address bits.
[ Upstream commit bb4e6e85da ]

In order to accomodate embedded per-cpu allocation with large numbers
of cpus and numa nodes, we have to use as much virtual address space
as possible for the vmalloc region.  Otherwise we can get things like:

PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000

So, once we select a value for PAGE_OFFSET, derive the size of the
vmalloc region based upon that.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
004f665611 sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53.
Make sure, at compile time, that the kernel can properly support
whatever MAX_PHYS_ADDRESS_BITS is defined to.

On M7 chips, use a max_phys_bits value of 49.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
1bbd6776e3 sparc64: Use kernel page tables for vmemmap.
[ Upstream commit c06240c7f5 ]

For sparse memory configurations, the vmemmap array behaves terribly
and it takes up an inordinate amount of space in the BSS section of
the kernel image unconditionally.

Just build huge PMDs and look them up just like we do for TLB misses
in the vmalloc area.

Kernel BSS shrinks by about 2MB.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
92566c44de sparc64: Fix physical memory management regressions with large max_phys_bits.
[ Upstream commit 0dd5b7b09e ]

If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image.  Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:18 -07:00
73bf1d0c9e sparc64: Adjust KTSB assembler to support larger physical addresses.
[ Upstream commit 8c82dc0e88 ]

As currently coded the KTSB accesses in the kernel only support up to
47 bits of physical addressing.

Adjust the instruction and patching sequence in order to support
arbitrary 64 bits addresses.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
498db9cdd8 sparc64: Define VA hole at run time, rather than at compile time.
[ Upstream commit 4397bed080 ]

Now that we use 4-level page tables, we can provide up to 53-bits of
virtual address space to the user.

Adjust the VA hole based upon the capabilities of the cpu type probed.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
665fae74eb sparc64: Switch to 4-level page tables.
[ Upstream commit ac55c76814 ]

This has become necessary with chips that support more than 43-bits
of physical addressing.

Based almost entirely upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
d9fac2ec79 sparc64: T5 PMU
The T5 (niagara5) has different PCR related HV fast trap values and a new
HV API Group. This patch utilizes these and shares when possible with niagara4.

We use the same sparc_pmu niagara4_pmu. Should there be new effort to
obtain the MCU perf statistics then this would have to be changed.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
29893e72a9 sparc64: cpu hardware caps support for sparc M6 and M7
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
37f0a64f5a sparc64: support M6 and M7 for building CPU distribution map
Add M6 and M7 chip type in cpumap.c to correctly build CPU distribution map that spans all online CPUs.

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
319dffe735 sparc64: correctly recognise M6 and M7 cpu type
The following patch adds support for correctly
recognising M6 and M7 cpu type.

Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
3d6dc58816 sparc64: Fix hibernation code refrence to PAGE_OFFSET.
We changed PAGE_OFFSET to be a variable rather than a constant,
but this reference here in the hibernate assembler got missed.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
54bca039c0 sparc64: Do not define thread fpregs save area as zero-length array.
[ Upstream commit e2653143d7 ]

This breaks the stack end corruption detection facility.

What that facility does it write a magic value to "end_of_stack()"
and checking to see if it gets overwritten.

"end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
the beginning of the FPU register save area.

So once the user uses the FPU, the magic value is overwritten and the
debug checks trigger.

Fix this by making the size explicit.

Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
are limited to 7 levels of FPU state saves.  So each FPU register set
is 256 bytes, allocate 256 * 7 for the fpregs area.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:17 -07:00
9c47c0adf0 sparc64: Fix FPU register corruption with AES crypto offload.
[ Upstream commit f4da3628dc ]

The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
key material is preloaded into the FPU registers, and then we loop
over and over doing the crypt operation, reusing those pre-cooked key
registers.

There are intervening blkcipher*() calls between the crypt operation
calls.  And those might perform memcpy() and thus also try to use the
FPU.

The sparc64 kernel FPU usage mechanism is designed to allow such
recursive uses, but with a catch.

There has to be a trap between the two FPU using threads of control.

The mechanism works by, when the FPU is already in use by the kernel,
allocating a slot for FPU saving at trap time.  Then if, within the
trap handler, we try to use the FPU registers, the pre-trap FPU
register state is saved into the slot.  Then at trap return time we
notice this and restore the pre-trap FPU state.

Over the long term there are various more involved ways we can make
this work, but for a quick fix let's take advantage of the fact that
the situation where this happens is very limited.

All sparc64 chips that support the crypto instructiosn also are using
the Niagara4 memcpy routine, and that routine only uses the FPU for
large copies where we can't get the source aligned properly to a
multiple of 8 bytes.

We look to see if the FPU is already in use in this context, and if so
we use the non-large copy path which only uses integer registers.

Furthermore, we also limit this special logic to when we are doing
kernel copy, rather than a user copy.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
a0a203b25f sparc64: Fix lockdep warnings on reboot on Ultra-5
[ Upstream commit bdcf81b658 ]

Inconsistently, the raw_* IRQ routines do not interact with and update
the irqflags tracing and lockdep state, whereas the raw_* spinlock
interfaces do.

This causes problems in p1275_cmd_direct() because we disable hardirqs
by hand using raw_local_irq_restore() and then do a raw_spin_lock()
which triggers a lockdep trace because the CPU's hw IRQ state doesn't
match IRQ tracing's internal software copy of that state.

The CPU's irqs are disabled, yet current->hardirqs_enabled is true.

====================
reboot: Restarting system
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
Modules linked in: openpromfs
CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G        W      3.17.0-dirty #145
Call Trace:
 [000000000045919c] warn_slowpath_common+0x5c/0xa0
 [0000000000459210] warn_slowpath_fmt+0x30/0x40
 [000000000048f41c] check_flags+0x7c/0x240
 [0000000000493280] lock_acquire+0x20/0x1c0
 [0000000000832b70] _raw_spin_lock+0x30/0x60
 [000000000068f2fc] p1275_cmd_direct+0x1c/0x60
 [000000000068ed28] prom_reboot+0x28/0x40
 [000000000043610c] machine_restart+0x4c/0x80
 [000000000047d2d4] kernel_restart+0x54/0x80
 [000000000047d618] SyS_reboot+0x138/0x200
 [00000000004060b4] linux_sparc_syscall32+0x34/0x60
---[ end trace 5c439fe81c05a100 ]---
possible reason: unannotated irqs-off.
irq event stamp: 2010267
hardirqs last  enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580
hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580
softirqs last  enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0
softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40
Resetting ...
====================

Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees
all of our changes.

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
7250fa07dc sparc64: Fix reversed start/end in flush_tlb_kernel_range()
[ Upstream commit 473ad7f4fb ]

When we have to split up a flush request into multiple pieces
(in order to avoid the firmware range) we don't specify the
arguments in the right order for the second piece.

Fix the order, or else we get hangs as the code tries to
flush "a lot" of entries and we get lockups like this:

[ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
[ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
[ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
[ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
[ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
[ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
[ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
[ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
[ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
[ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
[ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
[ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
[ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
[ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
[ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
[ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
[ 4423.234193] Call Trace:
[ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
[ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
[ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
[ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
[ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
[ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
[ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
[ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
[ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
[ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
[ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
[ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
[ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c

Fixes: 4ca9a23765 ("sparc64: Guard against flushing openfirmware mappings.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
081dbb9fa9 sparc: Let memset return the address argument
[ Upstream commit 74cad25c07 ]

This makes memset follow the standard (instead of returning 0 on success). This
is needed when certain versions of gcc optimizes around memset calls and assume
that the address argument is preserved in %o0.

Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
40b18eec95 sparc64: Move request_irq() from ldc_bind() to ldc_alloc()
[ Upstream commit c21c4ab0d6 ]

The request_irq() needs to be done from ldc_alloc()
to avoid the following (caught by lockdep)

 [00000000004a0738] __might_sleep+0xf8/0x120
 [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
 [00000000004faf80] request_threaded_irq+0x80/0x160
 [000000000044f71c] ldc_bind+0x7c/0x220
 [0000000000452454] vio_port_up+0x54/0xe0
 [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
 [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
 [0000000000451a88] vio_device_probe+0x48/0x60
 [000000000074c56c] really_probe+0x6c/0x300
 [000000000074c83c] driver_probe_device+0x3c/0xa0
 [000000000074c92c] __driver_attach+0x8c/0xa0
 [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
 [000000000074c1dc] driver_attach+0x1c/0x40
 [000000000074b0fc] bus_add_driver+0xbc/0x280

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
6568283ac0 sparc64: find_node adjustment
[ Upstream commit 3dee9df548 ]

We have seen an issue with guest boot into LDOM that causes early boot failures
because of no matching rules for node identitity of the memory. I analyzed this
on my T4 and concluded there might not be a solution. I saw the issue in
mainline too when booting into the control/primary domain - with guests
configured.  Note, this could be a firmware bug on some older machines.

I'll provide a full explanation of the issues below. Should we not find a
matching BEST latency group for a real address (RA) then we will assume node 0.
On the T4-2 here with the information provided I can't see an alternative.

Technically the LDOM shown below should match the MBLOCK to the
favorable latency group. However other factors must be considered too. Were
the memory controllers configured "fine" grained interleave or "coarse"
grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
node?

There has to be at least one Machine Description (MD) "group" and hence one
NUMA node. The group can have one or more latency groups (lg) - more than one
memory controller. The current code chooses the smallest latency as the most
favorable per group. The latency and lg information is in MLGROUP below.
MBLOCK is the base and size of the RAs for the machine as fetched from OBP
/memory "available" property. My machine has one MBLOCK but more would be
possible - with holes?

For a T4-2 the following information has been gathered:
with LDOM guest
MEMBLOCK configuration:
 memory size = 0x27f870000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
 memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
 memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
MBLOCK[0]: base[20000000] size[280000000] offset[0]
(note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
(note: (RA + offset) & mask = val is the formula to detect a match for the
memory controller. should there be no match for find_node node, a return
value of -1 resulted for the node - BAD)

There is one group. It has these forward links
MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
(note: "val" is the best lg's (smallest latency) "match")

no LDOM guest - bare metal
MEMBLOCK configuration:
 memory size = 0xfdf2d0000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
 memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
 memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
MBLOCK[0]: base[20000000] size[fe0000000] offset[0]

there are two groups
group node[16d5]
MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
group node[171d]
MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
(note: for this two "group" bare metal machine, 1/2 memory is in group one's
lg and 1/2 memory is in group two's lg).

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
1e53e8b4e2 sparc64: Fix corrupted thread fault code.
[ Upstream commit 84bd6d8b9c ]

Every path that ends up at do_sparc64_fault() must install a valid
FAULT_CODE_* bitmask in the per-thread fault code byte.

Two paths leading to the label winfix_trampoline (which expects the
FAULT_CODE_* mask in register %g4) were not doing so:

1) For pre-hypervisor TLB protection violation traps, if we took
   the 'winfix_trampoline' path we wouldn't have %g4 initialized
   with the FAULT_CODE_* value yet.  Resulting in using the
   TLB_TAG_ACCESS register address value instead.

2) In the TSB miss path, when we notice that we are going to use a
   hugepage mapping, but we haven't allocated the hugepage TSB yet, we
   still have to take the window fixup case into consideration and
   in that particular path we leave %g4 not setup properly.

Errors on this sort were largely invisible previously, but after
commit 4ccb927289 ("sparc64: sun4v TLB
error power off events") we now have a fault_code mask bit
(FAULT_CODE_BAD_RA) that triggers due to this bug.

FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
(see #1 above) and thus we get seemingly random bus errors triggered
for user processes.

Fixes: 4ccb927289 ("sparc64: sun4v TLB error power off events")
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
ebc43a92e6 sparc64: sun4v TLB error power off events
[ Upstream commit 4ccb927289 ]

We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
4ba7944566 sparc32: dma_alloc_coherent must honour gfp flags
[ Upstream commit d1105287aa ]

dma_zalloc_coherent() calls dma_alloc_coherent(__GFP_ZERO)
but the sparc32 implementations sbus_alloc_coherent() and
pci32_alloc_coherent() doesn't take the gfp flags into
account.

Tested on the SPARC32/LEON GRETH Ethernet driver which fails
due to dma_alloc_coherent(__GFP_ZERO) returns non zeroed
pages.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
8006fece34 ima: pass 'opened' flag to identify newly created files
commit 3034a14682 upstream.

Empty files and missing xattrs do not guarantee that a file was
just created.  This patch passes FILE_CREATED flag to IMA to
reliably identify new files.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
f072a54f68 ima: provide flag to identify new empty files
commit b151d6b00b upstream.

On ima_file_free(), newly created empty files are not labeled with
an initial security.ima value, because the iversion did not change.
Commit dff6efc "fs: fix iversion handling" introduced a change in
iversion behavior.  To verify this change use the shell command:

  $ (exec >foo)
  $ getfattr -h -e hex -d -m security foo

This patch defines the IMA_NEW_FILE flag.  The flag is initially
set, when IMA detects that a new file is created, and subsequently
checked on the ima_file_free() hook to set the initial security.ima
value.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:16 -07:00
90c62d0e9c ima: fix fallback to use new_sync_read()
commit 27cd1fc3ae upstream.

3.16 commit aad4f8bb42
'switch simple generic_file_aio_read() users to ->read_iter()'
replaced ->aio_read with ->read_iter in most of the file systems
and introduced new_sync_read() as a replacement for do_sync_read().

Most of file systems set '->read' and ima_kernel_read is not affected.
When ->read is not set, this patch adopts fallback call changes from the
vfs_read.

Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
11b477dd03 powerpc/eeh: Clear frozen device state in time
commit 22fca17924 upstream.

The problem was reported by Carol: In the scenario of passing mlx4
adapter to guest, EEH error could be recovered successfully. When
returning the device back to host, the driver (mlx4_core.ko)
couldn't be loaded successfully because of error number -5 (-EIO)
returned from mlx4_get_ownership(), which hits offlined PCI device.
The root cause is that we missed to put the affected devices into
normal state on clearing PE isolated state right after PE reset.

The patch fixes above issue by putting the affected devices to
normal state when clearing PE isolated state in eeh_pe_state_clear().

Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
ef409eff9c powerpc/iommu/ddw: Fix endianness
commit 9410e0185e upstream.

rtas_call() accepts and returns values in CPU endianness.
The ddw_query_response and ddw_create_response structs members are
defined and treated as BE but as they are passed to rtas_call() as
(u32 *) and they get byteswapped automatically, the data is CPU-endian.
This fixes ddw_query_response and ddw_create_response definitions and use.

of_read_number() is designed to work with device tree cells - it assumes
the input is big-endian and returns data in CPU-endian. However due
to the ddw_create_response struct fix, create.addr_hi/lo are already
CPU-endian so do not byteswap them.

ddw_avail is a pointer to the "ibm,ddw-applicable" property which contains
3 cells which are big-endian as it is a device tree. rtas_call() accepts
a RTAS token in CPU-endian. This makes use of of_property_read_u32_array
to byte swap and avoid the need for a number of be32_to_cpu calls.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: folded Anton's patch with of_property_read_u32_array]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
233f7b53f0 powerpc: Only set numa node information for present cpus at boottime
commit bc3c4327c9 upstream.

As Nish suggested, it makes more sense to init the numa node informatiion
for present cpus at boottime, which could also avoid WARN_ON(1) in
numa_setup_cpu().

With this change, we also need to change the smp_prepare_cpus() to set up
numa information only on present cpus.

For those possible, but not present cpus, their numa information
will be set up after they are started, as the original code did before commit
2fabf084b6.

Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
d74662d337 powerpc: Fix warning reported by verify_cpu_node_mapping()
commit 70ad237515 upstream.

With commit 2fabf084b6 ("powerpc: reorder per-cpu NUMA information's
initialization"), during boottime, cpu_numa_callback() is called
earlier(before their online) for each cpu, and verify_cpu_node_mapping()
uses cpu_to_node() to check whether siblings are in the same node.

It skips the checking for siblings that are not online yet. So the only
check done here is for the bootcpu, which is online at that time. But
the per-cpu numa_node cpu_to_node() uses hasn't been set up yet (which
will be set up in smp_prepare_cpus()).

So I saw something like following reported:
[    0.000000] CPU thread siblings 1/2/3 and 0 don't belong to the same
node!

As we don't actually do the checking during this early stage, so maybe
we could directly call numa_setup_cpu() in do_init_bootmem().

Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
839ba7d5d3 futex: Ensure get_futex_key_refs() always implies a barrier
commit 76835b0ebf upstream.

Commit b0c29f79ec (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd0 (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ec (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
9edb0e6bb5 mm/balloon_compaction: redesign ballooned pages management
commit d6d86c0a7f upstream.

Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags.  This function has no protection
against anonymous pages.  As result it tried to check address space flags
inside struct anon_vma.

Further investigation shows more problems in current implementation:

* Special branch in __unmap_and_move() never works:
  balloon_page_movable() checks page flags and page_count.  In
  __unmap_and_move() page is locked, reference counter is elevated, thus
  balloon_page_movable() always fails.  As a result execution goes to the
  normal migration path.  virtballoon_migratepage() returns
  MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
  move_to_new_page() thinks this is an error code and assigns
  newpage->mapping to NULL.  Newly migrated page lose connectivity with
  balloon an all ability for further migration.

* lru_lock erroneously required in isolate_migratepages_range() for
  isolation ballooned page.  This function releases lru_lock periodically,
  this makes migration mostly impossible for some pages.

* balloon_page_dequeue have a tight race with balloon_page_isolate:
  balloon_page_isolate could be executed in parallel with dequeue between
  picking page from list and locking page_lock.  Race is rare because they
  use trylock_page() for locking.

This patch fixes all of them.

Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256.  Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose.  Storing mark
directly in struct page makes everything safer and easier.

PagePrivate is used to mark pages present in page list (i.e.  not
isolated, like PageLRU for normal pages).  It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages.  This flag is protected by page_lock together with link to
the balloon device.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
db96c43598 rtc-cmos: fix wakeup from S5 without CONFIG_PM_SLEEP
commit a882b14fe8 upstream.

Commit b5ada4600d ("drivers/rtc/rtc-cmos.c: fix compilation warning
when !CONFIG_PM_SLEEP") broke wakeup from S5 by making cmos_poweroff a
nop unless CONFIG_PM_SLEEP was defined.

Fix this by restricting the #ifdef to cmos_resume and restoring the old
dependency on CONFIG_PM for cmos_suspend and cmos_poweroff.

Signed-off-by: Daniel Glöckner <daniel-gl@gmx.net>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
204530a20b kernel: add support for gcc 5
commit 71458cfc78 upstream.

We're missing include/linux/compiler-gcc5.h which is required now
because gcc branched off to v5 in trunk.

Just copy the relevant bits out of include/linux/compiler-gcc4.h,
no new code is added as of now.

This fixes a build error when using gcc 5.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
c94249271a mm/cma: fix cma bitmap aligned mask computing
commit 68faed630f upstream.

The current cma bitmap aligned mask computation is incorrect.  It could
cause an unexpected alignment when using cma_alloc() if the wanted align
order is larger than cma->order_per_bit.

Take kvm for example (PAGE_SHIFT = 12), kvm_cma->order_per_bit is set to
6.  When kvm_alloc_rma() tries to alloc kvm_rma_pages, it will use 15 as
the expected align value.  After using the current implementation however,
we get 0 as cma bitmap aligned mask other than 511.

This patch fixes the cma bitmap aligned mask calculation.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
74486de6fb fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
commit 0b37e097a6 upstream.

According to commit 80af258867 ("fanotify: groups can specify their
f_flags for new fd"), file descriptors created as part of file access
notification events inherit flags from the event_f_flags argument passed
to syscall fanotify_init(2)[1].

Unfortunately O_CLOEXEC is currently silently ignored.

Indeed, event_f_flags are only given to dentry_open(), which only seems to
care about O_ACCMODE and O_PATH in do_dentry_open(), O_DIRECT in
open_check_o_direct() and O_LARGEFILE in generic_file_open().

It's a pity, since, according to some lookup on various search engines and
http://codesearch.debian.net/, there's already some userspace code which
use O_CLOEXEC:

- in systemd's readahead[2]:

    fanotify_fd = fanotify_init(FAN_CLOEXEC|FAN_NONBLOCK, O_RDONLY|O_LARGEFILE|O_CLOEXEC|O_NOATIME);

- in clsync[3]:

    #define FANOTIFY_EVFLAGS (O_LARGEFILE|O_RDONLY|O_CLOEXEC)

    int fanotify_d = fanotify_init(FANOTIFY_FLAGS, FANOTIFY_EVFLAGS);

- in examples [4] from "Filesystem monitoring in the Linux
  kernel" article[5] by Aleksander Morgado:

    if ((fanotify_fd = fanotify_init (FAN_CLOEXEC,
                                      O_RDONLY | O_CLOEXEC | O_LARGEFILE)) < 0)

Additionally, since commit 48149e9d3a ("fanotify: check file flags
passed in fanotify_init").  having O_CLOEXEC as part of fanotify_init()
second argument is expressly allowed.

So it seems expected to set close-on-exec flag on the file descriptors if
userspace is allowed to request it with O_CLOEXEC.

But Andrew Morton raised[6] the concern that enabling now close-on-exec
might break existing applications which ask for O_CLOEXEC but expect the
file descriptor to be inherited across exec().

In the other hand, as reported by Mihai Dontu[7] close-on-exec on the file
descriptor returned as part of file access notify can break applications
due to deadlock.  So close-on-exec is needed for most applications.

More, applications asking for close-on-exec are likely expecting it to be
enabled, relying on O_CLOEXEC being effective.  If not, it might weaken
their security, as noted by Jan Kara[8].

So this patch replaces call to macro get_unused_fd() by a call to function
get_unused_fd_flags() with event_f_flags value as argument.  This way
O_CLOEXEC flag in the second argument of fanotify_init(2) syscall is
interpreted and close-on-exec get enabled when requested.

[1] http://man7.org/linux/man-pages/man2/fanotify_init.2.html
[2] http://cgit.freedesktop.org/systemd/systemd/tree/src/readahead/readahead-collect.c?id=v208#n294
[3] https://github.com/xaionaro/clsync/blob/v0.2.1/sync.c#L1631
    https://github.com/xaionaro/clsync/blob/v0.2.1/configuration.h#L38
[4] http://www.lanedo.com/~aleksander/fanotify/fanotify-example.c
[5] http://www.lanedo.com/2013/filesystem-monitoring-linux-kernel/
[6] http://lkml.kernel.org/r/20141001153621.65e9258e65a6167bf2e4cb50@linux-foundation.org
[7] http://lkml.kernel.org/r/20141002095046.3715eb69@mdontu-l
[8] http://lkml.kernel.org/r/20141002104410.GB19748@quack.suse.cz

Link: http://lkml.kernel.org/r/cover.1411562410.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Mihai Don\u021bu <mihai.dontu@gmail.com>
Cc: Pádraig Brady <P@draigBrady.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Michael Kerrisk-manpages <mtk.manpages@gmail.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Richard Guy Briggs <rgb@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:15 -07:00
ba572a5ae6 mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
commit 934f3072c1 upstream.

commit 21caf2fc19 ("mm: teach mm by current context info to not do I/O
during memory allocation") introduces PF_MEMALLOC_NOIO flag to avoid doing
I/O inside memory allocation, __GFP_IO is cleared when this flag is set,
but __GFP_FS implies __GFP_IO, it should also be cleared.  Or it may still
run into I/O, like in superblock shrinker.  And this will make the kernel
run into the deadlock case described in that commit.

See Dave Chinner's comment about io in superblock shrinker:

Filesystem shrinkers do indeed perform IO from the superblock shrinker and
have for years.  Even clean inodes can require IO before they can be freed
- e.g.  on an orphan list, need truncation of post-eof blocks, need to
wait for ordered operations to complete before it can be freed, etc.

IOWs, Ext4, btrfs and XFS all can issue and/or block on arbitrary amounts
of IO in the superblock shrinker context.  XFS, in particular, has been
doing transactions and IO from the VFS inode cache shrinker since it was
first introduced....

Fix this by clearing __GFP_FS in memalloc_noio_flags(), this function has
masked all the gfp_mask that will be passed into fs for the processes
setting PF_MEMALLOC_NOIO in the direct reclaim path.

v1 thread at: https://lkml.org/lkml/2014/9/3/32

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: joyce.xue <xuejiufei@huawei.com>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
25984b38a5 Bluetooth: 6lowpan: Route packets that are not meant to peer via correct device
commit 39e90c7763 upstream.

Packets that are supposed to be delivered via the peer device need to
be checked and sent to correct device. This requires that user has set
the routes properly so that the 6lowpan module can then figure out
the destination gateway and the correct Bluetooth device.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
db4a27fea4 Bluetooth: 6lowpan: Set the peer IPv6 address correctly
commit b2799cec22 upstream.

The peer IPv6 address contained wrong U/L bit in the EUI-64 part.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
617cef5ae6 Bluetooth: 6lowpan: Increase the connection timeout value
commit 2ae50d8d3a upstream.

Use the default connection timeout value defined in l2cap.h because
the current timeout was too short and most of the time the connection
attempts timed out.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
e8442e3c8e Bluetooth: Fix setting correct security level when initiating SMP
commit 5eb596f55c upstream.

We can only determine the final security level when both pairing request
and response have been exchanged. When initiating pairing the starting
target security level is set to MEDIUM unless explicitly specified to be
HIGH, so that we can still perform pairing even if the remote doesn't
have MITM capabilities. However, once we've received the pairing
response we should re-consult the remote and local IO capabilities and
upgrade the target security level if necessary.

Without this patch the resulting Long Term Key will occasionally be
reported to be unauthenticated when it in reality is an authenticated
one.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
12662c19b8 Bluetooth: Fix issue with USB suspend in btusb driver
commit 85560c4a82 upstream.

Suspend could fail for some platforms because
btusb_suspend==> btusb_stop_traffic ==> usb_kill_anchored_urbs.

When btusb_bulk_complete returns before system suspend and resubmits
an URB, the system cannot enter suspend state.

Signed-off-by: Champion Chen <champion_chen@realsil.com.cn>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
b62619907c Bluetooth: Fix incorrect LE CoC PDU length restriction based on HCI MTU
commit 72c6fb915f upstream.

The l2cap_create_le_flowctl_pdu() function that l2cap_segment_le_sdu()
calls is perfectly capable of doing packet fragmentation if given bigger
PDUs than the HCI buffers allow. Forcing the PDU length based on the HCI
MTU (conn->mtu) would therefore needlessly strict operation on hardware
with limited LE buffers (e.g. both Intel and Broadcom seem to have this
set to just 27 bytes).

This patch removes the restriction and makes it possible to send PDUs of
the full length that the remote MPS value allows.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
95d4ce059f Bluetooth: Fix HCI H5 corrupted ack value
commit 4807b51895 upstream.

In this expression: seq = (seq - 1) % 8
seq (u8) is implicitly converted to an int in the arithmetic operation.
So if seq value is 0, operation is ((0 - 1) % 8) => (-1 % 8) => -1.
The new seq value is 0xff which is an invalid ACK value, we expect 0x07.
It leads to frequent dropped ACK and retransmission.
Fix this by using '&' binary operator instead of '%'.

Signed-off-by: Loic Poulain <loic.poulain@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
3465ae92c7 Revert "ath9k_hw: reduce ANI firstep range for older chips"
commit 171cdab8c7 upstream.

This reverts commit 09efc56345

I've received reports that this change is decreasing throughput in some
rare conditions on an AR9280 based device

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
e8beb3b22d rt2800: correct BBP1_TX_POWER_CTRL mask
commit 01f7feeaf4 upstream.

Two bits control TX power on BBP_R1 register. Correct the mask,
otherwise we clear additional bit on BBP_R1 register, what can have
unknown, possible negative effect.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:14 -07:00
9e4d53a9a8 PCI: Generate uppercase hex for modalias interface class
commit 89ec3dcf17 upstream.

Some implementations of modprobe fail to load the driver for a PCI device
automatically because the "interface" part of the modalias from the kernel
is lowercase, and the modalias from file2alias is uppercase.

The "interface" is the low-order byte of the Class Code, defined in PCI
r3.0, Appendix D.  Most interface types defined in the spec do not use
alpha characters, so they won't be affected.  For example, 00h, 01h, 10h,
20h, etc. are unaffected.

Print the "interface" byte of the Class Code in uppercase hex, as we
already do for the Vendor ID, Device ID, Class, etc.

[bhelgaas: changelog]
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
f5ff4624c7 PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
commit 9fe373f999 upstream.

The Crocodile chip occasionally comes up with 4k and 8k BAR sizes.  Due to
an erratum, setting the SR-IOV page size causes the physical function BARs
to expand to the system page size.  Since ppc64 uses 64k pages, when Linux
tries to assign the smaller resource sizes to the now 64k BARs the address
will be truncated and the BARs will overlap.

Force Linux to allocate the resource as a full page, which avoids the
overlap.

[bhelgaas: print expanded resource, too]
Signed-off-by: Douglas Lehr <dllehr@us.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
1687546e7c PCI: Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources()
commit d61b0e87d2 upstream.

In 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to
64-bit resources"), we added IORESOURCE_MEM_64 to the mask in
pci_assign_unassigned_root_bus_resources(), but not to the mask in
pci_assign_unassigned_bridge_resources().

Add IORESOURCE_MEM_64 to the pci_assign_unassigned_bridge_resources() type
mask.

Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
09a18554d6 PCI: mvebu: Fix uninitialized variable in mvebu_get_tgt_attr()
commit 56fab6e189 upstream.

Geert Uytterhoeven reported a warning when building pci-mvebu:

  drivers/pci/host/pci-mvebu.c: In function 'mvebu_get_tgt_attr':
  drivers/pci/host/pci-mvebu.c:887:39: warning: 'rtype' may be used uninitialized in this function [-Wmaybe-uninitialized]
     if (slot == PCI_SLOT(devfn) && type == rtype) {
					 ^

And indeed, the code of mvebu_get_tgt_attr() may lead to the usage of rtype
when being uninitialized, even though it would only happen if we had
entries other than I/O space and 32 bits memory space.

This commit fixes that by simply skipping the current DT range being
considered, if it doesn't match the resource type we're looking for.

Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
11db06170f xfs: fix agno increment in xfs_inumbers() loop
commit a8b1ee8baf upstream.

caused a regression in xfs_inumbers, which in turn broke
xfsdump, causing incomplete dumps.

The loop in xfs_inumbers() needs to fill the user-supplied
buffers, and iterates via xfs_btree_increment, reading new
ags as needed.

But the first time through the loop, if xfs_btree_increment()
succeeds, we continue, which triggers the ++agno at the bottom
of the loop, and we skip to soon to the next ag - without
the proper setup under next_ag to read the next ag.

Fix this by removing the agno increment from the loop conditional,
and only increment agno if we have actually hit the code under
the next_ag: target.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
e802422206 xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly
commit 0d085a529b upstream.

XFS has been having trouble with stray delayed allocation extents
beyond EOF for a long time. Recent changes to the collapse range
code has triggered erroneous EBUSY errors on page invalidtion for
block size smaller than page size filesystems. These
have been caused by dirty buffers beyond EOF on a partial page which
do not get written to disk during a sync.

The issue is that write-ahead in xfs_cluster_write() finds such a
partial page and handles it by leaving the page dirty but pushing it
into a writeback state. This used to work just fine, as the
write_cache_pages() code would then find the dirty partial page in
the next mapping tree lookup as the dirty tag is still set.

Unfortunately, when we moved to a mark and sweep approach to
writeback to fix other writeback sync issues, we broken this. THe
act of marking the page as under writeback now clears the TOWRITE
tag in the radix tree, even though the page is still dirty. This
causes the TOWRITE tag to be cleared, and hence the next lookup on
the mapping tree does not find the dirty partial page and so doesn't
try to write it again.

This same writeback bug was found recently in ext4 and fixed in
commit 1c8349a ("ext4: fix data integrity sync in ordered mode")
without communication to the wider filesystem community. We can use
exactly the same fix here so the TOWRITE flag is not cleared on
partial page writes.

cc: stable@vger.kernel.org # dependent on 1c8349a171
Root-cause-found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
b6975072a1 vfio-pci: Fix remove path locking
commit 93899a679f upstream.

Locking both the remove() and release() path results in a deadlock
that should have been obvious.  To fix this we can get and hold the
vfio_device reference as we evaluate whether to do a bus/slot reset.
This will automatically block any remove() calls, allowing us to
remove the explict lock.  Fixes 61d792562b.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
801c7a20d2 udf: Fix loading of special inodes
commit 6174c2eb8e upstream.

Some UDF media have special inodes (like VAT or metadata partition
inodes) whose link_count is 0. Thus commit 4071b91362 (udf: Properly
detect stale inodes) broke loading these inodes because udf_iget()
started returning -ESTALE for them. Since we still need to properly
detect stale inodes queried by NFS, create two variants of udf_iget() -
one which is used for looking up special inodes (which ignores
link_count == 0) and one which is used for other cases which return
ESTALE when link_count == 0.

Fixes: 4071b91362
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
aa9e8c5561 ecryptfs: avoid to access NULL pointer when write metadata in xattr
commit 35425ea249 upstream.

Christopher Head 2014-06-28 05:26:20 UTC described:
"I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
PGD d7840067 PUD b2c3c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nvidia(PO)
CPU: 3 PID: 3566 Comm: bash Tainted: P           O 3.12.21-gentoo-r1 #2
Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
RIP: 0010:[<ffffffff8110eb39>]  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP: 0018:ffff8800bad71c10  EFLAGS: 00010246
RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
FS:  00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
Stack:
ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
Call Trace:
[<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
[<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
[<ffffffff81082c2c>] ? should_resched+0x5/0x23
[<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
[<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
[<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
[<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
[<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
[<ffffffff810fa2a3>] ? path_openat+0x224/0x472
[<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
[<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
[<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
[<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
RIP  [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP <ffff8800bad71c10>
CR2: 0000000000000000
---[ end trace df9dba5f1ddb8565 ]---"

If we create a file when we mount with ecryptfs_xattr_metadata option, we will
encounter a crash in this path:
->ecryptfs_create
  ->ecryptfs_initialize_file
    ->ecryptfs_write_metadata
      ->ecryptfs_write_metadata_to_xattr
        ->ecryptfs_setxattr
          ->fsstack_copy_attr_all
It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
will be initialized when ecryptfs_initialize_file finish.

So we should skip copying attr from lower inode when the value of ->d_inode is
invalid.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
35dabdcd78 ARM: dts: imx28-evk: Let i2c0 run at 100kHz
commit d1e61eb443 upstream.

Commit 78b81f4666 ("ARM: dts: imx28-evk: Run I2C0 at 400kHz") caused issues
when doing the following sequence in loop:

- Boot the kernel
- Perform audio playback
- Reboot the system via 'reboot' command

In many times the audio card cannot be probed, which causes playback to fail.

After restoring to the original i2c0 frequency of 100kHz there is no such
problem anymore.

This reverts commit 78b81f4666.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:13 -07:00
78027e4fc7 ARM: mvebu: Netgear RN102: Use Hardware BCH ECC
commit ace8578182 upstream.

The bootloader on the Netgear ReadyNAS RN102 uses Hardware BCH ECC
(strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

Fixes: 92beaccd8b ("ARM: mvebu: Enable NAND controller in ReadyNAS 102 .dts file")
Signed-off-by: Ben Peddell <klightspeed@killerwolves.net>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Tested-by: Arnaud Ebalard <arno@natisbad.org>
Link: https://lkml.kernel.org/r/1410339341-3372-1-git-send-email-klightspeed@killerwolves.net
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
9243e74152 ARM: mvebu: Netgear RN2120: Use Hardware BCH ECC
commit 500abb6ccb upstream.

The bootloader on the Netgear ReadyNAS RN2120 uses Hardware BCH
ECC (strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

The issue was initially reported and fixed by Ben Pedell for
RN102. The RN2120 shares the same Hynix H27U1G8F2BTR NAND
flash and setup. This patch is based on Ben's fix for RN102.

Fixes: ad51eddd95 ("ARM: mvebu: Enable NAND controller in ReadyNAS 2120 .dts file")
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Link: https://lkml.kernel.org/r/61f6a1b7ad0adc57a0e201b9680bc2e5f214a317.1410035142.git.arno@natisbad.org
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
edc6193357 ARM: mvebu: Netgear RN104: Use Hardware BCH ECC
commit 225b94cdf7 upstream.

The bootloader on the Netgear ReadyNAS RN104 uses Hardware BCH
ECC (strength = 4), while the pxa3xx NAND driver by default uses
Hamming ECC (strength = 1).

This patch changes the ECC mode on these machines to match that
of the bootloader and of the stock firmware. That way, it is
now possible to update the kernel from userland (e.g. using
standard tools from mtd-utils package); u-boot will happily
load and boot it.

The issue was initially reported and fixed by Ben Pedell for
RN102. The RN104 shares the same Hynix H27U1G8F2BTR NAND
flash and setup. This patch is based on Ben's fix for RN102.

Fixes: 0373a558bd ("ARM: mvebu: Enable NAND controller in ReadyNAS 104 .dts file")
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Link: https://lkml.kernel.org/r/920c7e7169dc6aaaa3eb4bced2336d38e77b8864.1410035142.git.arno@natisbad.org
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
20f1e615d0 ARM: Kirkwood: Fix DT based DSA.
commit 4f5e01e96d upstream.

During the conversion of boards to use DT to instantiate Distributed
Switch Architecture, nobody volunteered to test. As to be expected,
the conversion was flawed. Testers and access to hardware has now
become available, and this patch hopefully fixes the problems.

dsa,mii-bus must be a phandle to the top level mdio node, not the port
specific subnode of the mdio device.

dsa,ethernet must be a phandle to the port subnode within the ethernet
DT node, not the ethernet node.

Don't pinctrl hog the card detect gpio for mvsdio.

Rename the .dts files to make it clearer which file is for the Z0
stepping and which for the A0 or later stepping.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Cc: seugene@marvell.com
Tested-by: Eugene Sanivsky <seugene@marvell.com>
Fixes: e2eaa339af: ("ARM: Kirkwood: convert rd88f6281-setup.c to DT.")
Fixes: e7c8f3808b: ("ARM: kirkwood: Convert mv88f6281gtw_ge switch setup to DT")
Link: https://lkml.kernel.org/r/1409592941-22244-1-git-send-email-andrew@lunn.ch
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
4f0d096436 ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
commit cfa1950e6c upstream.

When introducing support for sama5d3, the write to PMC_PCDR register has
been accidentally removed.

Reported-by: Nathalie Cyrille <nathalie.cyrille@atmel.com>
Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
162a454f28 ARM: at91: fix at91sam9263ek DT mmc pinmuxing settings
commit b65e0fb3d0 upstream.

As discovered on a custom board similar to at91sam9263ek and basing
its devicetree on that one apparently the pin muxing doesn't get
set up properly. This was discovered since the custom boards u-boot
does funky stuff with the pin muxing and leaved it set to SPI
which made the MMC driver not work under Linux.
The fix is simply to define the given configuration as the default.
This probably worked by pure luck before, but it's better to
make the muxing explicitly set.

Signed-off-by: Andreas Henriksson <andreas.henriksson@endian.se>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
224e67ebe4 ARM: at91/dt: Fix typo regarding can0_clk
commit 0a51d644c2 upstream.

Otherwise the clock for can0 will never get enabled.

Signed-off-by: David Dueck <davidcdueck@googlemail.com>
Signed-off-by: Anthony Harivel <anthony.harivel@emtrion.de>
Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
22ac17ef36 clk: qcom: Add IPQ8064 PLL required for USB
commit dc1b3f657f upstream.

This patch adds the PLL0 that is required for the USB clocks to
work properly.

Signed-off-by: Andy Gross <agross@codeaurora.org>
Fixes: 24d8fba44a "clk: qcom: Add support for IPQ8064's global clock controller (GCC)"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
27d0241a2f ALSA: hda - Add missing terminating entry to SND_HDA_PIN_QUIRK macro
commit fb54a645b2 upstream.

Without this terminating entry, the pin matching would continue
across random memory until a zero or a non-matching entry was found.

The result being that in some cases, the pin quirk would not be
applied correctly.

Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
ddcf6ee737 ALSA: hda - Fix inverted LED gpio setup for Lenovo Ideapad
commit b1974f965a upstream.

We implemented in a wrong way for mute LED on Lenovo Ideapad; the bit
must be flipped.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16373
Fixes: 3e887f379d ('ALSA: hda - Add mute LED support to Lenovo Ideapad')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
6bd96259b2 ALSA: hda - hdmi: Fix missing ELD change event on plug/unplug
commit 6acce400d9 upstream.

The ELD ALSA control change event is sent by hdmi_present_sense() when
eld_changed is true.

Currently, it is only true when the ELD buffer contents have been
modified. However, the user-visible ELD controls also change to a
zero-length value and back when eld_valid is unset/set, and no event is
currently sent in such cases (such as when unplugging or replugging a
sink).

Fix the code to always set eld_changed if eld_valid value is changed,
and therefore to always send the change event when the user-visible
value changes.

Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Cc: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
e2a707074b ALSA: usb-audio: Add support for Steinberg UR22 USB interface
commit f0b127fbfd upstream.

Adding support for Steinberg UR22 USB interface via quirks table patch

See Ubuntu bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317244
Also see threads:
http://linux-audio.4202.n7.nabble.com/Support-for-Steinberg-UR22-Yamaha-USB-chipset-0499-1509-tc82888.html#a82917
http://www.steinberg.net/forums/viewtopic.php?t=62290

Tested by at least 4 people judging by the threads.
Did not test MIDI interface, but audio output and capture both are
functional. Built 3.17 kernel with this driver on Ubuntu 14.04 & tested with mpg123
Patch applied to 3.13 Ubuntu kernel works well enough for daily use.

Signed-off-by: Vlad Catoi <vladcatoi@gmail.com>
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:12 -07:00
bcd444c288 ALSA: ALC283 codec - Avoid pop noise on headphones during suspend/resume
commit b450b17c15 upstream.

This patch sets the headphones mode to default before suspending
which helps avoid the pop noise on headphones

Signed-off-by: Harsha Priya <harshapriya.n@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
a1c2070f77 ALSA: emu10k1: Fix deadlock in synth voice lookup
commit 95926035b1 upstream.

The emu10k1 voice allocator takes voice_lock spinlock.  When there is
no empty stream available, it tries to release a voice used by synth,
and calls get_synth_voice.  The callback function,
snd_emu10k1_synth_get_voice(), however, also takes the voice_lock,
thus it deadlocks.

The fix is simply removing the voice_lock holds in
snd_emu10k1_synth_get_voice(), as this is always called in the
spinlock context.

Reported-and-tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
0608517bdd ALSA: bebob: Fix failure to detect source of clock for Terratec Phase 88
commit 3f4032861c upstream.

This patch fixes a failure to open PCM device with -ENOSYS in
Terratec Phase 88.

Terratec Phase 88 has two Selector Function Blocks of AVC Audio subunit
to switch source of clock. One is to switch internal/external for the
source and another is to switch word/spdif for the external clock.

The IDs for these Selector Function Blocks are 9 and 8. But in current
implementation they're 0 and 0.

Reported-by: András Murányi <muranyia@gmail.com>
Tested-by: András Murányi <muranyia@gmail.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
ae1b5d01ad ALSA: pcm: use the same dma mmap codepath both for arm and arm64
commit a011e213f3 upstream.

This avoids following kernel crash when try to playback on arm64

[  107.497203] [<ffffffc00046b310>] snd_pcm_mmap_data_fault+0x90/0xd4
[  107.503405] [<ffffffc0001541ac>] __do_fault+0xb0/0x498
[  107.508565] [<ffffffc0001576a0>] handle_mm_fault+0x224/0x7b0
[  107.514246] [<ffffffc000092640>] do_page_fault+0x11c/0x310
[  107.519738] [<ffffffc000081100>] do_mem_abort+0x38/0x98

Tested: backported to 3.14 and tried to playback on arm64 machine

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
b44bc03221 arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
commit 971a5b6fe6 upstream.

The compat_elf_prpsinfo structure does not match the arch/arm struct
elf_pspsinfo definition. As result NT_PRPSINFO note in core file
created by arm64 kernel for aarch32 (compat) process has wrong size.
So gdb cannot display command that caused process crash.

Fix is to change size of __compat_uid_t, __compat_gid_t so it would
match size of similar fields in arch/arm case.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
d5985844f4 arm64: Fix compilation error on UP builds
commit ceab3fe694 upstream.

In file included from ./arch/arm64/include/asm/irq_work.h:4:0,
        from include/linux/irq_work.h:46,
        from include/linux/perf_event.h:49,
        from include/linux/ftrace_event.h:9,
        from include/trace/syscall.h:6,
        from include/linux/syscalls.h:81,
        from init/main.c:18:
./arch/arm64/include/asm/smp.h:24:3:
        error: #error "<asm/smp.h> included in non-SMP build"
 # error "<asm/smp.h> included in non-SMP build"

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 3631073659 ("arm64: Tell irq work about self IPI support")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
bc60bfded8 spi: dw-mid: terminate ongoing transfers at exit
commit 8e45ef682c upstream.

Do full clean up at exit, means terminate all ongoing DMA transfers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:11 -07:00
44f8ea82ca iwlwifi: Add missing PCI IDs for the 7260 series
commit 4f08970f52 upstream.

Add 4 missing PCI IDs for the 7260 series.

Signed-off-by: Oren Givon <oren.givon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
02ee80194f iwlwifi: mvm: disable BT Co-running by default
commit 9b60bb6d86 upstream.

The tables still contain dummy values.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
f63a0b1585 NFSv4.1/pnfs: replace broken pnfs_put_lseg_async
commit 6543f80367 upstream.

You cannot call pnfs_put_lseg_async() more than once per lseg, so it
is really an inappropriate way to deal with a refcount issue.

Instead, replace it with a function that decrements the refcount, and
puts the final 'free' operation (which is incompatible with locks) on
the workqueue.

Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: e6cf82d183: pnfs: add pnfs_put_lseg_async
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
8897bf1c7f NFS: Fix a bogus warning in nfs_generic_pgio
commit b8fb9c30f2 upstream.

It is OK for pageused == pagecount in the loop, as long as we don't add
another entry to the *pages array. Move the test so that it only triggers
in that case.

Reported-by: Steve Dickson <SteveD@redhat.com>
Fixes: bba5c1887a (nfs: disallow duplicate pages in pgio page vectors)
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
56319a855a NFS: Fix an uninitialised pointer Oops in the writeback error path
commit 3caa0c6ed7 upstream.

SteveD reports the following Oops:
 RIP: 0010:[<ffffffffa053461d>]  [<ffffffffa053461d>] __put_nfs_open_context+0x1d/0x100 [nfs]
 RSP: 0018:ffff880fed687b90  EFLAGS: 00010286
 RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000006
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffff880fed687bc0 R08: 0000000000000092 R09: 000000000000047a
 R10: 0000000000000000 R11: ffff880fed6878d6 R12: ffff880fed687d20
 R13: ffff880fed687d20 R14: 0000000000000070 R15: ffffea000aa33ec0
 FS:  00007fce290f0740(0000) GS:ffff8807ffc60000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000070 CR3: 00000007f2e79000 CR4: 00000000000007e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  0000000000000000 ffff880036c5e510 ffff880fed687d20 ffff880fed687d20
  ffff880036c5e200 ffffea000aa33ec0 ffff880fed687bd0 ffffffffa0534710
  ffff880fed687be8 ffffffffa053d5f0 ffff880036c5e200 ffff880fed687c08
 Call Trace:
  [<ffffffffa0534710>] put_nfs_open_context+0x10/0x20 [nfs]
  [<ffffffffa053d5f0>] nfs_pgio_data_destroy+0x20/0x40 [nfs]
  [<ffffffffa053d672>] nfs_pgio_error+0x22/0x40 [nfs]
  [<ffffffffa053d8f4>] nfs_generic_pgio+0x74/0x2e0 [nfs]
  [<ffffffffa06b18c3>] pnfs_generic_pg_writepages+0x63/0x210 [nfsv4]
  [<ffffffffa053d579>] nfs_pageio_doio+0x19/0x50 [nfs]
  [<ffffffffa053eb84>] nfs_pageio_complete+0x24/0x30 [nfs]
  [<ffffffffa053cb25>] nfs_direct_write_schedule_iovec+0x115/0x1f0 [nfs]
  [<ffffffffa053675f>] ? nfs_get_lock_context+0x4f/0x120 [nfs]
  [<ffffffffa053d252>] nfs_file_direct_write+0x262/0x420 [nfs]
  [<ffffffffa0532d91>] nfs_file_write+0x131/0x1d0 [nfs]
  [<ffffffffa0532c60>] ? nfs_need_sync_write.isra.17+0x40/0x40 [nfs]
  [<ffffffff812127b8>] do_io_submit+0x3b8/0x840
  [<ffffffff81212c50>] SyS_io_submit+0x10/0x20
  [<ffffffff81610f29>] system_call_fastpath+0x16/0x1b

This is due to the calls to nfs_pgio_error() in nfs_generic_pgio(), which
happen before the nfs_pgio_header's open context is referenced in
nfs_pgio_rpcsetup().

Reported-by: Steve Dickson <SteveD@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
8a3072e208 nfsd4: reserve adequate space for LOCK op
commit f7b43d0c99 upstream.

As of  8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space", we permit the server to process a LOCK operation even if
there might not be space to return the conflicting lockowner, because
we've made returning the conflicting lockowner optional.

However, the rpc server still wants to know the most we might possibly
return, so we need to take into account the possible conflicting
lockowner in the svc_reserve_space() call here.

Symptoms were log messages like "RPC request reserved 88 but used 108".

Fixes: 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low on space"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
9873d3330d NFSv4.1: Fix an NFSv4.1 state renewal regression
commit d1f456b0b9 upstream.

Commit 2f60ea6b8c ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.

The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:

In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.

Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.

An example application:
open, lock, write a file.

sleep for 6 * lease (could be less)

ulock, close.

In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.

This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.

Signed-off-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-git-send-email-andros@netapp.com
Fixes: 2f60ea6b8c (NFSv4: The NFSv4.0 client must send RENEW calls...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
c11415eedf NFSv4: fix open/lock state recovery error handling
commit df817ba357 upstream.

The current open/lock state recovery unfortunately does not handle errors
such as NFS4ERR_CONN_NOT_BOUND_TO_SESSION correctly. Instead of looping,
just proceeds as if the state manager is finished recovering.
This patch ensures that we loop back, handle higher priority errors
and complete the open/lock state recovery.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
7810cb1949 NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
commit a4339b7b68 upstream.

If a NFSv4.x server returns NFS4ERR_STALE_CLIENTID in response to a
CREATE_SESSION or SETCLIENTID_CONFIRM in order to tell us that it rebooted
a second time, then the client will currently take this to mean that it must
declare all locks to be stale, and hence ineligible for reboot recovery.

RFC3530 and RFC5661 both suggest that the client should instead rely on the
server to respond to inelegible open share, lock and delegation reclaim
requests with NFS4ERR_NO_GRACE in this situation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
40456158e3 nfs: fix duplicate proc entries
commit 2f3169fb18 upstream.

Commit 65b38851a1
("NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes")

updated the following function:
static int nfs_volume_list_open(struct inode *inode, struct file *file)

it used &nfs_server_list_ops instead of &nfs_volume_list_ops
which means cat /proc/fs/nfsfs/volumes = /proc/fs/nfsfs/servers

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Fixes: 65b38851a1 (NFS: Fix /proc/fs/nfsfs/servers and...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:10 -07:00
3a2a7dcfd7 tty: omap-serial: fix division by zero
commit dc3187564e upstream.

If the chosen baud rate is large enough (e.g. 3.5 megabaud), the
calculated n values in serial_omap_is_baud_mode16() may become 0. This
causes a division by zero when calculating the difference between
calculated and desired baud rates. To prevent this, cap the n13 and n16
values on 1.

Division by zero in kernel.
[<c00132e0>] (unwind_backtrace) from [<c00112ec>] (show_stack+0x10/0x14)
[<c00112ec>] (show_stack) from [<c01ed7bc>] (Ldiv0+0x8/0x10)
[<c01ed7bc>] (Ldiv0) from [<c023805c>] (serial_omap_baud_is_mode16+0x4c/0x68)
[<c023805c>] (serial_omap_baud_is_mode16) from [<c02396b4>] (serial_omap_set_termios+0x90/0x8d8)
[<c02396b4>] (serial_omap_set_termios) from [<c0230a0c>] (uart_change_speed+0xa4/0xa8)
[<c0230a0c>] (uart_change_speed) from [<c0231798>] (uart_set_termios+0xa0/0x1fc)
[<c0231798>] (uart_set_termios) from [<c022bb44>] (tty_set_termios+0x248/0x2c0)
[<c022bb44>] (tty_set_termios) from [<c022c17c>] (set_termios+0x248/0x29c)
[<c022c17c>] (set_termios) from [<c022c3e4>] (tty_mode_ioctl+0x1c8/0x4e8)
[<c022c3e4>] (tty_mode_ioctl) from [<c0227e70>] (tty_ioctl+0xa94/0xb18)
[<c0227e70>] (tty_ioctl) from [<c00cf45c>] (do_vfs_ioctl+0x4a0/0x560)
[<c00cf45c>] (do_vfs_ioctl) from [<c00cf568>] (SyS_ioctl+0x4c/0x74)
[<c00cf568>] (SyS_ioctl) from [<c000e480>] (ret_fast_syscall+0x0/0x30)

Signed-off-by: Frans Klaver <frans.klaver@xsens.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
d6398ac2e6 lzo: check for length overrun in variable length encoding.
commit 72cf90124e upstream.

This fix ensures that we never meet an integer overflow while adding
255 while parsing a variable length encoding. It works differently from
commit 206a81c ("lzo: properly check for overruns") because instead of
ensuring that we don't overrun the input, which is tricky to guarantee
due to many assumptions in the code, it simply checks that the cumulated
number of 255 read cannot overflow by bounding this number.

The MAX_255_COUNT is the maximum number of times we can add 255 to a base
count without overflowing an integer. The multiply will overflow when
multiplying 255 by more than MAXINT/255. The sum will overflow earlier
depending on the base count. Since the base count is taken from a u8
and a few bits, it is safe to assume that it will always be lower than
or equal to 2*255, thus we can always prevent any overflow by accepting
two less 255 steps.

This patch also reduces the CPU overhead and actually increases performance
by 1.1% compared to the initial code, while the previous fix costs 3.1%
(measured on x86_64).

The fix needs to be backported to all currently supported stable kernels.

Reported-by: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
1fd3277936 Revert "lzo: properly check for overruns"
commit af958a38a6 upstream.

This reverts commit 206a81c ("lzo: properly check for overruns").

As analysed by Willem Pinckaers, this fix is still incomplete on
certain rare corner cases, and it is easier to restart from the
original code.

Reported-by: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
200206470d Documentation: lzo: document part of the encoding
commit d98a052643 upstream.

Add a complete description of the LZO format as processed by the
decompressor. I have not found a public specification of this format
hence this analysis, which will be used to better understand the code.

Cc: Willem Pinckaers <willem@lekkertech.net>
Cc: "Don A. Bailey" <donb@securitymouse.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
f19982ef5d Fixing lease renewal
commit 8faaa6d5d4 upstream.

Commit c9fdeb28 removed a 'continue' after checking if the lease needs
to be renewed. However, if client hasn't moved, the code falls down to
starting reboot recovery erroneously (ie., sends open reclaim and gets
back stale_clientid error) before recovering from getting stale_clientid
on the renew operation.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: c9fdeb280b (NFS: Add basic migration support to state manager thread)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
14f90acbe8 m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
commit e4dc601bf9 upstream.

hwreg_present() and hwreg_write() temporarily change the VBR register to
another vector table. This table contains a valid bus error handler
only, all other entries point to arbitrary addresses.

If an interrupt comes in while the temporary table is active, the
processor will start executing at such an arbitrary address, and the
kernel will crash.

While most callers run early, before interrupts are enabled, or
explicitly disable interrupts, Finn Thain pointed out that macsonic has
one callsite that doesn't, causing intermittent boot crashes.
There's another unsafe callsite in hilkbd.

Fix this for good by disabling and restoring interrupts inside
hwreg_present() and hwreg_write().

Explicitly disabling interrupts can be removed from the callsites later.

Reported-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
2ef3e6b09b mei: bus: fix possible boundaries violation
commit cfda2794b5 upstream.

function 'strncpy' will fill whole buffer 'id.name' of fixed size (32)
with string value and will not leave place for NULL-terminator.
Possible buffer boundaries violation in following string operations.
Replace strncpy with strlcpy.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
0ab92d294a Drivers: hv: vmbus: Cleanup hv_post_message()
commit b29ef3546a upstream.

Minimize failures in this function by pre-allocating the buffer
for posting messages. The hypercall for posting the message can fail
for a number of reasons:

        1. Transient resource related issues
        2. Buffer alignment
        3. Buffer cannot span a page boundry

We address issues 2 and 3 by preallocating a per-cpu page for the buffer.
Transient resource related failures are handled by retrying by the callers
of this function.

This patch is based on the investigation
done by Dexuan Cui <decui@microsoft.com>.

I would like to thank Sitsofe Wheeler <sitsofe@yahoo.com>
for reporting the issue and helping in debuggging.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
51d2221f27 Drivers: hv: vmbus: Fix a bug in vmbus_open()
commit 45d727cee9 upstream.

Fix a bug in vmbus_open() and properly propagate the error. I would
like to thank Dexuan Cui <decui@microsoft.com> for identifying the
issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:09 -07:00
96ef7d4bc5 Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
commit 72c6b71c24 upstream.

Eliminate the call to BUG_ON() by waiting for the host to respond. We are
trying to reclaim the ownership of memory that was given to the host and so
we will have to wait until the host responds.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
73ac8e5ab3 Drivers: hv: vmbus: Cleanup vmbus_close_internal()
commit 98d731bb06 upstream.

Eliminate calls to BUG_ON() in vmbus_close_internal().
We have chosen to potentially leak memory, than crash the guest
in case of failures.

In this version of the patch I have addressed comments from
Dan Carpenter (dan.carpenter@oracle.com).

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
552cb2dc6c Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
commit 66be653083 upstream.

Eliminate calls to BUG_ON() by properly handling errors. In cases where
rollback is possible, we will return the appropriate error to have the
calling code decide how to rollback state. In the case where we are
transferring ownership of the guest physical pages to the host,
we will wait for the host to respond.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
1b51cf3ee3 Drivers: hv: vmbus: Cleanup vmbus_post_msg()
commit fdeebcc622 upstream.

Posting messages to the host can fail because of transient resource
related failures. Correctly deal with these failures and increase the
number of attempts to post the message before giving up.

In this version of the patch, I have normalized the error code to
Linux error code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
6d9ac4d681 Drivers: hv: util: Properly pack the data for file copy functionality
commit bc5a5b0233 upstream.

Properly pack the data for file copy functionality. Patch based on
investigation done by Matej Muzila <mmuzila@redhat.com>

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: <qge@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
a29498d454 arm64: debug: don't re-enable debug exceptions on return from el1_dbg
commit 1059c6bf85 upstream.

When returning from a debug exception taken from EL1, we unmask debug
exceptions after handling the exception. This is crucial for debug
exceptions taken from EL0, so that any kernel work on the ret_to_user
path can be debugged by kgdb.

However, when returning back to EL1 the only thing left to do is to
restore the original register state before the exception return. If
single-step has been enabled by the debug exception handler, we will
get stuck in an infinite debug exception loop, since we will take the
step exception as soon as we unmask debug exceptions.

This patch avoids unmasking debug exceptions on the debug exception
return path when the exception was taken from EL1.

Fixes: 2a2830703a (arm64: debug: avoid accessing mdscr_el1 on fault paths where possible)
Reported-by: David Long <dave.long@linaro.org>
Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
64ab2bddeb firmware_class: make sure fw requests contain a name
commit 471b095dfe upstream.

An empty firmware request name will trigger warnings when building
device names. Make sure this is caught earlier and rejected.

The warning was visible via the test_firmware.ko module interface:

echo -ne "\x00" > /sys/devices/virtual/misc/test_firmware/trigger_request

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
e562f858da dmaengine: pl330: Fix NULL pointer dereference on driver unbind
commit 6e4a2a83f9 upstream.

Fix a NULL pointer dereference after unbinding the driver, if channel
resources were not yet allocated (no call to
pl330_alloc_chan_resources()):
$ echo 12850000.mdma > /sys/bus/amba/drivers/dma-pl330/unbind
[   13.606533] DMA pl330_control: removing pch: eeab6800, chan: eeab6814, thread:   (null)
[   13.614472] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[   13.622537] pgd = ee284000
[   13.625228] [0000000c] *pgd=6e1e4831, *pte=00000000, *ppte=00000000
[   13.631482] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[   13.636859] Modules linked in:
[   13.639903] CPU: 0 PID: 1 Comm: sh Not tainted 3.17.0-rc3-next-20140904-00004-g7020ffc33ca3-dirty #420
[   13.649187] task: ee80a800 ti: ee888000 task.ti: ee888000
[   13.654589] PC is at _stop+0x8/0x2c8
[   13.658131] LR is at pl330_control+0x70/0x2e8
[   13.662468] pc : [<c0206028>]    lr : [<c020649c>]    psr: 60000093
[   13.662468] sp : ee889e58  ip : 00000001  fp : 000bab70
[   13.673922] r10: eeab6814  r9 : ee16debc  r8 : 00000000
[   13.679131] r7 : eeab685c  r6 : 60000013  r5 : ee16de10  r4 : eeab6800
[   13.685641] r3 : 00000002  r2 : 00000000  r1 : 00010000  r0 : 00000000
[   13.692153] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   13.699357] Control: 10c5387d  Table: 6e28404a  DAC: 00000015
[   13.705085] Process sh (pid: 1, stack limit = 0xee888240)
[   13.710466] Stack: (0xee889e58 to 0xee88a000)
[   13.714808] 9e40:                                                       00000002 eeab6800
[   13.722969] 9e60: ee16de10 eeab6800 ee16de10 60000013 eeab685c c020649c 00000000 c040280c
[   13.731128] 9e80: ee889e80 ee889e80 ee16de18 ee16de10 eeab6880 eeab6814 00200200 eeab68a8
[   13.739287] 9ea0: 00100100 c0208048 00000000 c0409fc4 eea80800 eea808f8 c0605c44 0000000e
[   13.747446] 9ec0: 0000000e eeb3960c eeb39600 c0203c48 eea80800 c0605c44 c0605a8c c023f694
[   13.755605] 9ee0: ee80a800 eea80834 eea80800 c023f704 ee80a800 eea80800 c0605c44 c023e8ec
[   13.763764] 9f00: 0000000e ee149780 ee29e580 ee889f80 ee29e580 c023e19c 0000000e c01167e4
[   13.771923] 9f20: c01167a0 00000000 00000000 c0115e88 00000000 00000000 ee0b1a00 0000000e
[   13.780082] 9f40: b6f48000 ee889f80 0000000e ee888000 b6f48000 c00bfadc 00000000 00000003
[   13.788241] 9f60: 00000000 00000000 00000000 ee0b1a00 ee0b1a00 0000000e b6f48000 c00bfdf4
[   13.796401] 9f80: 00000000 00000000 ffffffff 0000000e b6f48000 b6edc5d0 00000004 c000e7a4
[   13.804560] 9fa0: 00000000 c000e620 0000000e b6f48000 00000001 b6f48000 0000000e 00000000
[   13.812719] 9fc0: 0000000e b6f48000 b6edc5d0 00000004 0000000e b6f4c8c0 000c3470 000bab70
[   13.820879] 9fe0: 00000000 bed2aa50 b6e18bdc b6e6b52c 60000010 00000001 c0c0c0c0 c0c0c0c0
[   13.829058] [<c0206028>] (_stop) from [<c020649c>] (pl330_control+0x70/0x2e8)
[   13.836165] [<c020649c>] (pl330_control) from [<c0208048>] (pl330_remove+0xb0/0xdc)
[   13.843800] [<c0208048>] (pl330_remove) from [<c0203c48>] (amba_remove+0x24/0xc0)
[   13.851272] [<c0203c48>] (amba_remove) from [<c023f694>] (__device_release_driver+0x70/0xc4)
[   13.859685] [<c023f694>] (__device_release_driver) from [<c023f704>] (device_release_driver+0x1c/0x28)
[   13.868971] [<c023f704>] (device_release_driver) from [<c023e8ec>] (unbind_store+0x58/0x90)
[   13.877303] [<c023e8ec>] (unbind_store) from [<c023e19c>] (drv_attr_store+0x20/0x2c)
[   13.885036] [<c023e19c>] (drv_attr_store) from [<c01167e4>] (sysfs_kf_write+0x44/0x48)
[   13.892928] [<c01167e4>] (sysfs_kf_write) from [<c0115e88>] (kernfs_fop_write+0xc0/0x17c)
[   13.901090] [<c0115e88>] (kernfs_fop_write) from [<c00bfadc>] (vfs_write+0xa0/0x1a8)
[   13.908812] [<c00bfadc>] (vfs_write) from [<c00bfdf4>] (SyS_write+0x40/0x8c)
[   13.915850] [<c00bfdf4>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30)
[   13.923392] Code: e5813010 e12fff1e e92d40f0 e24dd00c (e590200c)
[   13.929467] ---[ end trace 10064e15a5929cf8 ]---

Terminate the thread and free channel resource only if channel resources
were allocated (thread is not NULL).

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: b3040e4067 ("DMA: PL330: Add dma api driver")
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
98a58b899a dmaengine: pl330: Fix NULL pointer dereference on probe failure
commit 0f5ebabdd0 upstream.

If dma_async_device_register() returns error and probe should clean up
and return error, a NULL pointer exception happens because of
dereference of not allocated channel thread:

Dmesg log (from early printk):
dma-pl330 12680000.pdma: unable to register DMAC
DMA pl330_control: removing pch: eeac4000, chan: eeac4014, thread:   (null)
Unable to handle kernel NULL pointer dereference at virtual address 0000000c
pgd = c0004000
[0000000c] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc3-next-20140904-00005-g6cc4c1937d90-dirty #427
task: ee80a800 ti: ee888000 task.ti: ee888000
PC is at _stop+0x8/0x2c8
LR is at pl330_control+0x70/0x2e8
pc : [<c0205dc8>]    lr : [<c020623c>]    psr: 60000193
sp : ee889df8  ip : 00000002  fp : 00000000
r10: eeac4014  r9 : ee0e62bc  r8 : 00000000
r7 : eeac405c  r6 : 60000113  r5 : ee0e6210  r4 : eeac4000
r3 : 00000002  r2 : 00000002  r1 : 00010000  r0 : 00000000
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c5387d  Table: 4000404a  DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0xee888240)
Stack: (0xee889df8 to 0xee88a000)
9de0:                                                       00000002 eeac4000
9e00: ee0e6210 eeac4000 ee0e6210 60000113 eeac405c c020623c 00000000 c020725c
9e20: ee889e20 ee889e20 ee0e6210 eeac4080 00200200 00100100 eeac4014 00000020
9e40: ee0e6218 c0208374 00000000 ee9bb340 ee0e6210 00000000 00000000 c0605cd8
9e60: ee970000 c0605c84 ee9700f8 00000000 c05c4270 00000000 00000000 c0203b3c
9e80: ee970000 c06624a8 00000000 c0605c84 00000000 c023f890 ee970000 c0605c84
9ea0: ee970034 00000000 c05b23d0 c023fa3c 00000000 c0605c84 c023f9b0 c023e0d4
9ec0: ee947e78 ee9b9440 c0605c84 eea1e780 c0605acc c023f094 c0513b50 c0605c84
9ee0: c05ecbd8 c0605c84 c05ecbd8 ee11ba40 c0626500 c0240064 00000000 c05ecbd8
9f00: c05ecbd8 c0008964 c040f13c 0000009f c0626500 c057465c ee80a800 60000113
9f20: 00000000 c05efdb0 60000113 00000000 ef7fc89d c0421168 0000008f c003787c
9f40: c0573d6c 00000006 ef7fc8bb 00000006 c05efd50 ef7fc800 c05dfbc4 00000006
9f60: c05c4264 c0626500 0000008f c05c4270 c059b518 c059bcb4 00000006 00000006
9f80: c059b518 c003c08c 00000000 c040091c 00000000 00000000 00000000 00000000
9fa0: 00000000 c0400924 00000000 c000e7b8 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 c0c0c0c0 c0c0c0c0
[<c0205dc8>] (_stop) from [<c020623c>] (pl330_control+0x70/0x2e8)
[<c020623c>] (pl330_control) from [<c0208374>] (pl330_probe+0x594/0x75c)
[<c0208374>] (pl330_probe) from [<c0203b3c>] (amba_probe+0xb8/0x120)
[<c0203b3c>] (amba_probe) from [<c023f890>] (driver_probe_device+0x10c/0x22c)
[<c023f890>] (driver_probe_device) from [<c023fa3c>] (__driver_attach+0x8c/0x90)
[<c023fa3c>] (__driver_attach) from [<c023e0d4>] (bus_for_each_dev+0x54/0x88)
[<c023e0d4>] (bus_for_each_dev) from [<c023f094>] (bus_add_driver+0xd4/0x1d0)
[<c023f094>] (bus_add_driver) from [<c0240064>] (driver_register+0x78/0xf4)
[<c0240064>] (driver_register) from [<c0008964>] (do_one_initcall+0x80/0x1d0)
[<c0008964>] (do_one_initcall) from [<c059bcb4>] (kernel_init_freeable+0x108/0x1d4)
[<c059bcb4>] (kernel_init_freeable) from [<c0400924>] (kernel_init+0x8/0xec)
[<c0400924>] (kernel_init) from [<c000e7b8>] (ret_from_fork+0x14/0x3c)
Code: e5813010 e12fff1e e92d40f0 e24dd00c (e590200c)
---[ end trace c94b2f4f38dff3bf ]---

This happens because the necessary resources were not yet allocated - no
call to pl330_alloc_chan_resources().

Terminate the thread and free channel resource only if channel thread is not NULL.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Fixes: 0b94c57717 ("DMA: PL330: Add check if device tree compatible")
Reviewed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
935e054a61 dmaengine: fix xor sources continuation
commit 87cea76384 upstream.

the partial xor result must be kept until the next
tx is generated.

Signed-off-by: Xuelin Shi <xuelin.shi@freescale.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
103997a332 qla2xxx: Fix shost use-after-free on device removal
commit db7157d4cf upstream.

Once calling scsi_host_put, be careful to not access qla_hw_data through
the Scsi_Host private data (ie, scsi_qla_host base_vha).

Fixes: fe1b806f4f ("qla2xxx: Refactor shutdown code so some functionality can be reused")
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
4778734aaa qla2xxx: Use correct offset to req-q-out for reserve calculation
commit 75554b68ac upstream.

Signed-off-by: Arun Easi <arun.easi@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:08 -07:00
d20854ce0d qla2xxx: fix kernel NULL pointer access
commit 78c2106a50 upstream.

This patch is to fix regression added by commit id
51a07f8464.

When allocating memory for new session original patch does
not assign vha to op->vha resulting into NULL pointer
access during qlt_create_sess_from_atio().

Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
4ad6504a7d regulator: ltc3589: fix broken voltage transitions
commit c5bb725ac2 upstream.

VCCR is used as a trigger to start voltage transitions, so
we need to mark it volatile in order to make sure it gets
written to hardware every time we set a new voltage.

Fixes regulator voltage being stuck at the first voltage
set after driver load.

[lst: reworded commit message]
Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
f5fba53e0a mptfusion: enable no_write_same for vmware scsi disks
commit 4089b71cc8 upstream.

When using a virtual SCSI disk in a VMWare VM if blkdev_issue_zeroout is used
data can be improperly zeroed out using the mptfusion driver. This patch
disables write_same for this driver and the vmware subsystem_vendor which
ensures that manual zeroing out is used instead.

BugLink: http://bugs.launchpad.net/bugs/1371591
Reported-by: Bruce Lucas <bruce.lucas@mongodb.com>
Tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
17b48b83ce be2iscsi: check ip buffer before copying
commit a41a9ad3bb upstream.

Dan Carpenter found a issue where be2iscsi would copy the ip
from userspace to the driver buffer before checking the len
of the data being copied:
http://marc.info/?l=linux-scsi&m=140982651504251&w=2

This patch just has us only copy what we the driver buffer
can support.

Tested-by: John Soni Jose <sony.john-n@emulex.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
cd487ab059 regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.
commit d6b41cb060 upstream.

Since we cannot make sure the 'val_count' will always be none zero
here, and then if it equals to zero, the kmemdup() will return
ZERO_SIZE_PTR, which equals to ((void *)16).

So this patch fix this with just doing the zero check before calling
kmemdup().

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
a574c3b4ef regmap: fix NULL pointer dereference in _regmap_write/read
commit 5336be8416 upstream.

If LOG_DEVICE is defined and map->dev is NULL it will lead to NULL
pointer dereference. This patch fixes this issue by adding check for
dev->NULL in all such places in regmap.c

Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
6a911fb78f regmap: debugfs: fix possbile NULL pointer dereference
commit 2c98e0c1cc upstream.

If 'map->dev' is NULL and there will lead dev_name() to be NULL pointer
dereference. So before dev_name(), we need to have check of the map->dev
pionter.

We also should make sure that the 'name' pointer shouldn't be NULL for
debugfs_create_dir(). So here using one default "dummy" debugfs name when
the 'name' pointer and 'map->dev' are both NULL.

Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
358cc1e19e mpc85xx_edac: Make L2 interrupt shared too
commit a18c3f16a9 upstream.

The other two interrupt handlers in this driver are shared, except this
one. When loading the driver, it fails like this.

So make the IRQ line shared.

Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
mpc85xx_mc_err_probe: No ECC DIMMs discovered
EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:521
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
task: ee058000 ti: ee046000 task.ti: ee046000
NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000

GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
NIP [c016c0c4] remove_proc_entry
LR [c016c0c4] remove_proc_entry
Call Trace:
remove_proc_entry (unreliable)
unregister_irq_proc
free_desc
irq_free_descs
mpc85xx_l2_err_probe
platform_drv_probe
really_probe
__driver_attach
bus_for_each_dev
bus_add_driver
driver_register
mpc85xx_mc_init
do_one_initcall
kernel_init_freeable
kernel_init
ret_from_kernel_thread
Instruction dump: ...

Reported-and-tested-by: <lpb_098@163.com>
Acked-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
ce3eba07f5 HID: rmi: check sanity of the incoming report
commit 5b65c2a029 upstream.

In the Dell XPS 13 9333, it appears that sometimes the bus get confused
and corrupts the incoming data. It fills the input report with the
sentinel value "ff". Synaptics told us that such behavior does not comes
from the touchpad itself, so we filter out such reports here.

Unfortunately, we can not simply discard the incoming data because they
may contain useful information. Most of the time, the misbehavior is
quite near the end of the report, so we can still use the valid part of
it.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1123584

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
42b0daa78e HID: wacom: fix timeout on probe for some wacoms
commit 8ffffd5212 upstream.

Some Wacom tablets (at least the ISDv4 found in the Lenovo X230) timeout
during probe while retrieving the input reports.
The only time this information is valuable is during the feature_mapping
stage, so we can ask for it there and discard the generic input reports
retrieval.

This gives a code path closer to the wacom.ko driver when it was in the
input subtree (not HID).

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
dfdbe2eb14 HID: wacom - remove report_id from wacom_get_report interface
commit c64d883476 upstream.

It is assigned in buf[0] anyway.

Signed-off-by: Ping Cheng <pingc@wacom.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:07 -07:00
79f91ed4fd spi: dw-mid: check that DMA was inited before exit
commit fb57862ead upstream.

If the driver was compiled with DMA support, but DMA channels weren't acquired
by some reason, mid_spi_dma_exit() will crash the kernel.

Fixes: 7063c0d942 (spi/dw_spi: add DMA support)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
00c803f502 spi/rockchip: fix bug that cause the failure to read data in DMA mode
commit a24e70c0ac upstream.

In my test on RK3288-pinky board, if spi is enabled, it will begin to
read data from slave regardless of whether the DMA is ready. So we
need prepare DMA before spi is enable.

Signed-off-by: Addy Ke <addy.ke@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
d772230128 spi: dw-mid: respect 8 bit mode
commit b41583e729 upstream.

In case of 8 bit mode and DMA usage we end up with every second byte written as
0. We have to respect bits_per_word settings what this patch actually does.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
51f4610581 x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
commit ee1b5b165c upstream.

Quark x1000 advertises PGE via the standard CPUID method
PGE bits exist in Quark X1000's PTEs. In order to flush
an individual PTE it is necessary to reload CR3 irrespective
of the PTE.PGE bit.

See Quark Core_DevMan_001.pdf section 6.4.11

This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to
crash and burn on this platform.

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
0af1b6719e x86,kvm,vmx: Preserve CR4 across VM entry
commit d974baa398 upstream.

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry.  Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
2b40b3639f kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
commit 2ea75be321 upstream.

vcpu ioctls can hang the calling thread if issued while a vcpu is running.
However, invalid ioctls can happen when userspace tries to probe the kind
of file descriptors (e.g. isatty() calls ioctl(TCGETS)); in that case,
we know the ioctl is going to be rejected as invalid anyway and we can
fail before trying to take the vcpu mutex.

This patch does not change functionality, it just makes invalid ioctls
fail faster.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
30d863aa7a KVM: s390: unintended fallthrough for external call
commit f346026e55 upstream.

We must not fallthrough if the conditions for external call are not met.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
56f90a37fc KVM: do not bias the generation number in kvm_current_mmio_generation
commit 00f034a12f upstream.

The next patch will give a meaning (a la seqcount) to the low bit of the
generation number.  Ensure that it matches between kvm->memslots->generation
and kvm_current_mmio_generation().

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
0e31fc03c3 kvm: fix potentially corrupt mmio cache
commit ee3d1570b5 upstream.

vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
39ebec5778 kvm: x86: fix stale mmio cache bug
commit 56f17dd3fb upstream.

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

Signed-off-by: David Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
ce74113f40 pci_ids: Add support for Intel Quark ILB
commit bb048713bb upstream.

This patch adds the PCI id for Intel Quark ILB.
It will be used for GPIO and Multifunction device driver.

Signed-off-by: Josef Ahmad <josef.ahmad@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:06 -07:00
28f64f916f fs: Add a missing permission check to do_umount
commit a1480dcc3c upstream.

Accessing do_remount_sb should require global CAP_SYS_ADMIN, but
only one of the two call sites was appropriately protected.

Fixes CVE-2014-7975.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
0bd7729011 Revert "Btrfs: race free update of commit root for ro snapshots"
commit d37973082b upstream.

This reverts commit 9c3b306e1c.

Switching only one commit root during a transaction is wrong because it
leads the fs into an inconsistent state. All commit roots should be
switched at once, at transaction commit time, otherwise backref walking
can often miss important references that were only accessible through
the old commit root.  Plus, the root item for the snapshot's root wasn't
getting updated and preventing the next transaction commit to do it.

This made several users get into random corruption issues after creation
of readonly snapshots.

A regression test for xfstests will follow soon.

Cc: stable@vger.kernel.org # 3.17
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
a321813efe Btrfs: fix race in WAIT_SYNC ioctl
commit 42383020be upstream.

We check whether transid is already committed via last_trans_committed and
then search through trans_list for pending transactions.  If
last_trans_committed is updated by btrfs_commit_transaction after we check
it (there is no locking), we will fail to find the committed transaction
and return EINVAL to the caller.  This has been observed occasionally by
ceph-osd (which uses this ioctl heavily).

Fix by rechecking whether the provided transid <= last_trans_committed
after the search fails, and if so return 0.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
d20a3a26e5 btrfs: Fix the wrong condition judgment about subset extent map
commit 32be3a1ac6 upstream.

Previous commit: btrfs: Fix and enhance merge_extent_mapping() to insert
best fitted extent map
is using wrong condition to judgement whether the range is a subset of a
existing extent map.

This may cause bug in btrfs no-holes mode.

This patch will correct the judgment and fix the bug.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
985ee9f487 Btrfs: fix build_backref_tree issue with multiple shared blocks
commit bbe9051441 upstream.

Marc Merlin sent me a broken fs image months ago where it would blow up in the
upper->checked BUG_ON() in build_backref_tree.  This is because we had a
scenario like this

block a -- level 4 (not shared)
   |
block b -- level 3 (reloc block, shared)
   |
block c -- level 2 (not shared)
   |
block d -- level 1 (shared)
   |
block e -- level 0 (shared)

We go to build a backref tree for block e, we notice block d is shared and add
it to the list of blocks to lookup it's backrefs for.  Now when we loop around
we will check edges for the block, so we will see we looked up block c last
time.  So we lookup block d and then see that the block that points to it is
block c and we can just skip that edge since we've already been up this path.
The problem is because we clear need_check when we see block d (as it is shared)
we never add block b as needing to be checked.  And because block c is in our
path already we bail out before we walk up to block b and add it to the backref
check list.

To fix this we need to reset need_check if we trip over a block that doesn't
need to be checked.  This will make sure that any subsequent blocks in the path
as we're walking up afterwards are added to the list to be processed.  With this
patch I can now mount Marc's fs image and it'll complete the balance without
panicing.  Thanks,

Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
210fac10fa Btrfs: cleanup error handling in build_backref_tree
commit 75bfb9aff4 upstream.

When balance panics it tends to panic in the

BUG_ON(!upper->checked);

test, because it means it couldn't build the backref tree properly.  This is
annoying to users and frankly a recoverable error, nothing in this function is
actually fatal since it is just an in-memory building of the backrefs for a
given bytenr.  So go through and change all the BUG_ON()'s to ASSERT()'s, and
fix the BUG_ON(!upper->checked) thing to just return an error.

This patch also fixes the error handling so it tears down the work we've done
properly.  This code was horribly broken since we always just panic'ed instead
of actually erroring out, so it needed to be completely re-worked.  With this
patch my broken image no longer panics when I mount it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
f29d743dd1 Btrfs: try not to ENOSPC on log replay
commit 1d52c78afb upstream.

When doing log replay we may have to update inodes, which traditionally goes
through our delayed inode stuff.  This will try to move space over from the
trans handle, but we don't reserve space in our trans handle on replay since we
don't know how much we will need, so instead we try to flush.  But because we
have a trans handle open we won't flush anything, so if we are out of reserve
space we will simply return ENOSPC.  Since we know that if an operation made it
into the log then we definitely had space before the box bought the farm then we
don't need to worry about doing this space reservation.  Use the
fs_info->log_root_recovering flag to skip the delayed inode stuff and update the
item directly.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
7433c1d96f Btrfs: don't do async reclaim during log replay
commit f6acfd5011 upstream.

Trying to reproduce a log enospc bug I hit a panic in the async reclaim code
during log replay.  This is because we use fs_info->fs_root as our root for
shrinking and such.  Technically we can use whatever root we want, but let's
just not allow async reclaim while we're doing log replay.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
1e1078225d btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
commit e6c4efd87a upstream.

The following commit enhanced the merge_extent_mapping() to reduce
fragment in extent map tree, but it can't handle case which existing
lies before map_start:
51f39 btrfs: Use right extent length when inserting overlap extent map.

[BUG]
When existing extent map's start is before map_start,
the em->len will be minus, which will corrupt the extent map and fail to
insert the new extent map.
This will happen when someone get a large extent map, but when it is
going to insert it into extent map tree, some one has already commit
some write and split the huge extent into small parts.

[REPRODUCER]
It is very easy to tiger using filebench with randomrw personality.
It is about 100% to reproduce when using 8G preallocated file in 60s
randonrw test.

[FIX]
This patch can now handle any existing extent position.
Since it does not directly use existing->start, now it will find the
previous and next extent around map_start.
So the old existing->start < map_start bug will never happen again.

[ENHANCE]
This patch will insert the best fitted extent map into extent map tree,
other than the oldest [map_start, map_start + sectorsize) or the
relatively newer but not perfect [map_start, existing->start).

The patch will first search existing extent that does not intersects with
the desired map range [map_start, map_start + len).
The existing extent will be either before or behind map_start, and based
on the existing extent, we can find out the previous and next extent
around map_start.

So the best fitted extent would be [prev->end, next->start).
For prev or next is not found, em->start would be prev->end and em->end
wold be next->start.

With this patch, the fragment in extent map tree should be reduced much
more than the 51f39 commit and reduce an unneeded extent map tree search.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:05 -07:00
9043fe5bc2 Btrfs: fix up bounds checking in lseek
commit 4d1a40c66b upstream.

An user reported this, it is because that lseek's SEEK_SET/SEEK_CUR/SEEK_END
allow a negative value for @offset, but btrfs's SEEK_DATA/SEEK_HOLE don't
prepare for that and convert the negative @offset into unsigned type,
so we get (end < start) warning.

[ 1269.835374] ------------[ cut here ]------------
[ 1269.836809] WARNING: CPU: 0 PID: 1241 at fs/btrfs/extent_io.c:430 insert_state+0x11d/0x140()
[ 1269.838816] BTRFS: end < start 4094 18446744073709551615
[ 1269.840334] CPU: 0 PID: 1241 Comm: a.out Tainted: G        W      3.16.0+ #306
[ 1269.858229] Call Trace:
[ 1269.858612]  [<ffffffff81801a69>] dump_stack+0x4e/0x68
[ 1269.858952]  [<ffffffff8107894c>] warn_slowpath_common+0x8c/0xc0
[ 1269.859416]  [<ffffffff81078a36>] warn_slowpath_fmt+0x46/0x50
[ 1269.859929]  [<ffffffff813b0fbd>] insert_state+0x11d/0x140
[ 1269.860409]  [<ffffffff813b1396>] __set_extent_bit+0x3b6/0x4e0
[ 1269.860805]  [<ffffffff813b21c7>] lock_extent_bits+0x87/0x200
[ 1269.861697]  [<ffffffff813a5b28>] btrfs_file_llseek+0x148/0x2a0
[ 1269.862168]  [<ffffffff811f201e>] SyS_lseek+0xae/0xc0
[ 1269.862620]  [<ffffffff8180b212>] system_call_fastpath+0x16/0x1b
[ 1269.862970] ---[ end trace 4d33ea885832054b ]---

This assumes that btrfs starts finding DATA/HOLE from the beginning of file
if the assigned @offset is negative.

Also we add alignment for lock_extent_bits 's range.

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:04 -07:00
2b5528ba94 Btrfs: add missing compression property remove in btrfs_ioctl_setflags
commit 78a017a2c9 upstream.

The behaviour of a 'chattr -c' consists of getting the current flags,
clearing the FS_COMPR_FL bit and then sending the result to the set
flags ioctl - this means the bit FS_NOCOMP_FL isn't set in the flags
passed to the ioctl. This results in the compression property not being
cleared from the inode - it was cleared only if the bit FS_NOCOMP_FL
was set in the received flags.

Reproducer:

    $ mkfs.btrfs -f /dev/sdd
    $ mount /dev/sdd /mnt && cd /mnt
    $ mkdir a
    $ chattr +c a
    $ touch a/file
    $ lsattr a/file
    --------c------- a/file
    $ chattr -c a
    $ touch a/file2
    $ lsattr a/file2
    --------c------- a/file2
    $ lsattr -d a
    ---------------- a

Reported-by: Andreas Schneider <asn@cryptomilk.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:04 -07:00
c3caf69f3c btrfs: Fix a deadlock in btrfs_dev_replace_finishing()
commit 12b894cb28 upstream.

btrfs-transacion:5657
[stack snip]
btrfs_bio_map()
    btrfs_bio_counter_inc_blocked()
        percpu_counter_inc(&fs_info->bio_counter)  ###bio_counter > 0(A)
        __btrfs_bio_map()
            btrfs_dev_replace_lock()
                mutex_lock(dev_replace->lock)	   ###wait mutex(B)

btrfs:32612
[stack snip]
btrfs_dev_replace_start()
    btrfs_dev_replace_lock()
	mutex_lock(dev_replace->lock)		   ###hold mutex(B)
    btrfs_dev_replace_finishing()
        btrfs_rm_dev_replace_blocked()
            wait until percpu_counter_sum == 0	   ###wait on bio_counter(A)

This bug can be triggered quite easily by the following test script:
http://pastebin.com/MQmb37Cy

This patch will fix the ABBA problem by calling
btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().

The consistency of btrfs devices list and their superblocks is protected
by device_list_mutex, not btrfs_dev_replace_lock/unlock().
So it is safe the move btrfs_dev_replace_unlock() before
btrfs_rm_dev_replace_blocked().

Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:04 -07:00
fe9133f10c btrfs: don't go readonly on existing qgroup items
commit 0b4699dcb6 upstream.

btrfs_drop_snapshot() leaves subvolume qgroup items on disk after
completion. This can cause problems with snapshot creation. If a new
snapshot tries to claim the deleted subvolumes id, btrfs will get -EEXIST
from add_qgroup_item() and go read-only. The following commands will
reproduce this problem (assume btrfs is on /dev/sda and is mounted at
/btrfs)

mkfs.btrfs -f /dev/sda
mount -t btrfs /dev/sda /btrfs/
btrfs quota enable /btrfs/
btrfs su sna /btrfs/ /btrfs/snap
btrfs su de /btrfs/snap
sleep 45
umount /btrfs/
mount -t btrfs /dev/sda /btrfs/

We can fix this by catching -EEXIST in add_qgroup_item() and
initializing the existing items. We have the problem of orphaned
relation items being on disk from an old snapshot but that is outside
the scope of this patch.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:04 -07:00
e207143d7e btrfs: wake up transaction thread from SYNC_FS ioctl
commit 2fad4e83e1 upstream.

The transaction thread may want to do more work, namely it pokes the
cleaner ktread that will start processing uncleaned subvols.

This can be triggered by user via the 'btrfs fi sync' command, otherwise
there was a delay up to 30 seconds before the cleaner started to clean
old snapshots.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-30 09:43:04 -07:00
9db8a8bb98 Linux 3.17.1 2014-10-15 12:29:30 +02:00
fd979860f7 arm64: Tell irq work about self IPI support
commit 3631073659 upstream.

ARM64 irq work self-IPI support depends on __smp_cross_call to point to
some relevant IRQ controller operations. This information should be
available after the call to init_IRQ().

Lets implement arch_irq_work_has_interrupt() accordingly.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:25 +02:00
1063368632 libata: Un-break ATA blacklist
commit 1c40279960 upstream.

lib/glob.c provides a new glob_match() function, with arguments in
(pattern, string) order.  It replaced a private function with arguments
in (string, pattern) order, but I didn't swap the call site...

The result was the entire ATA blacklist was effectively disabled.

The lesson for today is "I f***ed up *how* badly *how* many months ago?",
er, I mean "Nobody Tests RC Kernels On Legacy Hardware".

This was not a subtle break, but it made it through an entire RC
cycle unreported, presumably because all the people doing testing
have full-featured hardware.

(FWIW, the reason for the argument swap was because fnmatch() does it that
way, and for a while implementing a full fnmatch() was being considered.)

Fixes: 428ac5fc05 (libata: Use glob_match from lib/glob.c)
Reported-by: Steven Honeyman <stevenhoneyman@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=71371#c21
Signed-off-by: George Spelvin <linux@horizon.com>
Tested-by: Steven Honeyman <stevenhoneyman@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:25 +02:00
5ec1014828 serial: 8250: Add Quark X1000 to 8250_pci.c
commit 1ede7dcca3 upstream.

Quark X1000 contains two designware derived 8250 serial ports.
Each port has a unique PCI configuration space consisting of
BAR0:UART BAR1:DMA respectively.

Unlike the standard 8250 the register width is 32 bits for RHR,IER etc
The Quark UART has a fundamental clock @ 44.2368 MHz allowing for a
bitrate of up to about 2.76 megabits per second.

This patch enables standard 8250 mode

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
243e25b517 driver/base/node: remove unnecessary kfree of node struct from unregister_one_node
commit 33ead538f6 upstream.

Commit 92d585ef06 ("numa: fix NULL pointer access and memory
leak in unregister_one_node()") added kfree() of node struct in
unregister_one_node(). But node struct is freed by node_device_release()
which is called in  unregister_node(). So by adding the kfree(),
node struct is freed two times.

While hot removing memory, the commit leads the following BUG_ON():

  kernel BUG at mm/slub.c:3346!
  invalid opcode: 0000 [#1] SMP
  [...]
  Call Trace:
   [...] unregister_one_node
   [...] try_offline_node
   [...] remove_memory
   [...] acpi_memory_device_remove
   [...] acpi_bus_trim
   [...] acpi_bus_trim
   [...] acpi_device_hotplug
   [...] acpi_hotplug_work_fn
   [...] process_one_work
   [...] worker_thread
   [...] ? rescuer_thread
   [...] kthread
   [...] ? kthread_create_on_node
   [...] ret_from_fork
   [...] ? kthread_create_on_node

This patch removes unnecessary kfree() from unregister_one_node().

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Fixes: 92d585ef06 "numa: fix NULL pointer access and memory leak in unregister_one_node()"
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
26ad374b2e crypto: caam - fix addressing of struct member
commit 4451d494b1 upstream.

buf_0 and buf_1 in caam_hash_state are not next to each other.
Accessing buf_1 is incorrect from &buf_0 with an offset of only
size_of(buf_0). The same issue is also with buflen_0 and buflen_1

Signed-off-by: Cristian Stoica <cristian.stoica@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
3cd99033c7 USB: Add device quirk for ASUS T100 Base Station keyboard
commit ddbe1fca0b upstream.

This full-speed USB device generates spurious remote wakeup event
as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result,
Linux can't enter system suspend and S0ix power saving modes once
this keyboard is used.

This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
With this quirk set, wakeup capability will be ignored during
device configure.

This patch could be back-ported to kernels as old as 2.6.39.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
b2dd17bca1 usb: musb: dsps: kill OTG timer on suspend
commit 468bcc2a2c upstream.

if we don't make sure to kill the timer, it could
expire after we have already gated our clocks.

That will trigger a Data Abort exception because
we would try to access register while clock is gated.

Fix that bug.

Fixes 869c597 (usb: musb: dsps: add support for suspend and resume)
Tested-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
17ea14cd58 USB: cp210x: add support for Seluxit USB dongle
commit dee80ad12d upstream.

Added the Seluxit ApS USB Serial Dongle to cp210x driver.

Signed-off-by: Andreas Bomholtz <andreas@seluxit.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
4aaf2c94cb USB: serial: cp210x: added Ketra N1 wireless interface support
commit bfc2d7dfdd upstream.

Added support for Ketra N1 wireless interface, which uses the
Silicon Labs' CP2104 USB to UART bridge with customized PID 8946.

Signed-off-by: Joe Savage <joe.savage@goketra.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
2d8b382aca Revert "usb: gadget: composite: dequeue cdev->req before free it in composite_dev_cleanup"
commit bf17eba7ae upstream.

This reverts commit f2267089ea.

That commit causes more problem than fixes. Firstly, kfree()
should be called after usb_ep_dequeue() and secondly, the way
things are, we will try to dequeue a request that has already
completed much more frequently than one which is pending.

Cc: Li Jun <b47624@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
6ed8a450f6 usb: gadget: f_fs: signedness bug in __ffs_func_bind_do_descs()
commit 85b06f5e53 upstream.

We need "idx" to be signed for the error handling to work.

Fixes: 6d5c1c77bb ('usb: gadget: f_fs: fix the redundant ep files problem')
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
3842f8ec4a uas: Add another ASM1051 usb-id to the uas blacklist
commit 710f1bf16a upstream.

As most ASM1051 based devices, this one has unfixable issues with uas too.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
de3af5fefe uas: Add US_FL_NO_ATA_1X quirk for Seagate (0bc2:ab20) drives
commit f9554a6b19 upstream.

https://bbs.archlinux.org/viewtopic.php?pid=1457492

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
c0f5e44c17 uas: Add no-report-opcodes quirk
commit 734016b00b upstream.

Besides the ASM1051 (*) needing sdev->no_report_opcodes = 1, it turns out that
the JMicron JMS567 also needs it to work properly with uas (usb-storage always
sets it). Since some of the scsi devs were not to keen on the idea to
outrightly set sdev->no_report_opcodes = 1 for all uas devices, so add a quirk
for this, and set it for the JMS567.

*) Which has become a non-issue since we've completely blacklisted uas on
the ASM1051 for other reasons

Reported-and-tested-by: Claudio Bizzarri <claudio.bizzarri@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:24 +02:00
c36cac9f20 uas: Add a quirk for rejecting ATA_12 and ATA_16 commands
commit 593078525c upstream.

And set this quirk for the Seagate Expansion Desk (0bc2:2312), as that one
seems to hang upon receiving an ATA_12 or ATA_16 command.

https://bugzilla.kernel.org/show_bug.cgi?id=79511
https://bbs.archlinux.org/viewtopic.php?id=183190

While at it also add missing documentation for the u value for usb-storage
quirks.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
117ef0db3c PCI: pciehp: Fix wait time in timeout message
commit d433889cd5 upstream.

When we warned about a timeout on a hotplug command, we previously printed
the time between calls to pcie_write_cmd(), without accounting for any time
spent actually waiting.  Consider this sequence:

  pcie_write_cmd
    write SLTCTL
    cmd_started = jiffies          # T1

  pcie_write_cmd
    pcie_wait_cmd
      now = jiffies                # T2
      wait_event_timeout           # we may wait here
      if (timeout)
        ctrl_info("Timeout on command issued %u msec ago",
                  jiffies_to_msecs(now - cmd_started))

We previously printed (T2 - T1), but that doesn't include the time spent in
wait_event_timeout().

Fix this by using the current jiffies value, not the one cached before
calling wait_event_timeout().

[bhelgaas: changelog, use current jiffies instead of adding timeout]
Fixes: 40b960831c ("PCI: pciehp: Compute timeout from hotplug command start time")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
6c3dfd886d arm: Tell irq work about self IPI support
commit 09f6edd424 upstream.

ARM irq work IPI support depends on SMP support. That information is
partly known at early boottime. Lets implement
arch_irq_work_has_interrupt() accordingly.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
1d3408209d x86: Tell irq work about self IPI support
commit 3010279f0f upstream.

x86 supports irq work self-IPIs when local apic is available. This is
partly known on runtime so lets implement arch_irq_work_has_interrupt()
accordingly.

This should be safely called after setup_arch().

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
6fd5de08a5 irq_work: Force raised irq work to run on irq work interrupt
commit 76a33061b9 upstream.

The nohz full kick, which restarts the tick when any resource depend
on it, can't be executed anywhere given the operation it does on timers.
If it is called from the scheduler or timers code, chances are that
we run into a deadlock.

This is why we run the nohz full kick from an irq work. That way we make
sure that the kick runs on a virgin context.

However if that's the case when irq work runs in its own dedicated
self-ipi, things are different for the big bunch of archs that don't
support the self triggered way. In order to support them, irq works are
also handled by the timer interrupt as fallback.

Now when irq works run on the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that runs the
tick. But the nohz kick cancels and restarts this hrtimer and cancelling
an hrtimer from itself isn't allowed. This is why we run in an endless
loop:

	Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
	CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
	Workqueue: btrfs-endio-write normal_work_helper [btrfs]
	 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
	 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
	 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
	Call Trace:
	 <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
	 [<ffffffff8a7ef928>] panic+0xd4/0x207
	 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
	 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
	 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
	 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
	 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
	 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
	 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
	 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
	 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
	 [<ffffffff8a008268>] do_nmi+0xb8/0x100
	 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
	 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
	 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
	 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
	 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
	 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
	 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
	 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
	 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
	 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
	 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
	 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
	 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
	 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
	 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
	 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

To fix this we force non-lazy irq works to run on irq work self-IPIs
when available. That ability of the arch to trigger irq work self IPIs
is available with arch_irq_work_has_interrupt().

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
b677a767bc irq_work: Introduce arch_irq_work_has_interrupt()
commit c5c38ef3d7 upstream.

The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.

Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
3bc0d3356f net_sched: copy exts->type in tcf_exts_change()
[ Upstream commit 5301e3e117 ]

We need to copy exts->type when committing the change, otherwise
it would be always 0. This is a quick fix for -net and -stable,
for net-next tcf_exts will be removed.

Fixes: commit 33be627159 ("net_sched: act: use standard struct list_head")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
f823cda9b8 3c59x: fix bad split of cpu_to_le32(pci_map_single())
[ Upstream commit 88b09a6d95 ]

In commit 6f2b6a3005,
  # 3c59x: Add dma error checking and recovery
the intent is to split out the mapping from the byte-swapping in order to
insert a dma_mapping_error() check.

Kinda this semantic patch:

    // See http://coccinelle.lip6.fr/
    //
    // Beware, grouik-and-dirty!
    @@
    expression DEV, X, Y, Z;
    @@
    -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
    +   dma_addr_t addr = pci_map_single(DEV, X, Y, Z);
    +   if (dma_mapping_error(&DEV->dev, addr))
    +       /* snip */;
    +   cpu_to_le32(addr)

However, the #else part (of the #if DO_ZEROCOPY test) is changed this way:

    -   cpu_to_le32(pci_map_single(DEV, X, Y, Z))
    +   dma_addr_t addr = cpu_to_le32(pci_map_single(DEV, X, Y, Z));
    //                    ^^^^^^^^^^^
    //                    That mismatches the 3 other changes!
    +   if (dma_mapping_error(&DEV->dev, addr))
    +       /* snip */;
    +   cpu_to_le32(addr)

Let's remove the leftover cpu_to_le32() for coherency.

v2: Better changelog.
v3: Add Acked-by

Fixes: 6f2b6a3005
  # 3c59x: Add dma error checking and recovery
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
404db05025 sctp: handle association restarts when the socket is closed.
[ Upstream commit bdf6fa52f0 ]

Currently association restarts do not take into consideration the
state of the socket.  When a restart happens, the current assocation
simply transitions into established state.  This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user.  The conditions
to trigger this are as follows:
  1) Remote does not acknoledge some data causing data to remain
     outstanding.
  2) Local application calls close() on the socket.  Since data
     is still outstanding, the association is placed in SHUTDOWN_PENDING
     state.  However, the socket is closed.
  3) The remote tries to create a new association, triggering a restart
     on the local system.  The association moves from SHUTDOWN_PENDING
     to ESTABLISHED.  At this point, it is no longer reachable by
     any socket on the local system.

This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk.  This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
ed54569e2b hyperv: Fix a bug in netvsc_send()
[ Upstream commit 3a67c9ccad ]

After the packet is successfully sent, we should not touch the packet
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.

David, please queue this up for stable.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
c96de3dca9 team: avoid race condition in scheduling delayed work
[ Upstream commit 47549650ab ]

When team_notify_peers and team_mcast_rejoin are called, they both reset
their respective .count_pending atomic variable. Then when the actual
worker function is executed, the variable is atomically decremented.
This pattern introduces a potential race condition where the
.count_pending rolls over and the worker function keeps rescheduling
until .count_pending decrements to zero again:

THREAD 1                           THREAD 2

========                           ========
team_notify_peers(teamX)
  atomic_set count_pending = 1
  schedule_delayed_work
                                   team_notify_peers(teamX)
                                   atomic_set count_pending = 1
team_notify_peers_work
  atomic_dec_and_test
    count_pending = 0
  (return)
                                   schedule_delayed_work
                                   team_notify_peers_work
                                   atomic_dec_and_test
                                     count_pending = -1
                                   schedule_delayed_work
                                   (repeat until count_pending = 0)

Instead of assigning a new value to .count_pending, use atomic_add to
tack-on the additional desired worker function invocations.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Fixes: fc423ff00d ("team: add peer notification")
Fixes: 492b200efd ("team: add support for sending multicast rejoins")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
e0ff82758e net: systemport: fix bcm_sysport_insert_tsb()
[ Upstream commit e87474a6e6 ]

Similar to commit bc23333ba1 ("net:
bcmgenet: fix bcmgenet_put_tx_csum()"), we need to return the skb
pointer in case we had to reallocate the SKB headroom.

Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:23 +02:00
570f480a54 ip6_gre: fix flowi6_proto value in xmit path
[ Upstream commit 3be07244b7 ]

In xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-15 12:29:22 +02:00
232 changed files with 2810 additions and 1554 deletions

View File

@ -3522,6 +3522,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
READ_DISC_INFO command);
e = NO_READ_CAPACITY_16 (don't use
READ_CAPACITY_16 command);
f = NO_REPORT_OPCODES (don't use report opcodes
command, uas only);
h = CAPACITY_HEURISTICS (decrease the
reported device capacity by one
sector if the number is odd);
@ -3541,6 +3543,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
bogus residue values);
s = SINGLE_LUN (the device has only one
Logical Unit);
t = NO_ATA_1X (don't allow ATA(12) and ATA(16)
commands, uas only);
u = IGNORE_UAS (don't bind to the uas driver);
w = NO_WP_DETECT (don't test whether the
medium is write-protected).

164
Documentation/lzo.txt Normal file
View File

@ -0,0 +1,164 @@
LZO stream format as understood by Linux's LZO decompressor
===========================================================
Introduction
This is not a specification. No specification seems to be publicly available
for the LZO stream format. This document describes what input format the LZO
decompressor as implemented in the Linux kernel understands. The file subject
of this analysis is lib/lzo/lzo1x_decompress_safe.c. No analysis was made on
the compressor nor on any other implementations though it seems likely that
the format matches the standard one. The purpose of this document is to
better understand what the code does in order to propose more efficient fixes
for future bug reports.
Description
The stream is composed of a series of instructions, operands, and data. The
instructions consist in a few bits representing an opcode, and bits forming
the operands for the instruction, whose size and position depend on the
opcode and on the number of literals copied by previous instruction. The
operands are used to indicate :
- a distance when copying data from the dictionary (past output buffer)
- a length (number of bytes to copy from dictionary)
- the number of literals to copy, which is retained in variable "state"
as a piece of information for next instructions.
Optionally depending on the opcode and operands, extra data may follow. These
extra data can be a complement for the operand (eg: a length or a distance
encoded on larger values), or a literal to be copied to the output buffer.
The first byte of the block follows a different encoding from other bytes, it
seems to be optimized for literal use only, since there is no dictionary yet
prior to that byte.
Lengths are always encoded on a variable size starting with a small number
of bits in the operand. If the number of bits isn't enough to represent the
length, up to 255 may be added in increments by consuming more bytes with a
rate of at most 255 per extra byte (thus the compression ratio cannot exceed
around 255:1). The variable length encoding using #bits is always the same :
length = byte & ((1 << #bits) - 1)
if (!length) {
length = ((1 << #bits) - 1)
length += 255*(number of zero bytes)
length += first-non-zero-byte
}
length += constant (generally 2 or 3)
For references to the dictionary, distances are relative to the output
pointer. Distances are encoded using very few bits belonging to certain
ranges, resulting in multiple copy instructions using different encodings.
Certain encodings involve one extra byte, others involve two extra bytes
forming a little-endian 16-bit quantity (marked LE16 below).
After any instruction except the large literal copy, 0, 1, 2 or 3 literals
are copied before starting the next instruction. The number of literals that
were copied may change the meaning and behaviour of the next instruction. In
practice, only one instruction needs to know whether 0, less than 4, or more
literals were copied. This is the information stored in the <state> variable
in this implementation. This number of immediate literals to be copied is
generally encoded in the last two bits of the instruction but may also be
taken from the last two bits of an extra operand (eg: distance).
End of stream is declared when a block copy of distance 0 is seen. Only one
instruction may encode this distance (0001HLLL), it takes one LE16 operand
for the distance, thus requiring 3 bytes.
IMPORTANT NOTE : in the code some length checks are missing because certain
instructions are called under the assumption that a certain number of bytes
follow because it has already been garanteed before parsing the instructions.
They just have to "refill" this credit if they consume extra bytes. This is
an implementation design choice independant on the algorithm or encoding.
Byte sequences
First byte encoding :
0..17 : follow regular instruction encoding, see below. It is worth
noting that codes 16 and 17 will represent a block copy from
the dictionary which is empty, and that they will always be
invalid at this place.
18..21 : copy 0..3 literals
state = (byte - 17) = 0..3 [ copy <state> literals ]
skip byte
22..255 : copy literal string
length = (byte - 17) = 4..238
state = 4 [ don't copy extra literals ]
skip byte
Instruction encoding :
0 0 0 0 X X X X (0..15)
Depends on the number of literals copied by the last instruction.
If last instruction did not copy any literal (state == 0), this
encoding will be a copy of 4 or more literal, and must be interpreted
like this :
0 0 0 0 L L L L (0..15) : copy long literal string
length = 3 + (L ?: 15 + (zero_bytes * 255) + non_zero_byte)
state = 4 (no extra literals are copied)
If last instruction used to copy between 1 to 3 literals (encoded in
the instruction's opcode or distance), the instruction is a copy of a
2-byte block from the dictionary within a 1kB distance. It is worth
noting that this instruction provides little savings since it uses 2
bytes to encode a copy of 2 other bytes but it encodes the number of
following literals for free. It must be interpreted like this :
0 0 0 0 D D S S (0..15) : copy 2 bytes from <= 1kB distance
length = 2
state = S (copy S literals after this block)
Always followed by exactly one byte : H H H H H H H H
distance = (H << 2) + D + 1
If last instruction used to copy 4 or more literals (as detected by
state == 4), the instruction becomes a copy of a 3-byte block from the
dictionary from a 2..3kB distance, and must be interpreted like this :
0 0 0 0 D D S S (0..15) : copy 3 bytes from 2..3 kB distance
length = 3
state = S (copy S literals after this block)
Always followed by exactly one byte : H H H H H H H H
distance = (H << 2) + D + 2049
0 0 0 1 H L L L (16..31)
Copy of a block within 16..48kB distance (preferably less than 10B)
length = 2 + (L ?: 7 + (zero_bytes * 255) + non_zero_byte)
Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S
distance = 16384 + (H << 14) + D
state = S (copy S literals after this block)
End of stream is reached if distance == 16384
0 0 1 L L L L L (32..63)
Copy of small block within 16kB distance (preferably less than 34B)
length = 2 + (L ?: 31 + (zero_bytes * 255) + non_zero_byte)
Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S
distance = D + 1
state = S (copy S literals after this block)
0 1 L D D D S S (64..127)
Copy 3-4 bytes from block within 2kB distance
state = S (copy S literals after this block)
length = 3 + L
Always followed by exactly one byte : H H H H H H H H
distance = (H << 3) + D + 1
1 L L D D D S S (128..255)
Copy 5-8 bytes from block within 2kB distance
state = S (copy S literals after this block)
length = 5 + L
Always followed by exactly one byte : H H H H H H H H
distance = (H << 3) + D + 1
Authors
This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an
analysis of the decompression code available in Linux 3.16-rc5. The code is
tricky, it is possible that this document contains mistakes or that a few
corner cases were overlooked. In any case, please report any doubt, fix, or
proposed updates to the author(s) so that the document can be updated.

View File

@ -425,6 +425,20 @@ fault through the slow path.
Since only 19 bits are used to store generation-number on mmio spte, all
pages are zapped when there is an overflow.
Unfortunately, a single memory access might access kvm_memslots(kvm) multiple
times, the last one happening when the generation number is retrieved and
stored into the MMIO spte. Thus, the MMIO spte might be created based on
out-of-date information, but with an up-to-date generation number.
To avoid this, the generation number is incremented again after synchronize_srcu
returns; thus, the low bit of kvm_memslots(kvm)->generation is only 1 during a
memslot update, while some SRCU readers might be using the old copy. We do not
want to use an MMIO sptes created with an odd generation number, and we can do
this without losing a bit in the MMIO spte. The low bit of the generation
is not stored in MMIO spte, and presumed zero when it is extracted out of the
spte. If KVM is unlucky and creates an MMIO spte while the low bit is 1,
the next access to the spte will always be a cache miss.
Further reading
===============

View File

@ -1,6 +1,6 @@
VERSION = 3
PATCHLEVEL = 17
SUBLEVEL = 0
SUBLEVEL = 2
EXTRAVERSION =
NAME = Shuffling Zombie Juror

View File

@ -4,6 +4,7 @@ generic-y += clkdev.h
generic-y += cputime.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -18,6 +18,7 @@ generic-y += ioctl.h
generic-y += ioctls.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kmap_types.h
generic-y += kvm_para.h
generic-y += local.h

View File

@ -144,8 +144,8 @@ dtb-$(CONFIG_MACH_KIRKWOOD) += kirkwood-b3.dtb \
kirkwood-openrd-client.dtb \
kirkwood-openrd-ultimate.dtb \
kirkwood-rd88f6192.dtb \
kirkwood-rd88f6281-a0.dtb \
kirkwood-rd88f6281-a1.dtb \
kirkwood-rd88f6281-z0.dtb \
kirkwood-rd88f6281-a.dtb \
kirkwood-rs212.dtb \
kirkwood-rs409.dtb \
kirkwood-rs411.dtb \

View File

@ -143,6 +143,10 @@
marvell,nand-enable-arbiter;
nand-on-flash-bbt;
/* Use Hardware BCH ECC */
nand-ecc-strength = <4>;
nand-ecc-step-size = <512>;
partition@0 {
label = "u-boot";
reg = <0x0000000 0x180000>; /* 1.5MB */

View File

@ -145,6 +145,10 @@
marvell,nand-enable-arbiter;
nand-on-flash-bbt;
/* Use Hardware BCH ECC */
nand-ecc-strength = <4>;
nand-ecc-step-size = <512>;
partition@0 {
label = "u-boot";
reg = <0x0000000 0x180000>; /* 1.5MB */

View File

@ -223,6 +223,10 @@
marvell,nand-enable-arbiter;
nand-on-flash-bbt;
/* Use Hardware BCH ECC */
nand-ecc-strength = <4>;
nand-ecc-step-size = <512>;
partition@0 {
label = "u-boot";
reg = <0x0000000 0x180000>; /* 1.5MB */

View File

@ -834,6 +834,7 @@
compatible = "atmel,hsmci";
reg = <0xfff80000 0x600>;
interrupts = <10 IRQ_TYPE_LEVEL_HIGH 0>;
pinctrl-names = "default";
#address-cells = <1>;
#size-cells = <0>;
clocks = <&mci0_clk>;
@ -845,6 +846,7 @@
compatible = "atmel,hsmci";
reg = <0xfff84000 0x600>;
interrupts = <11 IRQ_TYPE_LEVEL_HIGH 0>;
pinctrl-names = "default";
#address-cells = <1>;
#size-cells = <0>;
clocks = <&mci1_clk>;

View File

@ -193,7 +193,6 @@
i2c0: i2c@80058000 {
pinctrl-names = "default";
pinctrl-0 = <&i2c0_pins_a>;
clock-frequency = <400000>;
status = "okay";
sgtl5000: codec@0a {

View File

@ -123,11 +123,11 @@
dsa@0 {
compatible = "marvell,dsa";
#address-cells = <2>;
#address-cells = <1>;
#size-cells = <0>;
dsa,ethernet = <&eth0>;
dsa,mii-bus = <&ethphy0>;
dsa,ethernet = <&eth0port>;
dsa,mii-bus = <&mdio>;
switch@0 {
#address-cells = <1>;
@ -169,17 +169,13 @@
&mdio {
status = "okay";
ethphy0: ethernet-phy@ff {
reg = <0xff>; /* No phy attached */
speed = <1000>;
duplex = <1>;
};
};
&eth0 {
status = "okay";
ethernet0-port@0 {
phy-handle = <&ethphy0>;
speed = <1000>;
duplex = <1>;
};
};

View File

@ -0,0 +1,43 @@
/*
* Marvell RD88F6181 A Board descrition
*
* Andrew Lunn <andrew@lunn.ch>
*
* This file is licensed under the terms of the GNU General Public
* License version 2. This program is licensed "as is" without any
* warranty of any kind, whether express or implied.
*
* This file contains the definitions for the board with the A0 or
* higher stepping of the SoC. The ethernet switch does not have a
* "wan" port.
*/
/dts-v1/;
#include "kirkwood-rd88f6281.dtsi"
/ {
model = "Marvell RD88f6281 Reference design, with A0 or higher SoC";
compatible = "marvell,rd88f6281-a", "marvell,rd88f6281","marvell,kirkwood-88f6281", "marvell,kirkwood";
dsa@0 {
switch@0 {
reg = <10 0>; /* MDIO address 10, switch 0 in tree */
};
};
};
&mdio {
status = "okay";
ethphy1: ethernet-phy@11 {
reg = <11>;
};
};
&eth1 {
status = "okay";
ethernet1-port@0 {
phy-handle = <&ethphy1>;
};
};

View File

@ -1,26 +0,0 @@
/*
* Marvell RD88F6181 A0 Board descrition
*
* Andrew Lunn <andrew@lunn.ch>
*
* This file is licensed under the terms of the GNU General Public
* License version 2. This program is licensed "as is" without any
* warranty of any kind, whether express or implied.
*
* This file contains the definitions for the board with the A0 variant of
* the SoC. The ethernet switch does not have a "wan" port.
*/
/dts-v1/;
#include "kirkwood-rd88f6281.dtsi"
/ {
model = "Marvell RD88f6281 Reference design, with A0 SoC";
compatible = "marvell,rd88f6281-a0", "marvell,rd88f6281","marvell,kirkwood-88f6281", "marvell,kirkwood";
dsa@0 {
switch@0 {
reg = <10 0>; /* MDIO address 10, switch 0 in tree */
};
};
};

View File

@ -1,5 +1,5 @@
/*
* Marvell RD88F6181 A1 Board descrition
* Marvell RD88F6181 Z0 stepping descrition
*
* Andrew Lunn <andrew@lunn.ch>
*
@ -7,17 +7,17 @@
* License version 2. This program is licensed "as is" without any
* warranty of any kind, whether express or implied.
*
* This file contains the definitions for the board with the A1 variant of
* the SoC. The ethernet switch has a "wan" port.
*/
* This file contains the definitions for the board using the Z0
* stepping of the SoC. The ethernet switch has a "wan" port.
*/
/dts-v1/;
#include "kirkwood-rd88f6281.dtsi"
/ {
model = "Marvell RD88f6281 Reference design, with A1 SoC";
compatible = "marvell,rd88f6281-a1", "marvell,rd88f6281","marvell,kirkwood-88f6281", "marvell,kirkwood";
model = "Marvell RD88f6281 Reference design, with Z0 SoC";
compatible = "marvell,rd88f6281-z0", "marvell,rd88f6281","marvell,kirkwood-88f6281", "marvell,kirkwood";
dsa@0 {
switch@0 {
@ -28,4 +28,8 @@
};
};
};
};
};
&eth1 {
status = "disabled";
};

View File

@ -37,7 +37,6 @@
ocp@f1000000 {
pinctrl: pin-controller@10000 {
pinctrl-0 = <&pmx_sdio_cd>;
pinctrl-names = "default";
pmx_sdio_cd: pmx-sdio-cd {
@ -69,8 +68,8 @@
#address-cells = <2>;
#size-cells = <0>;
dsa,ethernet = <&eth0>;
dsa,mii-bus = <&ethphy1>;
dsa,ethernet = <&eth0port>;
dsa,mii-bus = <&mdio>;
switch@0 {
#address-cells = <1>;
@ -119,35 +118,19 @@
};
partition@300000 {
label = "data";
label = "rootfs";
reg = <0x0300000 0x500000>;
};
};
&mdio {
status = "okay";
ethphy0: ethernet-phy@0 {
reg = <0>;
};
ethphy1: ethernet-phy@ff {
reg = <0xff>; /* No PHY attached */
speed = <1000>;
duple = <1>;
};
};
&eth0 {
status = "okay";
ethernet0-port@0 {
phy-handle = <&ethphy0>;
};
};
&eth1 {
status = "okay";
ethernet1-port@0 {
phy-handle = <&ethphy1>;
speed = <1000>;
duplex = <1>;
};
};

View File

@ -309,7 +309,7 @@
marvell,tx-checksum-limit = <1600>;
status = "disabled";
ethernet0-port@0 {
eth0port: ethernet0-port@0 {
compatible = "marvell,kirkwood-eth-port";
reg = <0>;
interrupts = <11>;
@ -342,7 +342,7 @@
pinctrl-names = "default";
status = "disabled";
ethernet1-port@0 {
eth1port: ethernet1-port@0 {
compatible = "marvell,kirkwood-eth-port";
reg = <0>;
interrupts = <15>;

View File

@ -40,7 +40,7 @@
atmel,clk-output-range = <0 66000000>;
};
can1_clk: can0_clk {
can1_clk: can1_clk {
#clock-cells = <0>;
reg = <41>;
atmel,clk-output-range = <0 66000000>;

View File

@ -0,0 +1,11 @@
#ifndef __ASM_ARM_IRQ_WORK_H
#define __ASM_ARM_IRQ_WORK_H
#include <asm/smp_plat.h>
static inline bool arch_irq_work_has_interrupt(void)
{
return is_smp();
}
#endif /* _ASM_ARM_IRQ_WORK_H */

View File

@ -503,7 +503,7 @@ void arch_send_call_function_single_ipi(int cpu)
#ifdef CONFIG_IRQ_WORK
void arch_irq_work_raise(void)
{
if (is_smp())
if (arch_irq_work_has_interrupt())
smp_cross_call(cpumask_of(smp_processor_id()), IPI_IRQ_WORK);
}
#endif

View File

@ -962,6 +962,7 @@ static int __init at91_clock_reset(void)
}
at91_pmc_write(AT91_PMC_SCDR, scdr);
at91_pmc_write(AT91_PMC_PCDR, pcdr);
if (cpu_is_sama5d3())
at91_pmc_write(AT91_PMC_PCDR1, pcdr1);

View File

@ -9,8 +9,8 @@ generic-y += current.h
generic-y += delay.h
generic-y += div64.h
generic-y += dma.h
generic-y += emergency-restart.h
generic-y += early_ioremap.h
generic-y += emergency-restart.h
generic-y += errno.h
generic-y += ftrace.h
generic-y += hash.h

View File

@ -37,8 +37,8 @@ typedef s32 compat_ssize_t;
typedef s32 compat_time_t;
typedef s32 compat_clock_t;
typedef s32 compat_pid_t;
typedef u32 __compat_uid_t;
typedef u32 __compat_gid_t;
typedef u16 __compat_uid_t;
typedef u16 __compat_gid_t;
typedef u16 __compat_uid16_t;
typedef u16 __compat_gid16_t;
typedef u32 __compat_uid32_t;

View File

@ -0,0 +1,22 @@
#ifndef __ASM_IRQ_WORK_H
#define __ASM_IRQ_WORK_H
#ifdef CONFIG_SMP
#include <asm/smp.h>
static inline bool arch_irq_work_has_interrupt(void)
{
return !!__smp_cross_call;
}
#else
static inline bool arch_irq_work_has_interrupt(void)
{
return false;
}
#endif
#endif /* __ASM_IRQ_WORK_H */

View File

@ -48,6 +48,8 @@ extern void smp_init_cpus(void);
*/
extern void set_smp_cross_call(void (*)(const struct cpumask *, unsigned int));
extern void (*__smp_cross_call)(const struct cpumask *, unsigned int);
/*
* Called from the secondary holding pen, this is the secondary CPU entry point.
*/

View File

@ -324,7 +324,6 @@ el1_dbg:
mrs x0, far_el1
mov x2, sp // struct pt_regs
bl do_debug_exception
enable_dbg
kernel_exit 1
el1_inv:
// TODO: add support for undefined instructions in kernel mode

View File

@ -470,7 +470,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
}
}
static void (*__smp_cross_call)(const struct cpumask *, unsigned int);
void (*__smp_cross_call)(const struct cpumask *, unsigned int);
void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int))
{

View File

@ -9,6 +9,7 @@ generic-y += exec.h
generic-y += futex.h
generic-y += hash.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += local.h
generic-y += local64.h
generic-y += mcs_spinlock.h

View File

@ -15,6 +15,7 @@ generic-y += hw_irq.h
generic-y += ioctl.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += kvm_para.h

View File

@ -22,6 +22,7 @@ generic-y += ioctl.h
generic-y += ioctls.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += local.h

View File

@ -8,6 +8,7 @@ generic-y += clkdev.h
generic-y += cputime.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += kvm_para.h
generic-y += linkage.h
generic-y += mcs_spinlock.h

View File

@ -3,6 +3,7 @@ generic-y += clkdev.h
generic-y += cputime.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -23,6 +23,7 @@ generic-y += ioctls.h
generic-y += iomap.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += local.h

View File

@ -2,6 +2,7 @@
generic-y += clkdev.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += kvm_para.h
generic-y += mcs_spinlock.h
generic-y += preempt.h

View File

@ -3,6 +3,7 @@ generic-y += clkdev.h
generic-y += cputime.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += module.h
generic-y += preempt.h

View File

@ -11,6 +11,7 @@ generic-y += hw_irq.h
generic-y += ioctl.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += kvm_para.h

View File

@ -28,9 +28,11 @@
int hwreg_present( volatile void *regp )
{
int ret = 0;
unsigned long flags;
long save_sp, save_vbr;
long tmp_vectors[3];
local_irq_save(flags);
__asm__ __volatile__
( "movec %/vbr,%2\n\t"
"movel #Lberr1,%4@(8)\n\t"
@ -46,6 +48,7 @@ int hwreg_present( volatile void *regp )
: "=&d" (ret), "=&r" (save_sp), "=&r" (save_vbr)
: "a" (regp), "a" (tmp_vectors)
);
local_irq_restore(flags);
return( ret );
}
@ -58,9 +61,11 @@ EXPORT_SYMBOL(hwreg_present);
int hwreg_write( volatile void *regp, unsigned short val )
{
int ret;
unsigned long flags;
long save_sp, save_vbr;
long tmp_vectors[3];
local_irq_save(flags);
__asm__ __volatile__
( "movec %/vbr,%2\n\t"
"movel #Lberr2,%4@(8)\n\t"
@ -78,6 +83,7 @@ int hwreg_write( volatile void *regp, unsigned short val )
: "=&d" (ret), "=&r" (save_sp), "=&r" (save_vbr)
: "a" (regp), "a" (tmp_vectors), "g" (val)
);
local_irq_restore(flags);
return( ret );
}

View File

@ -19,6 +19,7 @@ generic-y += ioctl.h
generic-y += ioctls.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += kvm_para.h

View File

@ -5,6 +5,7 @@ generic-y += cputime.h
generic-y += device.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -3,6 +3,7 @@ generic-y += cputime.h
generic-y += current.h
generic-y += emergency-restart.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += local64.h
generic-y += mcs_spinlock.h
generic-y += mutex.h

View File

@ -4,6 +4,7 @@ generic-y += clkdev.h
generic-y += cputime.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -31,6 +31,7 @@ generic-y += ioctl.h
generic-y += ioctls.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kmap_types.h
generic-y += kvm_para.h

View File

@ -10,6 +10,7 @@ generic-y += exec.h
generic-y += hash.h
generic-y += hw_irq.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
generic-y += kvm_para.h
generic-y += local.h

View File

@ -1,6 +1,7 @@
generic-y += clkdev.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += rwsem.h

View File

@ -584,6 +584,8 @@ static void *__eeh_pe_state_clear(void *data, void *flag)
{
struct eeh_pe *pe = (struct eeh_pe *)data;
int state = *((int *)flag);
struct eeh_dev *edev, *tmp;
struct pci_dev *pdev;
/* Keep the state of permanently removed PE intact */
if ((pe->freeze_count > EEH_MAX_ALLOWED_FREEZES) &&
@ -592,9 +594,22 @@ static void *__eeh_pe_state_clear(void *data, void *flag)
pe->state &= ~state;
/* Clear check count since last isolation */
if (state & EEH_PE_ISOLATED)
pe->check_count = 0;
/*
* Special treatment on clearing isolated state. Clear
* check count since last isolation and put all affected
* devices to normal state.
*/
if (!(state & EEH_PE_ISOLATED))
return NULL;
pe->check_count = 0;
eeh_pe_for_each_dev(pe, edev, tmp) {
pdev = eeh_dev_to_pci_dev(edev);
if (!pdev)
continue;
pdev->error_state = pci_channel_io_normal;
}
return NULL;
}

View File

@ -379,8 +379,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
/*
* numa_node_id() works after this.
*/
set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
set_cpu_numa_mem(cpu, local_memory_node(numa_cpu_lookup_table[cpu]));
if (cpu_present(cpu)) {
set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
set_cpu_numa_mem(cpu,
local_memory_node(numa_cpu_lookup_table[cpu]));
}
}
cpumask_set_cpu(boot_cpuid, cpu_sibling_mask(boot_cpuid));
@ -728,6 +731,9 @@ void start_secondary(void *unused)
}
traverse_core_siblings(cpu, true);
set_numa_node(numa_cpu_lookup_table[cpu]);
set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
smp_wmb();
notify_cpu_starting(cpu);
set_cpu_online(cpu, true);

View File

@ -1127,9 +1127,8 @@ void __init do_init_bootmem(void)
* even before we online them, so that we can use cpu_to_{node,mem}
* early in boot, cf. smp_prepare_cpus().
*/
for_each_possible_cpu(cpu) {
cpu_numa_callback(&ppc64_numa_nb, CPU_UP_PREPARE,
(void *)(unsigned long)cpu);
for_each_present_cpu(cpu) {
numa_setup_cpu((unsigned long)cpu);
}
}

View File

@ -329,16 +329,16 @@ struct direct_window {
/* Dynamic DMA Window support */
struct ddw_query_response {
__be32 windows_available;
__be32 largest_available_block;
__be32 page_size;
__be32 migration_capable;
u32 windows_available;
u32 largest_available_block;
u32 page_size;
u32 migration_capable;
};
struct ddw_create_response {
__be32 liobn;
__be32 addr_hi;
__be32 addr_lo;
u32 liobn;
u32 addr_hi;
u32 addr_lo;
};
static LIST_HEAD(direct_window_list);
@ -725,16 +725,18 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
{
struct dynamic_dma_window_prop *dwp;
struct property *win64;
const u32 *ddw_avail;
u32 ddw_avail[3];
u64 liobn;
int len, ret = 0;
int ret = 0;
ret = of_property_read_u32_array(np, "ibm,ddw-applicable",
&ddw_avail[0], 3);
ddw_avail = of_get_property(np, "ibm,ddw-applicable", &len);
win64 = of_find_property(np, DIRECT64_PROPNAME, NULL);
if (!win64)
return;
if (!ddw_avail || len < 3 * sizeof(u32) || win64->length < sizeof(*dwp))
if (ret || win64->length < sizeof(*dwp))
goto delprop;
dwp = win64->value;
@ -872,8 +874,9 @@ static int create_ddw(struct pci_dev *dev, const u32 *ddw_avail,
do {
/* extra outputs are LIOBN and dma-addr (hi, lo) */
ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create, cfg_addr,
BUID_HI(buid), BUID_LO(buid), page_shift, window_shift);
ret = rtas_call(ddw_avail[1], 5, 4, (u32 *)create,
cfg_addr, BUID_HI(buid), BUID_LO(buid),
page_shift, window_shift);
} while (rtas_busy_delay(ret));
dev_info(&dev->dev,
"ibm,create-pe-dma-window(%x) %x %x %x %x %x returned %d "
@ -910,7 +913,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
int page_shift;
u64 dma_addr, max_addr;
struct device_node *dn;
const u32 *uninitialized_var(ddw_avail);
u32 ddw_avail[3];
struct direct_window *window;
struct property *win64;
struct dynamic_dma_window_prop *ddwprop;
@ -942,8 +945,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
* for the given node in that order.
* the property is actually in the parent, not the PE
*/
ddw_avail = of_get_property(pdn, "ibm,ddw-applicable", &len);
if (!ddw_avail || len < 3 * sizeof(u32))
ret = of_property_read_u32_array(pdn, "ibm,ddw-applicable",
&ddw_avail[0], 3);
if (ret)
goto out_failed;
/*
@ -966,11 +970,11 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
dev_dbg(&dev->dev, "no free dynamic windows");
goto out_failed;
}
if (be32_to_cpu(query.page_size) & 4) {
if (query.page_size & 4) {
page_shift = 24; /* 16MB */
} else if (be32_to_cpu(query.page_size) & 2) {
} else if (query.page_size & 2) {
page_shift = 16; /* 64kB */
} else if (be32_to_cpu(query.page_size) & 1) {
} else if (query.page_size & 1) {
page_shift = 12; /* 4kB */
} else {
dev_dbg(&dev->dev, "no supported direct page size in mask %x",
@ -980,7 +984,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
/* verify the window * number of ptes will map the partition */
/* check largest block * page size > max memory hotplug addr */
max_addr = memory_hotplug_max();
if (be32_to_cpu(query.largest_available_block) < (max_addr >> page_shift)) {
if (query.largest_available_block < (max_addr >> page_shift)) {
dev_dbg(&dev->dev, "can't map partiton max 0x%llx with %u "
"%llu-sized pages\n", max_addr, query.largest_available_block,
1ULL << page_shift);
@ -1006,8 +1010,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (ret != 0)
goto out_free_prop;
ddwprop->liobn = create.liobn;
ddwprop->dma_base = cpu_to_be64(of_read_number(&create.addr_hi, 2));
ddwprop->liobn = cpu_to_be32(create.liobn);
ddwprop->dma_base = cpu_to_be64(((u64)create.addr_hi << 32) |
create.addr_lo);
ddwprop->tce_shift = cpu_to_be32(page_shift);
ddwprop->window_shift = cpu_to_be32(len);
@ -1039,7 +1044,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
list_add(&window->list, &direct_window_list);
spin_unlock(&direct_window_list_lock);
dma_addr = of_read_number(&create.addr_hi, 2);
dma_addr = be64_to_cpu(ddwprop->dma_base);
goto out_unlock;
out_free_window:

View File

@ -2,6 +2,7 @@
generic-y += clkdev.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -85,6 +85,7 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
return 0;
if (vcpu->arch.sie_block->gcr[0] & 0x2000ul)
return 1;
return 0;
case KVM_S390_INT_EMERGENCY:
if (psw_extint_disabled(vcpu))
return 0;

View File

@ -6,6 +6,7 @@ generic-y += barrier.h
generic-y += clkdev.h
generic-y += cputime.h
generic-y += hash.h
generic-y += irq_work.h
generic-y += mcs_spinlock.h
generic-y += preempt.h
generic-y += scatterlist.h

View File

@ -12,6 +12,7 @@ generic-y += hash.h
generic-y += ioctl.h
generic-y += ipcbuf.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kvm_para.h
generic-y += local.h
generic-y += local64.h

View File

@ -67,6 +67,7 @@ config SPARC64
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_CONTEXT_TRACKING
select HAVE_DEBUG_KMEMLEAK
select SPARSE_IRQ
select RTC_DRV_CMOS
select RTC_DRV_BQ4802
select RTC_DRV_SUN4V

View File

@ -8,6 +8,7 @@ generic-y += emergency-restart.h
generic-y += exec.h
generic-y += hash.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += linkage.h
generic-y += local.h
generic-y += local64.h

View File

@ -2947,6 +2947,16 @@ unsigned long sun4v_vt_set_perfreg(unsigned long reg_num,
unsigned long reg_val);
#endif
#define HV_FAST_T5_GET_PERFREG 0x1a8
#define HV_FAST_T5_SET_PERFREG 0x1a9
#ifndef __ASSEMBLY__
unsigned long sun4v_t5_get_perfreg(unsigned long reg_num,
unsigned long *reg_val);
unsigned long sun4v_t5_set_perfreg(unsigned long reg_num,
unsigned long reg_val);
#endif
/* Function numbers for HV_CORE_TRAP. */
#define HV_CORE_SET_VER 0x00
#define HV_CORE_PUTCHAR 0x01
@ -2978,6 +2988,7 @@ unsigned long sun4v_vt_set_perfreg(unsigned long reg_num,
#define HV_GRP_VF_CPU 0x0205
#define HV_GRP_KT_CPU 0x0209
#define HV_GRP_VT_CPU 0x020c
#define HV_GRP_T5_CPU 0x0211
#define HV_GRP_DIAG 0x0300
#ifndef __ASSEMBLY__

View File

@ -37,7 +37,7 @@
*
* ino_bucket->irq allocation is made during {sun4v_,}build_irq().
*/
#define NR_IRQS 255
#define NR_IRQS (2048)
void irq_install_pre_handler(int irq,
void (*func)(unsigned int, void *, void *),
@ -57,11 +57,8 @@ unsigned int sun4u_build_msi(u32 portid, unsigned int *irq_p,
unsigned long iclr_base);
void sun4u_destroy_msi(unsigned int irq);
unsigned char irq_alloc(unsigned int dev_handle,
unsigned int dev_ino);
#ifdef CONFIG_PCI_MSI
unsigned int irq_alloc(unsigned int dev_handle, unsigned int dev_ino);
void irq_free(unsigned int irq);
#endif
void __init init_IRQ(void);
void fixup_irqs(void);

View File

@ -53,13 +53,14 @@ struct ldc_channel;
/* Allocate state for a channel. */
struct ldc_channel *ldc_alloc(unsigned long id,
const struct ldc_channel_config *cfgp,
void *event_arg);
void *event_arg,
const char *name);
/* Shut down and free state for a channel. */
void ldc_free(struct ldc_channel *lp);
/* Register TX and RX queues of the link with the hypervisor. */
int ldc_bind(struct ldc_channel *lp, const char *name);
int ldc_bind(struct ldc_channel *lp);
/* For non-RAW protocols we need to complete a handshake before
* communication can proceed. ldc_connect() does that, if the

View File

@ -62,7 +62,8 @@ struct linux_mem_p1275 {
/* You must call prom_init() before using any of the library services,
* preferably as early as possible. Pass it the romvec pointer.
*/
void prom_init(void *cif_handler, void *cif_stack);
void prom_init(void *cif_handler);
void prom_init_report(void);
/* Boot argument acquisition, returns the boot command line string. */
char *prom_getbootargs(void);

View File

@ -57,18 +57,21 @@ void copy_user_page(void *to, void *from, unsigned long vaddr, struct page *topa
typedef struct { unsigned long pte; } pte_t;
typedef struct { unsigned long iopte; } iopte_t;
typedef struct { unsigned long pmd; } pmd_t;
typedef struct { unsigned long pud; } pud_t;
typedef struct { unsigned long pgd; } pgd_t;
typedef struct { unsigned long pgprot; } pgprot_t;
#define pte_val(x) ((x).pte)
#define iopte_val(x) ((x).iopte)
#define pmd_val(x) ((x).pmd)
#define pud_val(x) ((x).pud)
#define pgd_val(x) ((x).pgd)
#define pgprot_val(x) ((x).pgprot)
#define __pte(x) ((pte_t) { (x) } )
#define __iopte(x) ((iopte_t) { (x) } )
#define __pmd(x) ((pmd_t) { (x) } )
#define __pud(x) ((pud_t) { (x) } )
#define __pgd(x) ((pgd_t) { (x) } )
#define __pgprot(x) ((pgprot_t) { (x) } )
@ -77,18 +80,21 @@ typedef struct { unsigned long pgprot; } pgprot_t;
typedef unsigned long pte_t;
typedef unsigned long iopte_t;
typedef unsigned long pmd_t;
typedef unsigned long pud_t;
typedef unsigned long pgd_t;
typedef unsigned long pgprot_t;
#define pte_val(x) (x)
#define iopte_val(x) (x)
#define pmd_val(x) (x)
#define pud_val(x) (x)
#define pgd_val(x) (x)
#define pgprot_val(x) (x)
#define __pte(x) (x)
#define __iopte(x) (x)
#define __pmd(x) (x)
#define __pud(x) (x)
#define __pgd(x) (x)
#define __pgprot(x) (x)
@ -96,21 +102,14 @@ typedef unsigned long pgprot_t;
typedef pte_t *pgtable_t;
/* These two values define the virtual address space range in which we
* must forbid 64-bit user processes from making mappings. It used to
* represent precisely the virtual address space hole present in most
* early sparc64 chips including UltraSPARC-I. But now it also is
* further constrained by the limits of our page tables, which is
* 43-bits of virtual address.
*/
#define SPARC64_VA_HOLE_TOP _AC(0xfffffc0000000000,UL)
#define SPARC64_VA_HOLE_BOTTOM _AC(0x0000040000000000,UL)
extern unsigned long sparc64_va_hole_top;
extern unsigned long sparc64_va_hole_bottom;
/* The next two defines specify the actual exclusion region we
* enforce, wherein we use a 4GB red zone on each side of the VA hole.
*/
#define VA_EXCLUDE_START (SPARC64_VA_HOLE_BOTTOM - (1UL << 32UL))
#define VA_EXCLUDE_END (SPARC64_VA_HOLE_TOP + (1UL << 32UL))
#define VA_EXCLUDE_START (sparc64_va_hole_bottom - (1UL << 32UL))
#define VA_EXCLUDE_END (sparc64_va_hole_top + (1UL << 32UL))
#define TASK_UNMAPPED_BASE (test_thread_flag(TIF_32BIT) ? \
_AC(0x0000000070000000,UL) : \
@ -118,20 +117,16 @@ typedef pte_t *pgtable_t;
#include <asm-generic/memory_model.h>
#define PAGE_OFFSET_BY_BITS(X) (-(_AC(1,UL) << (X)))
extern unsigned long PAGE_OFFSET;
#endif /* !(__ASSEMBLY__) */
/* The maximum number of physical memory address bits we support, this
* is used to size various tables used to manage kernel TLB misses and
* also the sparsemem code.
/* The maximum number of physical memory address bits we support. The
* largest value we can support is whatever "KPGD_SHIFT + KPTE_BITS"
* evaluates to.
*/
#define MAX_PHYS_ADDRESS_BITS 47
#define MAX_PHYS_ADDRESS_BITS 53
/* These two shift counts are used when indexing sparc64_valid_addr_bitmap
* and kpte_linear_bitmap.
*/
#define ILOG2_4MB 22
#define ILOG2_256MB 28

View File

@ -15,6 +15,13 @@
extern struct kmem_cache *pgtable_cache;
static inline void __pgd_populate(pgd_t *pgd, pud_t *pud)
{
pgd_set(pgd, pud);
}
#define pgd_populate(MM, PGD, PUD) __pgd_populate(PGD, PUD)
static inline pgd_t *pgd_alloc(struct mm_struct *mm)
{
return kmem_cache_alloc(pgtable_cache, GFP_KERNEL);
@ -25,7 +32,23 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
kmem_cache_free(pgtable_cache, pgd);
}
#define pud_populate(MM, PUD, PMD) pud_set(PUD, PMD)
static inline void __pud_populate(pud_t *pud, pmd_t *pmd)
{
pud_set(pud, pmd);
}
#define pud_populate(MM, PUD, PMD) __pud_populate(PUD, PMD)
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
{
return kmem_cache_alloc(pgtable_cache,
GFP_KERNEL|__GFP_REPEAT);
}
static inline void pud_free(struct mm_struct *mm, pud_t *pud)
{
kmem_cache_free(pgtable_cache, pud);
}
static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
{
@ -91,4 +114,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, pte_t *pte,
#define __pmd_free_tlb(tlb, pmd, addr) \
pgtable_free_tlb(tlb, pmd, false)
#define __pud_free_tlb(tlb, pud, addr) \
pgtable_free_tlb(tlb, pud, false)
#endif /* _SPARC64_PGALLOC_H */

View File

@ -20,8 +20,6 @@
#include <asm/page.h>
#include <asm/processor.h>
#include <asm-generic/pgtable-nopud.h>
/* The kernel image occupies 0x4000000 to 0x6000000 (4MB --> 96MB).
* The page copy blockops can use 0x6000000 to 0x8000000.
* The 8K TSB is mapped in the 0x8000000 to 0x8400000 range.
@ -42,10 +40,7 @@
#define LOW_OBP_ADDRESS _AC(0x00000000f0000000,UL)
#define HI_OBP_ADDRESS _AC(0x0000000100000000,UL)
#define VMALLOC_START _AC(0x0000000100000000,UL)
#define VMALLOC_END _AC(0x0000010000000000,UL)
#define VMEMMAP_BASE _AC(0x0000010000000000,UL)
#define vmemmap ((struct page *)VMEMMAP_BASE)
#define VMEMMAP_BASE VMALLOC_END
/* PMD_SHIFT determines the size of the area a second-level page
* table can map
@ -55,13 +50,25 @@
#define PMD_MASK (~(PMD_SIZE-1))
#define PMD_BITS (PAGE_SHIFT - 3)
/* PGDIR_SHIFT determines what a third-level page table entry can map */
#define PGDIR_SHIFT (PAGE_SHIFT + (PAGE_SHIFT-3) + PMD_BITS)
/* PUD_SHIFT determines the size of the area a third-level page
* table can map
*/
#define PUD_SHIFT (PMD_SHIFT + PMD_BITS)
#define PUD_SIZE (_AC(1,UL) << PUD_SHIFT)
#define PUD_MASK (~(PUD_SIZE-1))
#define PUD_BITS (PAGE_SHIFT - 3)
/* PGDIR_SHIFT determines what a fourth-level page table entry can map */
#define PGDIR_SHIFT (PUD_SHIFT + PUD_BITS)
#define PGDIR_SIZE (_AC(1,UL) << PGDIR_SHIFT)
#define PGDIR_MASK (~(PGDIR_SIZE-1))
#define PGDIR_BITS (PAGE_SHIFT - 3)
#if (PGDIR_SHIFT + PGDIR_BITS) != 43
#if (MAX_PHYS_ADDRESS_BITS > PGDIR_SHIFT + PGDIR_BITS)
#error MAX_PHYS_ADDRESS_BITS exceeds what kernel page tables can support
#endif
#if (PGDIR_SHIFT + PGDIR_BITS) != 53
#error Page table parameters do not cover virtual address space properly.
#endif
@ -71,28 +78,18 @@
#ifndef __ASSEMBLY__
extern unsigned long VMALLOC_END;
#define vmemmap ((struct page *)VMEMMAP_BASE)
#include <linux/sched.h>
extern unsigned long sparc64_valid_addr_bitmap[];
/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
static inline bool __kern_addr_valid(unsigned long paddr)
{
if ((paddr >> MAX_PHYS_ADDRESS_BITS) != 0UL)
return false;
return test_bit(paddr >> ILOG2_4MB, sparc64_valid_addr_bitmap);
}
static inline bool kern_addr_valid(unsigned long addr)
{
unsigned long paddr = __pa(addr);
return __kern_addr_valid(paddr);
}
bool kern_addr_valid(unsigned long addr);
/* Entries per page directory level. */
#define PTRS_PER_PTE (1UL << (PAGE_SHIFT-3))
#define PTRS_PER_PMD (1UL << PMD_BITS)
#define PTRS_PER_PUD (1UL << PUD_BITS)
#define PTRS_PER_PGD (1UL << PGDIR_BITS)
/* Kernel has a separate 44bit address space. */
@ -101,6 +98,9 @@ static inline bool kern_addr_valid(unsigned long addr)
#define pmd_ERROR(e) \
pr_err("%s:%d: bad pmd %p(%016lx) seen at (%pS)\n", \
__FILE__, __LINE__, &(e), pmd_val(e), __builtin_return_address(0))
#define pud_ERROR(e) \
pr_err("%s:%d: bad pud %p(%016lx) seen at (%pS)\n", \
__FILE__, __LINE__, &(e), pud_val(e), __builtin_return_address(0))
#define pgd_ERROR(e) \
pr_err("%s:%d: bad pgd %p(%016lx) seen at (%pS)\n", \
__FILE__, __LINE__, &(e), pgd_val(e), __builtin_return_address(0))
@ -112,6 +112,7 @@ static inline bool kern_addr_valid(unsigned long addr)
#define _PAGE_R _AC(0x8000000000000000,UL) /* Keep ref bit uptodate*/
#define _PAGE_SPECIAL _AC(0x0200000000000000,UL) /* Special page */
#define _PAGE_PMD_HUGE _AC(0x0100000000000000,UL) /* Huge page */
#define _PAGE_PUD_HUGE _PAGE_PMD_HUGE
/* Advertise support for _PAGE_SPECIAL */
#define __HAVE_ARCH_PTE_SPECIAL
@ -658,6 +659,13 @@ static inline unsigned long pmd_large(pmd_t pmd)
return pte_val(pte) & _PAGE_PMD_HUGE;
}
static inline unsigned long pmd_pfn(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));
return pte_pfn(pte);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline unsigned long pmd_young(pmd_t pmd)
{
@ -673,13 +681,6 @@ static inline unsigned long pmd_write(pmd_t pmd)
return pte_write(pte);
}
static inline unsigned long pmd_pfn(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));
return pte_pfn(pte);
}
static inline unsigned long pmd_trans_huge(pmd_t pmd)
{
pte_t pte = __pte(pmd_val(pmd));
@ -771,13 +772,15 @@ static inline int pmd_present(pmd_t pmd)
* the top bits outside of the range of any physical address size we
* support are clear as well. We also validate the physical itself.
*/
#define pmd_bad(pmd) ((pmd_val(pmd) & ~PAGE_MASK) || \
!__kern_addr_valid(pmd_val(pmd)))
#define pmd_bad(pmd) (pmd_val(pmd) & ~PAGE_MASK)
#define pud_none(pud) (!pud_val(pud))
#define pud_bad(pud) ((pud_val(pud) & ~PAGE_MASK) || \
!__kern_addr_valid(pud_val(pud)))
#define pud_bad(pud) (pud_val(pud) & ~PAGE_MASK)
#define pgd_none(pgd) (!pgd_val(pgd))
#define pgd_bad(pgd) (pgd_val(pgd) & ~PAGE_MASK)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
void set_pmd_at(struct mm_struct *mm, unsigned long addr,
@ -815,10 +818,31 @@ static inline unsigned long __pmd_page(pmd_t pmd)
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0UL)
#define pud_present(pud) (pud_val(pud) != 0U)
#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL)
#define pgd_page_vaddr(pgd) \
((unsigned long) __va(pgd_val(pgd)))
#define pgd_present(pgd) (pgd_val(pgd) != 0U)
#define pgd_clear(pgdp) (pgd_val(*(pgd)) = 0UL)
static inline unsigned long pud_large(pud_t pud)
{
pte_t pte = __pte(pud_val(pud));
return pte_val(pte) & _PAGE_PMD_HUGE;
}
static inline unsigned long pud_pfn(pud_t pud)
{
pte_t pte = __pte(pud_val(pud));
return pte_pfn(pte);
}
/* Same in both SUN4V and SUN4U. */
#define pte_none(pte) (!pte_val(pte))
#define pgd_set(pgdp, pudp) \
(pgd_val(*(pgdp)) = (__pa((unsigned long) (pudp))))
/* to find an entry in a page-table-directory. */
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
@ -826,6 +850,11 @@ static inline unsigned long __pmd_page(pmd_t pmd)
/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
/* Find an entry in the third-level page table.. */
#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
#define pud_offset(pgdp, address) \
((pud_t *) pgd_page_vaddr(*(pgdp)) + pud_index(address))
/* Find an entry in the second-level page table.. */
#define pmd_offset(pudp, address) \
((pmd_t *) pud_page_vaddr(*(pudp)) + \
@ -898,7 +927,6 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
#endif
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
extern pmd_t swapper_low_pmd_dir[PTRS_PER_PMD];
void paging_init(void);
unsigned long find_ecache_flush_span(unsigned long size);

View File

@ -48,6 +48,8 @@ unsigned long safe_compute_effective_address(struct pt_regs *, unsigned int);
#endif
#ifdef CONFIG_SPARC64
void __init start_early_boot(void);
/* unaligned_64.c */
int handle_ldf_stq(u32 insn, struct pt_regs *regs);
void handle_ld_nf(u32 insn, struct pt_regs *regs);

View File

@ -45,6 +45,8 @@
#define SUN4V_CHIP_NIAGARA3 0x03
#define SUN4V_CHIP_NIAGARA4 0x04
#define SUN4V_CHIP_NIAGARA5 0x05
#define SUN4V_CHIP_SPARC_M6 0x06
#define SUN4V_CHIP_SPARC_M7 0x07
#define SUN4V_CHIP_SPARC64X 0x8a
#define SUN4V_CHIP_UNKNOWN 0xff

View File

@ -63,7 +63,8 @@ struct thread_info {
struct pt_regs *kern_una_regs;
unsigned int kern_una_insn;
unsigned long fpregs[0] __attribute__ ((aligned(64)));
unsigned long fpregs[(7 * 256) / sizeof(unsigned long)]
__attribute__ ((aligned(64)));
};
#endif /* !(__ASSEMBLY__) */
@ -102,6 +103,7 @@ struct thread_info {
#define FAULT_CODE_ITLB 0x04 /* Miss happened in I-TLB */
#define FAULT_CODE_WINFIXUP 0x08 /* Miss happened during spill/fill */
#define FAULT_CODE_BLKCOMMIT 0x10 /* Use blk-commit ASI in copy_page */
#define FAULT_CODE_BAD_RA 0x20 /* Bad RA for sun4v */
#if PAGE_SHIFT == 13
#define THREAD_SIZE (2*PAGE_SIZE)

View File

@ -133,9 +133,24 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
sub TSB, 0x8, TSB; \
TSB_STORE(TSB, TAG);
/* Do a kernel page table walk. Leaves physical PTE pointer in
* REG1. Jumps to FAIL_LABEL on early page table walk termination.
* VADDR will not be clobbered, but REG2 will.
/* Do a kernel page table walk. Leaves valid PTE value in
* REG1. Jumps to FAIL_LABEL on early page table walk
* termination. VADDR will not be clobbered, but REG2 will.
*
* There are two masks we must apply to propagate bits from
* the virtual address into the PTE physical address field
* when dealing with huge pages. This is because the page
* table boundaries do not match the huge page size(s) the
* hardware supports.
*
* In these cases we propagate the bits that are below the
* page table level where we saw the huge page mapping, but
* are still within the relevant physical bits for the huge
* page size in question. So for PMD mappings (which fall on
* bit 23, for 8MB per PMD) we must propagate bit 22 for a
* 4MB huge page. For huge PUDs (which fall on bit 33, for
* 8GB per PUD), we have to accomodate 256MB and 2GB huge
* pages. So for those we propagate bits 32 to 28.
*/
#define KERN_PGTABLE_WALK(VADDR, REG1, REG2, FAIL_LABEL) \
sethi %hi(swapper_pg_dir), REG1; \
@ -145,15 +160,40 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
andn REG2, 0x7, REG2; \
ldx [REG1 + REG2], REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - PMD_SHIFT, REG2; \
sethi %uhi(_PAGE_PUD_HUGE), REG2; \
brz,pn REG1, FAIL_LABEL; \
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
sethi %hi(0xf8000000), REG2; \
bne,pt %xcc, 697f; \
sllx REG2, 1, REG2; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
add REG1, REG2, REG1;
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
sethi %uhi(_PAGE_PMD_HUGE), REG2; \
brz,pn REG1, FAIL_LABEL; \
sllx REG2, 32, REG2; \
andcc REG1, REG2, %g0; \
be,pn %xcc, 698f; \
sethi %hi(0x400000), REG2; \
697: brgez,pn REG1, FAIL_LABEL; \
andn REG1, REG2, REG1; \
and VADDR, REG2, REG2; \
ba,pt %xcc, 699f; \
or REG1, REG2, REG1; \
698: sllx VADDR, 64 - PMD_SHIFT, REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brgez,pn REG1, FAIL_LABEL; \
nop; \
699:
/* PMD has been loaded into REG1, interpret the value, seeing
* if it is a HUGE PMD or a normal one. If it is not valid
@ -197,6 +237,11 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [PHYS_PGD + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PUD_SHIFT + PUD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
andn REG2, 0x7, REG2; \
ldxa [REG1 + REG2] ASI_PHYS_USE_EC, REG1; \
brz,pn REG1, FAIL_LABEL; \
sllx VADDR, 64 - (PMD_SHIFT + PMD_BITS), REG2; \
srlx REG2, 64 - PAGE_SHIFT, REG2; \
@ -246,8 +291,6 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
(KERNEL_TSB_SIZE_BYTES / 16)
#define KERNEL_TSB4M_NENTRIES 4096
#define KTSB_PHYS_SHIFT 15
/* Do a kernel TSB lookup at tl>0 on VADDR+TAG, branch to OK_LABEL
* on TSB hit. REG1, REG2, REG3, and REG4 are used as temporaries
* and the found TTE will be left in REG1. REG3 and REG4 must
@ -256,17 +299,15 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
* VADDR and TAG will be preserved and not clobbered by this macro.
*/
#define KERN_TSB_LOOKUP_TL1(VADDR, TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
661: sethi %hi(swapper_tsb), REG1; \
or REG1, %lo(swapper_tsb), REG1; \
661: sethi %uhi(swapper_tsb), REG1; \
sethi %hi(swapper_tsb), REG2; \
or REG1, %ulo(swapper_tsb), REG1; \
or REG2, %lo(swapper_tsb), REG2; \
.section .swapper_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
661: nop; \
.section .tsb_ldquad_phys_patch, "ax"; \
.word 661b; \
sllx REG1, KTSB_PHYS_SHIFT, REG1; \
sllx REG1, KTSB_PHYS_SHIFT, REG1; \
.previous; \
sllx REG1, 32, REG1; \
or REG1, REG2, REG1; \
srlx VADDR, PAGE_SHIFT, REG2; \
and REG2, (KERNEL_TSB_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
@ -281,17 +322,15 @@ extern struct tsb_phys_patch_entry __tsb_phys_patch, __tsb_phys_patch_end;
* we can make use of that for the index computation.
*/
#define KERN_TSB4M_LOOKUP_TL1(TAG, REG1, REG2, REG3, REG4, OK_LABEL) \
661: sethi %hi(swapper_4m_tsb), REG1; \
or REG1, %lo(swapper_4m_tsb), REG1; \
661: sethi %uhi(swapper_4m_tsb), REG1; \
sethi %hi(swapper_4m_tsb), REG2; \
or REG1, %ulo(swapper_4m_tsb), REG1; \
or REG2, %lo(swapper_4m_tsb), REG2; \
.section .swapper_4m_tsb_phys_patch, "ax"; \
.word 661b; \
.previous; \
661: nop; \
.section .tsb_ldquad_phys_patch, "ax"; \
.word 661b; \
sllx REG1, KTSB_PHYS_SHIFT, REG1; \
sllx REG1, KTSB_PHYS_SHIFT, REG1; \
.previous; \
sllx REG1, 32, REG1; \
or REG1, REG2, REG1; \
and TAG, (KERNEL_TSB4M_NENTRIES - 1), REG2; \
sllx REG2, 4, REG2; \
add REG1, REG2, REG2; \

View File

@ -39,6 +39,14 @@
297: wr %o5, FPRS_FEF, %fprs; \
298:
#define VISEntryHalfFast(fail_label) \
rd %fprs, %o5; \
andcc %o5, FPRS_FEF, %g0; \
be,pt %icc, 297f; \
nop; \
ba,a,pt %xcc, fail_label; \
297: wr %o5, FPRS_FEF, %fprs;
#define VISExitHalf \
wr %o5, 0, %fprs;

View File

@ -494,6 +494,18 @@ static void __init sun4v_cpu_probe(void)
sparc_pmu_type = "niagara5";
break;
case SUN4V_CHIP_SPARC_M6:
sparc_cpu_type = "SPARC-M6";
sparc_fpu_type = "SPARC-M6 integrated FPU";
sparc_pmu_type = "sparc-m6";
break;
case SUN4V_CHIP_SPARC_M7:
sparc_cpu_type = "SPARC-M7";
sparc_fpu_type = "SPARC-M7 integrated FPU";
sparc_pmu_type = "sparc-m7";
break;
case SUN4V_CHIP_SPARC64X:
sparc_cpu_type = "SPARC64-X";
sparc_fpu_type = "SPARC64-X integrated FPU";

View File

@ -326,6 +326,8 @@ static int iterate_cpu(struct cpuinfo_tree *t, unsigned int root_index)
case SUN4V_CHIP_NIAGARA3:
case SUN4V_CHIP_NIAGARA4:
case SUN4V_CHIP_NIAGARA5:
case SUN4V_CHIP_SPARC_M6:
case SUN4V_CHIP_SPARC_M7:
case SUN4V_CHIP_SPARC64X:
rover_inc_table = niagara_iterate_method;
break;

View File

@ -1200,14 +1200,14 @@ static int ds_probe(struct vio_dev *vdev, const struct vio_device_id *id)
ds_cfg.tx_irq = vdev->tx_irq;
ds_cfg.rx_irq = vdev->rx_irq;
lp = ldc_alloc(vdev->channel_id, &ds_cfg, dp);
lp = ldc_alloc(vdev->channel_id, &ds_cfg, dp, "DS");
if (IS_ERR(lp)) {
err = PTR_ERR(lp);
goto out_free_ds_states;
}
dp->lp = lp;
err = ldc_bind(lp, "DS");
err = ldc_bind(lp);
if (err)
goto out_free_ldc;

View File

@ -24,11 +24,11 @@
mov TLB_TAG_ACCESS, %g4 ! For reload of vaddr
/* PROT ** ICACHE line 2: More real fault processing */
ldxa [%g4] ASI_DMMU, %g5 ! Put tagaccess in %g5
bgu,pn %xcc, winfix_trampoline ! Yes, perform winfixup
ldxa [%g4] ASI_DMMU, %g5 ! Put tagaccess in %g5
ba,pt %xcc, sparc64_realfault_common ! Nope, normal fault
mov FAULT_CODE_DTLB | FAULT_CODE_WRITE, %g4
nop
ba,pt %xcc, sparc64_realfault_common ! Nope, normal fault
nop
nop
nop
nop

View File

@ -65,13 +65,10 @@ struct pause_patch_entry {
extern struct pause_patch_entry __pause_3insn_patch,
__pause_3insn_patch_end;
void __init per_cpu_patch(void);
void sun4v_patch_1insn_range(struct sun4v_1insn_patch_entry *,
struct sun4v_1insn_patch_entry *);
void sun4v_patch_2insn_range(struct sun4v_2insn_patch_entry *,
struct sun4v_2insn_patch_entry *);
void __init sun4v_patch(void);
void __init boot_cpu_id_too_large(int cpu);
extern unsigned int dcache_parity_tl1_occurred;
extern unsigned int icache_parity_tl1_occurred;

View File

@ -427,6 +427,12 @@ sun4v_chip_type:
cmp %g2, '5'
be,pt %xcc, 5f
mov SUN4V_CHIP_NIAGARA5, %g4
cmp %g2, '6'
be,pt %xcc, 5f
mov SUN4V_CHIP_SPARC_M6, %g4
cmp %g2, '7'
be,pt %xcc, 5f
mov SUN4V_CHIP_SPARC_M7, %g4
ba,pt %xcc, 49f
nop
@ -583,6 +589,12 @@ niagara_tlb_fixup:
be,pt %xcc, niagara4_patch
nop
cmp %g1, SUN4V_CHIP_NIAGARA5
be,pt %xcc, niagara4_patch
nop
cmp %g1, SUN4V_CHIP_SPARC_M6
be,pt %xcc, niagara4_patch
nop
cmp %g1, SUN4V_CHIP_SPARC_M7
be,pt %xcc, niagara4_patch
nop
@ -660,14 +672,12 @@ tlb_fixup_done:
sethi %hi(init_thread_union), %g6
or %g6, %lo(init_thread_union), %g6
ldx [%g6 + TI_TASK], %g4
mov %sp, %l6
wr %g0, ASI_P, %asi
mov 1, %g1
sllx %g1, THREAD_SHIFT, %g1
sub %g1, (STACKFRAME_SZ + STACK_BIAS), %g1
add %g6, %g1, %sp
mov 0, %fp
/* Set per-cpu pointer initially to zero, this makes
* the boot-cpu use the in-kernel-image per-cpu areas
@ -694,44 +704,14 @@ tlb_fixup_done:
nop
#endif
mov %l6, %o1 ! OpenPROM stack
call prom_init
mov %l7, %o0 ! OpenPROM cif handler
/* Initialize current_thread_info()->cpu as early as possible.
* In order to do that accurately we have to patch up the get_cpuid()
* assembler sequences. And that, in turn, requires that we know
* if we are on a Starfire box or not. While we're here, patch up
* the sun4v sequences as well.
/* To create a one-register-window buffer between the kernel's
* initial stack and the last stack frame we use from the firmware,
* do the rest of the boot from a C helper function.
*/
call check_if_starfire
nop
call per_cpu_patch
nop
call sun4v_patch
nop
#ifdef CONFIG_SMP
call hard_smp_processor_id
nop
cmp %o0, NR_CPUS
blu,pt %xcc, 1f
nop
call boot_cpu_id_too_large
nop
/* Not reached... */
1:
#else
mov 0, %o0
#endif
sth %o0, [%g6 + TI_CPU]
call prom_init_report
nop
/* Off we go.... */
call start_kernel
call start_early_boot
nop
/* Not reached... */

View File

@ -46,6 +46,7 @@ static struct api_info api_table[] = {
{ .group = HV_GRP_VF_CPU, },
{ .group = HV_GRP_KT_CPU, },
{ .group = HV_GRP_VT_CPU, },
{ .group = HV_GRP_T5_CPU, },
{ .group = HV_GRP_DIAG, .flags = FLAG_PRE_API },
};

View File

@ -821,3 +821,19 @@ ENTRY(sun4v_vt_set_perfreg)
retl
nop
ENDPROC(sun4v_vt_set_perfreg)
ENTRY(sun4v_t5_get_perfreg)
mov %o1, %o4
mov HV_FAST_T5_GET_PERFREG, %o5
ta HV_FAST_TRAP
stx %o1, [%o4]
retl
nop
ENDPROC(sun4v_t5_get_perfreg)
ENTRY(sun4v_t5_set_perfreg)
mov HV_FAST_T5_SET_PERFREG, %o5
ta HV_FAST_TRAP
retl
nop
ENDPROC(sun4v_t5_set_perfreg)

View File

@ -109,7 +109,6 @@ hv_cpu_startup:
sllx %g5, THREAD_SHIFT, %g5
sub %g5, (STACKFRAME_SZ + STACK_BIAS), %g5
add %g6, %g5, %sp
mov 0, %fp
call init_irqwork_curcpu
nop

View File

@ -278,7 +278,8 @@ static void *sbus_alloc_coherent(struct device *dev, size_t len,
}
order = get_order(len_total);
if ((va = __get_free_pages(GFP_KERNEL|__GFP_COMP, order)) == 0)
va = __get_free_pages(gfp, order);
if (va == 0)
goto err_nopages;
if ((res = kzalloc(sizeof(struct resource), GFP_KERNEL)) == NULL)
@ -443,7 +444,7 @@ static void *pci32_alloc_coherent(struct device *dev, size_t len,
}
order = get_order(len_total);
va = (void *) __get_free_pages(GFP_KERNEL, order);
va = (void *) __get_free_pages(gfp, order);
if (va == NULL) {
printk("pci_alloc_consistent: no %ld pages\n", len_total>>PAGE_SHIFT);
goto err_nopages;

View File

@ -47,8 +47,6 @@
#include "cpumap.h"
#include "kstack.h"
#define NUM_IVECS (IMAP_INR + 1)
struct ino_bucket *ivector_table;
unsigned long ivector_table_pa;
@ -107,55 +105,196 @@ static void bucket_set_irq(unsigned long bucket_pa, unsigned int irq)
#define irq_work_pa(__cpu) &(trap_block[(__cpu)].irq_worklist_pa)
static struct {
unsigned int dev_handle;
unsigned int dev_ino;
unsigned int in_use;
} irq_table[NR_IRQS];
static DEFINE_SPINLOCK(irq_alloc_lock);
unsigned char irq_alloc(unsigned int dev_handle, unsigned int dev_ino)
static unsigned long hvirq_major __initdata;
static int __init early_hvirq_major(char *p)
{
unsigned long flags;
unsigned char ent;
int rc = kstrtoul(p, 10, &hvirq_major);
BUILD_BUG_ON(NR_IRQS >= 256);
return rc;
}
early_param("hvirq", early_hvirq_major);
spin_lock_irqsave(&irq_alloc_lock, flags);
static int hv_irq_version;
for (ent = 1; ent < NR_IRQS; ent++) {
if (!irq_table[ent].in_use)
break;
}
if (ent >= NR_IRQS) {
printk(KERN_ERR "IRQ: Out of virtual IRQs.\n");
ent = 0;
} else {
irq_table[ent].dev_handle = dev_handle;
irq_table[ent].dev_ino = dev_ino;
irq_table[ent].in_use = 1;
}
spin_unlock_irqrestore(&irq_alloc_lock, flags);
return ent;
/* Major version 2.0 of HV_GRP_INTR added support for the VIRQ cookie
* based interfaces, but:
*
* 1) Several OSs, Solaris and Linux included, use them even when only
* negotiating version 1.0 (or failing to negotiate at all). So the
* hypervisor has a workaround that provides the VIRQ interfaces even
* when only verion 1.0 of the API is in use.
*
* 2) Second, and more importantly, with major version 2.0 these VIRQ
* interfaces only were actually hooked up for LDC interrupts, even
* though the Hypervisor specification clearly stated:
*
* The new interrupt API functions will be available to a guest
* when it negotiates version 2.0 in the interrupt API group 0x2. When
* a guest negotiates version 2.0, all interrupt sources will only
* support using the cookie interface, and any attempt to use the
* version 1.0 interrupt APIs numbered 0xa0 to 0xa6 will result in the
* ENOTSUPPORTED error being returned.
*
* with an emphasis on "all interrupt sources".
*
* To correct this, major version 3.0 was created which does actually
* support VIRQs for all interrupt sources (not just LDC devices). So
* if we want to move completely over the cookie based VIRQs we must
* negotiate major version 3.0 or later of HV_GRP_INTR.
*/
static bool sun4v_cookie_only_virqs(void)
{
if (hv_irq_version >= 3)
return true;
return false;
}
#ifdef CONFIG_PCI_MSI
void irq_free(unsigned int irq)
static void __init irq_init_hv(void)
{
unsigned long flags;
unsigned long hv_error, major, minor = 0;
if (irq >= NR_IRQS)
if (tlb_type != hypervisor)
return;
spin_lock_irqsave(&irq_alloc_lock, flags);
if (hvirq_major)
major = hvirq_major;
else
major = 3;
irq_table[irq].in_use = 0;
hv_error = sun4v_hvapi_register(HV_GRP_INTR, major, &minor);
if (!hv_error)
hv_irq_version = major;
else
hv_irq_version = 1;
spin_unlock_irqrestore(&irq_alloc_lock, flags);
pr_info("SUN4V: Using IRQ API major %d, cookie only virqs %s\n",
hv_irq_version,
sun4v_cookie_only_virqs() ? "enabled" : "disabled");
}
/* This function is for the timer interrupt.*/
int __init arch_probe_nr_irqs(void)
{
return 1;
}
#define DEFAULT_NUM_IVECS (0xfffU)
static unsigned int nr_ivec = DEFAULT_NUM_IVECS;
#define NUM_IVECS (nr_ivec)
static unsigned int __init size_nr_ivec(void)
{
if (tlb_type == hypervisor) {
switch (sun4v_chip_type) {
/* Athena's devhandle|devino is large.*/
case SUN4V_CHIP_SPARC64X:
nr_ivec = 0xffff;
break;
}
}
return nr_ivec;
}
struct irq_handler_data {
union {
struct {
unsigned int dev_handle;
unsigned int dev_ino;
};
unsigned long sysino;
};
struct ino_bucket bucket;
unsigned long iclr;
unsigned long imap;
};
static inline unsigned int irq_data_to_handle(struct irq_data *data)
{
struct irq_handler_data *ihd = data->handler_data;
return ihd->dev_handle;
}
static inline unsigned int irq_data_to_ino(struct irq_data *data)
{
struct irq_handler_data *ihd = data->handler_data;
return ihd->dev_ino;
}
static inline unsigned long irq_data_to_sysino(struct irq_data *data)
{
struct irq_handler_data *ihd = data->handler_data;
return ihd->sysino;
}
void irq_free(unsigned int irq)
{
void *data = irq_get_handler_data(irq);
kfree(data);
irq_set_handler_data(irq, NULL);
irq_free_descs(irq, 1);
}
unsigned int irq_alloc(unsigned int dev_handle, unsigned int dev_ino)
{
int irq;
irq = __irq_alloc_descs(-1, 1, 1, numa_node_id(), NULL);
if (irq <= 0)
goto out;
return irq;
out:
return 0;
}
static unsigned int cookie_exists(u32 devhandle, unsigned int devino)
{
unsigned long hv_err, cookie;
struct ino_bucket *bucket;
unsigned int irq = 0U;
hv_err = sun4v_vintr_get_cookie(devhandle, devino, &cookie);
if (hv_err) {
pr_err("HV get cookie failed hv_err = %ld\n", hv_err);
goto out;
}
if (cookie & ((1UL << 63UL))) {
cookie = ~cookie;
bucket = (struct ino_bucket *) __va(cookie);
irq = bucket->__irq;
}
out:
return irq;
}
static unsigned int sysino_exists(u32 devhandle, unsigned int devino)
{
unsigned long sysino = sun4v_devino_to_sysino(devhandle, devino);
struct ino_bucket *bucket;
unsigned int irq;
bucket = &ivector_table[sysino];
irq = bucket_get_irq(__pa(bucket));
return irq;
}
void ack_bad_irq(unsigned int irq)
{
pr_crit("BAD IRQ ack %d\n", irq);
}
void irq_install_pre_handler(int irq,
void (*func)(unsigned int, void *, void *),
void *arg1, void *arg2)
{
pr_warn("IRQ pre handler NOT supported.\n");
}
#endif
/*
* /proc/interrupts printing:
@ -206,15 +345,6 @@ static unsigned int sun4u_compute_tid(unsigned long imap, unsigned long cpuid)
return tid;
}
struct irq_handler_data {
unsigned long iclr;
unsigned long imap;
void (*pre_handler)(unsigned int, void *, void *);
void *arg1;
void *arg2;
};
#ifdef CONFIG_SMP
static int irq_choose_cpu(unsigned int irq, const struct cpumask *affinity)
{
@ -316,8 +446,8 @@ static void sun4u_irq_eoi(struct irq_data *data)
static void sun4v_irq_enable(struct irq_data *data)
{
unsigned int ino = irq_table[data->irq].dev_ino;
unsigned long cpuid = irq_choose_cpu(data->irq, data->affinity);
unsigned int ino = irq_data_to_sysino(data);
int err;
err = sun4v_intr_settarget(ino, cpuid);
@ -337,8 +467,8 @@ static void sun4v_irq_enable(struct irq_data *data)
static int sun4v_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
{
unsigned int ino = irq_table[data->irq].dev_ino;
unsigned long cpuid = irq_choose_cpu(data->irq, mask);
unsigned int ino = irq_data_to_sysino(data);
int err;
err = sun4v_intr_settarget(ino, cpuid);
@ -351,7 +481,7 @@ static int sun4v_set_affinity(struct irq_data *data,
static void sun4v_irq_disable(struct irq_data *data)
{
unsigned int ino = irq_table[data->irq].dev_ino;
unsigned int ino = irq_data_to_sysino(data);
int err;
err = sun4v_intr_setenabled(ino, HV_INTR_DISABLED);
@ -362,7 +492,7 @@ static void sun4v_irq_disable(struct irq_data *data)
static void sun4v_irq_eoi(struct irq_data *data)
{
unsigned int ino = irq_table[data->irq].dev_ino;
unsigned int ino = irq_data_to_sysino(data);
int err;
err = sun4v_intr_setstate(ino, HV_INTR_STATE_IDLE);
@ -373,14 +503,13 @@ static void sun4v_irq_eoi(struct irq_data *data)
static void sun4v_virq_enable(struct irq_data *data)
{
unsigned long cpuid, dev_handle, dev_ino;
unsigned long dev_handle = irq_data_to_handle(data);
unsigned long dev_ino = irq_data_to_ino(data);
unsigned long cpuid;
int err;
cpuid = irq_choose_cpu(data->irq, data->affinity);
dev_handle = irq_table[data->irq].dev_handle;
dev_ino = irq_table[data->irq].dev_ino;
err = sun4v_vintr_set_target(dev_handle, dev_ino, cpuid);
if (err != HV_EOK)
printk(KERN_ERR "sun4v_vintr_set_target(%lx,%lx,%lu): "
@ -403,14 +532,13 @@ static void sun4v_virq_enable(struct irq_data *data)
static int sun4v_virt_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
{
unsigned long cpuid, dev_handle, dev_ino;
unsigned long dev_handle = irq_data_to_handle(data);
unsigned long dev_ino = irq_data_to_ino(data);
unsigned long cpuid;
int err;
cpuid = irq_choose_cpu(data->irq, mask);
dev_handle = irq_table[data->irq].dev_handle;
dev_ino = irq_table[data->irq].dev_ino;
err = sun4v_vintr_set_target(dev_handle, dev_ino, cpuid);
if (err != HV_EOK)
printk(KERN_ERR "sun4v_vintr_set_target(%lx,%lx,%lu): "
@ -422,11 +550,10 @@ static int sun4v_virt_set_affinity(struct irq_data *data,
static void sun4v_virq_disable(struct irq_data *data)
{
unsigned long dev_handle, dev_ino;
unsigned long dev_handle = irq_data_to_handle(data);
unsigned long dev_ino = irq_data_to_ino(data);
int err;
dev_handle = irq_table[data->irq].dev_handle;
dev_ino = irq_table[data->irq].dev_ino;
err = sun4v_vintr_set_valid(dev_handle, dev_ino,
HV_INTR_DISABLED);
@ -438,12 +565,10 @@ static void sun4v_virq_disable(struct irq_data *data)
static void sun4v_virq_eoi(struct irq_data *data)
{
unsigned long dev_handle, dev_ino;
unsigned long dev_handle = irq_data_to_handle(data);
unsigned long dev_ino = irq_data_to_ino(data);
int err;
dev_handle = irq_table[data->irq].dev_handle;
dev_ino = irq_table[data->irq].dev_ino;
err = sun4v_vintr_set_state(dev_handle, dev_ino,
HV_INTR_STATE_IDLE);
if (err != HV_EOK)
@ -479,31 +604,10 @@ static struct irq_chip sun4v_virq = {
.flags = IRQCHIP_EOI_IF_HANDLED,
};
static void pre_flow_handler(struct irq_data *d)
{
struct irq_handler_data *handler_data = irq_data_get_irq_handler_data(d);
unsigned int ino = irq_table[d->irq].dev_ino;
handler_data->pre_handler(ino, handler_data->arg1, handler_data->arg2);
}
void irq_install_pre_handler(int irq,
void (*func)(unsigned int, void *, void *),
void *arg1, void *arg2)
{
struct irq_handler_data *handler_data = irq_get_handler_data(irq);
handler_data->pre_handler = func;
handler_data->arg1 = arg1;
handler_data->arg2 = arg2;
__irq_set_preflow_handler(irq, pre_flow_handler);
}
unsigned int build_irq(int inofixup, unsigned long iclr, unsigned long imap)
{
struct ino_bucket *bucket;
struct irq_handler_data *handler_data;
struct ino_bucket *bucket;
unsigned int irq;
int ino;
@ -537,119 +641,166 @@ out:
return irq;
}
static unsigned int sun4v_build_common(unsigned long sysino,
struct irq_chip *chip)
static unsigned int sun4v_build_common(u32 devhandle, unsigned int devino,
void (*handler_data_init)(struct irq_handler_data *data,
u32 devhandle, unsigned int devino),
struct irq_chip *chip)
{
struct ino_bucket *bucket;
struct irq_handler_data *handler_data;
struct irq_handler_data *data;
unsigned int irq;
BUG_ON(tlb_type != hypervisor);
bucket = &ivector_table[sysino];
irq = bucket_get_irq(__pa(bucket));
if (!irq) {
irq = irq_alloc(0, sysino);
bucket_set_irq(__pa(bucket), irq);
irq_set_chip_and_handler_name(irq, chip, handle_fasteoi_irq,
"IVEC");
}
handler_data = irq_get_handler_data(irq);
if (unlikely(handler_data))
irq = irq_alloc(devhandle, devino);
if (!irq)
goto out;
handler_data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
if (unlikely(!handler_data)) {
prom_printf("IRQ: kzalloc(irq_handler_data) failed.\n");
prom_halt();
data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
if (unlikely(!data)) {
pr_err("IRQ handler data allocation failed.\n");
irq_free(irq);
irq = 0;
goto out;
}
irq_set_handler_data(irq, handler_data);
/* Catch accidental accesses to these things. IMAP/ICLR handling
* is done by hypervisor calls on sun4v platforms, not by direct
* register accesses.
irq_set_handler_data(irq, data);
handler_data_init(data, devhandle, devino);
irq_set_chip_and_handler_name(irq, chip, handle_fasteoi_irq, "IVEC");
data->imap = ~0UL;
data->iclr = ~0UL;
out:
return irq;
}
static unsigned long cookie_assign(unsigned int irq, u32 devhandle,
unsigned int devino)
{
struct irq_handler_data *ihd = irq_get_handler_data(irq);
unsigned long hv_error, cookie;
/* handler_irq needs to find the irq. cookie is seen signed in
* sun4v_dev_mondo and treated as a non ivector_table delivery.
*/
handler_data->imap = ~0UL;
handler_data->iclr = ~0UL;
ihd->bucket.__irq = irq;
cookie = ~__pa(&ihd->bucket);
hv_error = sun4v_vintr_set_cookie(devhandle, devino, cookie);
if (hv_error)
pr_err("HV vintr set cookie failed = %ld\n", hv_error);
return hv_error;
}
static void cookie_handler_data(struct irq_handler_data *data,
u32 devhandle, unsigned int devino)
{
data->dev_handle = devhandle;
data->dev_ino = devino;
}
static unsigned int cookie_build_irq(u32 devhandle, unsigned int devino,
struct irq_chip *chip)
{
unsigned long hv_error;
unsigned int irq;
irq = sun4v_build_common(devhandle, devino, cookie_handler_data, chip);
hv_error = cookie_assign(irq, devhandle, devino);
if (hv_error) {
irq_free(irq);
irq = 0;
}
return irq;
}
static unsigned int sun4v_build_cookie(u32 devhandle, unsigned int devino)
{
unsigned int irq;
irq = cookie_exists(devhandle, devino);
if (irq)
goto out;
irq = cookie_build_irq(devhandle, devino, &sun4v_virq);
out:
return irq;
}
static void sysino_set_bucket(unsigned int irq)
{
struct irq_handler_data *ihd = irq_get_handler_data(irq);
struct ino_bucket *bucket;
unsigned long sysino;
sysino = sun4v_devino_to_sysino(ihd->dev_handle, ihd->dev_ino);
BUG_ON(sysino >= nr_ivec);
bucket = &ivector_table[sysino];
bucket_set_irq(__pa(bucket), irq);
}
static void sysino_handler_data(struct irq_handler_data *data,
u32 devhandle, unsigned int devino)
{
unsigned long sysino;
sysino = sun4v_devino_to_sysino(devhandle, devino);
data->sysino = sysino;
}
static unsigned int sysino_build_irq(u32 devhandle, unsigned int devino,
struct irq_chip *chip)
{
unsigned int irq;
irq = sun4v_build_common(devhandle, devino, sysino_handler_data, chip);
if (!irq)
goto out;
sysino_set_bucket(irq);
out:
return irq;
}
static int sun4v_build_sysino(u32 devhandle, unsigned int devino)
{
int irq;
irq = sysino_exists(devhandle, devino);
if (irq)
goto out;
irq = sysino_build_irq(devhandle, devino, &sun4v_irq);
out:
return irq;
}
unsigned int sun4v_build_irq(u32 devhandle, unsigned int devino)
{
unsigned long sysino = sun4v_devino_to_sysino(devhandle, devino);
return sun4v_build_common(sysino, &sun4v_irq);
}
unsigned int sun4v_build_virq(u32 devhandle, unsigned int devino)
{
struct irq_handler_data *handler_data;
unsigned long hv_err, cookie;
struct ino_bucket *bucket;
unsigned int irq;
bucket = kzalloc(sizeof(struct ino_bucket), GFP_ATOMIC);
if (unlikely(!bucket))
return 0;
/* The only reference we store to the IRQ bucket is
* by physical address which kmemleak can't see, tell
* it that this object explicitly is not a leak and
* should be scanned.
*/
kmemleak_not_leak(bucket);
__flush_dcache_range((unsigned long) bucket,
((unsigned long) bucket +
sizeof(struct ino_bucket)));
irq = irq_alloc(devhandle, devino);
bucket_set_irq(__pa(bucket), irq);
irq_set_chip_and_handler_name(irq, &sun4v_virq, handle_fasteoi_irq,
"IVEC");
handler_data = kzalloc(sizeof(struct irq_handler_data), GFP_ATOMIC);
if (unlikely(!handler_data))
return 0;
/* In order to make the LDC channel startup sequence easier,
* especially wrt. locking, we do not let request_irq() enable
* the interrupt.
*/
irq_set_status_flags(irq, IRQ_NOAUTOEN);
irq_set_handler_data(irq, handler_data);
/* Catch accidental accesses to these things. IMAP/ICLR handling
* is done by hypervisor calls on sun4v platforms, not by direct
* register accesses.
*/
handler_data->imap = ~0UL;
handler_data->iclr = ~0UL;
cookie = ~__pa(bucket);
hv_err = sun4v_vintr_set_cookie(devhandle, devino, cookie);
if (hv_err) {
prom_printf("IRQ: Fatal, cannot set cookie for [%x:%x] "
"err=%lu\n", devhandle, devino, hv_err);
prom_halt();
}
if (sun4v_cookie_only_virqs())
irq = sun4v_build_cookie(devhandle, devino);
else
irq = sun4v_build_sysino(devhandle, devino);
return irq;
}
void ack_bad_irq(unsigned int irq)
unsigned int sun4v_build_virq(u32 devhandle, unsigned int devino)
{
unsigned int ino = irq_table[irq].dev_ino;
int irq;
if (!ino)
ino = 0xdeadbeef;
irq = cookie_build_irq(devhandle, devino, &sun4v_virq);
if (!irq)
goto out;
printk(KERN_CRIT "Unexpected IRQ from ino[%x] irq[%u]\n",
ino, irq);
/* This is borrowed from the original function.
*/
irq_set_status_flags(irq, IRQ_NOAUTOEN);
out:
return irq;
}
void *hardirq_stack[NR_CPUS];
@ -720,9 +871,12 @@ void fixup_irqs(void)
for (irq = 0; irq < NR_IRQS; irq++) {
struct irq_desc *desc = irq_to_desc(irq);
struct irq_data *data = irq_desc_get_irq_data(desc);
struct irq_data *data;
unsigned long flags;
if (!desc)
continue;
data = irq_desc_get_irq_data(desc);
raw_spin_lock_irqsave(&desc->lock, flags);
if (desc->action && !irqd_is_per_cpu(data)) {
if (data->chip->irq_set_affinity)
@ -922,16 +1076,22 @@ static struct irqaction timer_irq_action = {
.name = "timer",
};
/* Only invoked on boot processor. */
void __init init_IRQ(void)
static void __init irq_ivector_init(void)
{
unsigned long size;
unsigned long size, order;
unsigned int ivecs;
map_prom_timers();
kill_prom_timer();
/* If we are doing cookie only VIRQs then we do not need the ivector
* table to process interrupts.
*/
if (sun4v_cookie_only_virqs())
return;
size = sizeof(struct ino_bucket) * NUM_IVECS;
ivector_table = kzalloc(size, GFP_KERNEL);
ivecs = size_nr_ivec();
size = sizeof(struct ino_bucket) * ivecs;
order = get_order(size);
ivector_table = (struct ino_bucket *)
__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!ivector_table) {
prom_printf("Fatal error, cannot allocate ivector_table\n");
prom_halt();
@ -940,6 +1100,15 @@ void __init init_IRQ(void)
((unsigned long) ivector_table) + size);
ivector_table_pa = __pa(ivector_table);
}
/* Only invoked on boot processor.*/
void __init init_IRQ(void)
{
irq_init_hv();
irq_ivector_init();
map_prom_timers();
kill_prom_timer();
if (tlb_type == hypervisor)
sun4v_init_mondo_queues();

View File

@ -47,14 +47,6 @@ kvmap_itlb_vmalloc_addr:
KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_itlb_longpath)
TSB_LOCK_TAG(%g1, %g2, %g7)
/* Load and check PTE. */
ldxa [%g5] ASI_PHYS_USE_EC, %g5
mov 1, %g7
sllx %g7, TSB_TAG_INVALID_BIT, %g7
brgez,a,pn %g5, kvmap_itlb_longpath
TSB_STORE(%g1, %g7)
TSB_WRITE(%g1, %g5, %g6)
/* fallthrough to TLB load */
@ -118,6 +110,12 @@ kvmap_dtlb_obp:
ba,pt %xcc, kvmap_dtlb_load
nop
kvmap_linear_early:
sethi %hi(kern_linear_pte_xor), %g7
ldx [%g7 + %lo(kern_linear_pte_xor)], %g2
ba,pt %xcc, kvmap_dtlb_tsb4m_load
xor %g2, %g4, %g5
.align 32
kvmap_dtlb_tsb4m_load:
TSB_LOCK_TAG(%g1, %g2, %g7)
@ -146,105 +144,17 @@ kvmap_dtlb_4v:
/* Correct TAG_TARGET is already in %g6, check 4mb TSB. */
KERN_TSB4M_LOOKUP_TL1(%g6, %g5, %g1, %g2, %g3, kvmap_dtlb_load)
#endif
/* TSB entry address left in %g1, lookup linear PTE.
* Must preserve %g1 and %g6 (TAG).
/* Linear mapping TSB lookup failed. Fallthrough to kernel
* page table based lookup.
*/
kvmap_dtlb_tsb4m_miss:
/* Clear the PAGE_OFFSET top virtual bits, shift
* down to get PFN, and make sure PFN is in range.
*/
661: sllx %g4, 0, %g5
.section .page_offset_shift_patch, "ax"
.word 661b
.previous
/* Check to see if we know about valid memory at the 4MB
* chunk this physical address will reside within.
*/
661: srlx %g5, MAX_PHYS_ADDRESS_BITS, %g2
.section .page_offset_shift_patch, "ax"
.word 661b
.previous
brnz,pn %g2, kvmap_dtlb_longpath
nop
/* This unconditional branch and delay-slot nop gets patched
* by the sethi sequence once the bitmap is properly setup.
*/
.globl valid_addr_bitmap_insn
valid_addr_bitmap_insn:
ba,pt %xcc, 2f
nop
.subsection 2
.globl valid_addr_bitmap_patch
valid_addr_bitmap_patch:
sethi %hi(sparc64_valid_addr_bitmap), %g7
or %g7, %lo(sparc64_valid_addr_bitmap), %g7
.previous
661: srlx %g5, ILOG2_4MB, %g2
.section .page_offset_shift_patch, "ax"
.word 661b
.previous
srlx %g2, 6, %g5
and %g2, 63, %g2
sllx %g5, 3, %g5
ldx [%g7 + %g5], %g5
mov 1, %g7
sllx %g7, %g2, %g7
andcc %g5, %g7, %g0
be,pn %xcc, kvmap_dtlb_longpath
2: sethi %hi(kpte_linear_bitmap), %g2
/* Get the 256MB physical address index. */
661: sllx %g4, 0, %g5
.section .page_offset_shift_patch, "ax"
.word 661b
.previous
or %g2, %lo(kpte_linear_bitmap), %g2
661: srlx %g5, ILOG2_256MB, %g5
.section .page_offset_shift_patch, "ax"
.word 661b
.previous
and %g5, (32 - 1), %g7
/* Divide by 32 to get the offset into the bitmask. */
srlx %g5, 5, %g5
add %g7, %g7, %g7
sllx %g5, 3, %g5
/* kern_linear_pte_xor[(mask >> shift) & 3)] */
ldx [%g2 + %g5], %g2
srlx %g2, %g7, %g7
sethi %hi(kern_linear_pte_xor), %g5
and %g7, 3, %g7
or %g5, %lo(kern_linear_pte_xor), %g5
sllx %g7, 3, %g7
ldx [%g5 + %g7], %g2
.globl kvmap_linear_patch
kvmap_linear_patch:
ba,pt %xcc, kvmap_dtlb_tsb4m_load
xor %g2, %g4, %g5
ba,a,pt %xcc, kvmap_linear_early
kvmap_dtlb_vmalloc_addr:
KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_dtlb_longpath)
TSB_LOCK_TAG(%g1, %g2, %g7)
/* Load and check PTE. */
ldxa [%g5] ASI_PHYS_USE_EC, %g5
mov 1, %g7
sllx %g7, TSB_TAG_INVALID_BIT, %g7
brgez,a,pn %g5, kvmap_dtlb_longpath
TSB_STORE(%g1, %g7)
TSB_WRITE(%g1, %g5, %g6)
/* fallthrough to TLB load */
@ -276,13 +186,8 @@ kvmap_dtlb_load:
#ifdef CONFIG_SPARSEMEM_VMEMMAP
kvmap_vmemmap:
sub %g4, %g5, %g5
srlx %g5, ILOG2_4MB, %g5
sethi %hi(vmemmap_table), %g1
sllx %g5, 3, %g5
or %g1, %lo(vmemmap_table), %g1
ba,pt %xcc, kvmap_dtlb_load
ldx [%g1 + %g5], %g5
KERN_PGTABLE_WALK(%g4, %g5, %g2, kvmap_dtlb_longpath)
ba,a,pt %xcc, kvmap_dtlb_load
#endif
kvmap_dtlb_nonlinear:
@ -294,8 +199,8 @@ kvmap_dtlb_nonlinear:
#ifdef CONFIG_SPARSEMEM_VMEMMAP
/* Do not use the TSB for vmemmap. */
mov (VMEMMAP_BASE >> 40), %g5
sllx %g5, 40, %g5
sethi %hi(VMEMMAP_BASE), %g5
ldx [%g5 + %lo(VMEMMAP_BASE)], %g5
cmp %g4,%g5
bgeu,pn %xcc, kvmap_vmemmap
nop
@ -307,8 +212,8 @@ kvmap_dtlb_tsbmiss:
sethi %hi(MODULES_VADDR), %g5
cmp %g4, %g5
blu,pn %xcc, kvmap_dtlb_longpath
mov (VMALLOC_END >> 40), %g5
sllx %g5, 40, %g5
sethi %hi(VMALLOC_END), %g5
ldx [%g5 + %lo(VMALLOC_END)], %g5
cmp %g4, %g5
bgeu,pn %xcc, kvmap_dtlb_longpath
nop

View File

@ -1078,7 +1078,8 @@ static void ldc_iommu_release(struct ldc_channel *lp)
struct ldc_channel *ldc_alloc(unsigned long id,
const struct ldc_channel_config *cfgp,
void *event_arg)
void *event_arg,
const char *name)
{
struct ldc_channel *lp;
const struct ldc_mode_ops *mops;
@ -1093,6 +1094,8 @@ struct ldc_channel *ldc_alloc(unsigned long id,
err = -EINVAL;
if (!cfgp)
goto out_err;
if (!name)
goto out_err;
switch (cfgp->mode) {
case LDC_MODE_RAW:
@ -1185,6 +1188,21 @@ struct ldc_channel *ldc_alloc(unsigned long id,
INIT_HLIST_HEAD(&lp->mh_list);
snprintf(lp->rx_irq_name, LDC_IRQ_NAME_MAX, "%s RX", name);
snprintf(lp->tx_irq_name, LDC_IRQ_NAME_MAX, "%s TX", name);
err = request_irq(lp->cfg.rx_irq, ldc_rx, 0,
lp->rx_irq_name, lp);
if (err)
goto out_free_txq;
err = request_irq(lp->cfg.tx_irq, ldc_tx, 0,
lp->tx_irq_name, lp);
if (err) {
free_irq(lp->cfg.rx_irq, lp);
goto out_free_txq;
}
return lp;
out_free_txq:
@ -1237,31 +1255,14 @@ EXPORT_SYMBOL(ldc_free);
* state. This does not initiate a handshake, ldc_connect() does
* that.
*/
int ldc_bind(struct ldc_channel *lp, const char *name)
int ldc_bind(struct ldc_channel *lp)
{
unsigned long hv_err, flags;
int err = -EINVAL;
if (!name ||
(lp->state != LDC_STATE_INIT))
if (lp->state != LDC_STATE_INIT)
return -EINVAL;
snprintf(lp->rx_irq_name, LDC_IRQ_NAME_MAX, "%s RX", name);
snprintf(lp->tx_irq_name, LDC_IRQ_NAME_MAX, "%s TX", name);
err = request_irq(lp->cfg.rx_irq, ldc_rx, 0,
lp->rx_irq_name, lp);
if (err)
return err;
err = request_irq(lp->cfg.tx_irq, ldc_tx, 0,
lp->tx_irq_name, lp);
if (err) {
free_irq(lp->cfg.rx_irq, lp);
return err;
}
spin_lock_irqsave(&lp->lock, flags);
enable_irq(lp->cfg.rx_irq);

View File

@ -191,12 +191,41 @@ static const struct pcr_ops n4_pcr_ops = {
.pcr_nmi_disable = PCR_N4_PICNPT,
};
static u64 n5_pcr_read(unsigned long reg_num)
{
unsigned long val;
(void) sun4v_t5_get_perfreg(reg_num, &val);
return val;
}
static void n5_pcr_write(unsigned long reg_num, u64 val)
{
(void) sun4v_t5_set_perfreg(reg_num, val);
}
static const struct pcr_ops n5_pcr_ops = {
.read_pcr = n5_pcr_read,
.write_pcr = n5_pcr_write,
.read_pic = n4_pic_read,
.write_pic = n4_pic_write,
.nmi_picl_value = n4_picl_value,
.pcr_nmi_enable = (PCR_N4_PICNPT | PCR_N4_STRACE |
PCR_N4_UTRACE | PCR_N4_TOE |
(26 << PCR_N4_SL_SHIFT)),
.pcr_nmi_disable = PCR_N4_PICNPT,
};
static unsigned long perf_hsvc_group;
static unsigned long perf_hsvc_major;
static unsigned long perf_hsvc_minor;
static int __init register_perf_hsvc(void)
{
unsigned long hverror;
if (tlb_type == hypervisor) {
switch (sun4v_chip_type) {
case SUN4V_CHIP_NIAGARA1:
@ -215,6 +244,10 @@ static int __init register_perf_hsvc(void)
perf_hsvc_group = HV_GRP_VT_CPU;
break;
case SUN4V_CHIP_NIAGARA5:
perf_hsvc_group = HV_GRP_T5_CPU;
break;
default:
return -ENODEV;
}
@ -222,10 +255,12 @@ static int __init register_perf_hsvc(void)
perf_hsvc_major = 1;
perf_hsvc_minor = 0;
if (sun4v_hvapi_register(perf_hsvc_group,
perf_hsvc_major,
&perf_hsvc_minor)) {
printk("perfmon: Could not register hvapi.\n");
hverror = sun4v_hvapi_register(perf_hsvc_group,
perf_hsvc_major,
&perf_hsvc_minor);
if (hverror) {
pr_err("perfmon: Could not register hvapi(0x%lx).\n",
hverror);
return -ENODEV;
}
}
@ -254,6 +289,10 @@ static int __init setup_sun4v_pcr_ops(void)
pcr_ops = &n4_pcr_ops;
break;
case SUN4V_CHIP_NIAGARA5:
pcr_ops = &n5_pcr_ops;
break;
default:
ret = -ENODEV;
break;

View File

@ -1662,7 +1662,8 @@ static bool __init supported_pmu(void)
sparc_pmu = &niagara2_pmu;
return true;
}
if (!strcmp(sparc_pmu_type, "niagara4")) {
if (!strcmp(sparc_pmu_type, "niagara4") ||
!strcmp(sparc_pmu_type, "niagara5")) {
sparc_pmu = &niagara4_pmu;
return true;
}

View File

@ -30,6 +30,7 @@
#include <linux/cpu.h>
#include <linux/initrd.h>
#include <linux/module.h>
#include <linux/start_kernel.h>
#include <asm/io.h>
#include <asm/processor.h>
@ -174,7 +175,7 @@ char reboot_command[COMMAND_LINE_SIZE];
static struct pt_regs fake_swapper_regs = { { 0, }, 0, 0, 0, 0 };
void __init per_cpu_patch(void)
static void __init per_cpu_patch(void)
{
struct cpuid_patch_entry *p;
unsigned long ver;
@ -266,7 +267,7 @@ void sun4v_patch_2insn_range(struct sun4v_2insn_patch_entry *start,
}
}
void __init sun4v_patch(void)
static void __init sun4v_patch(void)
{
extern void sun4v_hvapi_init(void);
@ -335,14 +336,25 @@ static void __init pause_patch(void)
}
}
#ifdef CONFIG_SMP
void __init boot_cpu_id_too_large(int cpu)
void __init start_early_boot(void)
{
prom_printf("Serious problem, boot cpu id (%d) >= NR_CPUS (%d)\n",
cpu, NR_CPUS);
prom_halt();
int cpu;
check_if_starfire();
per_cpu_patch();
sun4v_patch();
cpu = hard_smp_processor_id();
if (cpu >= NR_CPUS) {
prom_printf("Serious problem, boot cpu id (%d) >= NR_CPUS (%d)\n",
cpu, NR_CPUS);
prom_halt();
}
current_thread_info()->cpu = cpu;
prom_init_report();
start_kernel();
}
#endif
/* On Ultra, we support all of the v8 capabilities. */
unsigned long sparc64_elf_hwcap = (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR |
@ -500,12 +512,16 @@ static void __init init_sparc64_elf_hwcap(void)
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= HWCAP_SPARC_BLKINIT;
if (sun4v_chip_type == SUN4V_CHIP_NIAGARA2 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= HWCAP_SPARC_N2;
}
@ -533,6 +549,8 @@ static void __init init_sparc64_elf_hwcap(void)
sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= (AV_SPARC_VIS | AV_SPARC_VIS2 |
AV_SPARC_ASI_BLK_INIT |
@ -540,6 +558,8 @@ static void __init init_sparc64_elf_hwcap(void)
if (sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||
sun4v_chip_type == SUN4V_CHIP_NIAGARA5 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M6 ||
sun4v_chip_type == SUN4V_CHIP_SPARC_M7 ||
sun4v_chip_type == SUN4V_CHIP_SPARC64X)
cap |= (AV_SPARC_VIS3 | AV_SPARC_HPC |
AV_SPARC_FMAF);

View File

@ -1467,6 +1467,13 @@ static void __init pcpu_populate_pte(unsigned long addr)
pud_t *pud;
pmd_t *pmd;
if (pgd_none(*pgd)) {
pud_t *new;
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
pgd_populate(&init_mm, pgd, new);
}
pud = pud_offset(pgd, addr);
if (pud_none(*pud)) {
pmd_t *new;

View File

@ -195,6 +195,11 @@ sun4v_tsb_miss_common:
ldx [%g2 + TRAP_PER_CPU_PGD_PADDR], %g7
sun4v_itlb_error:
rdpr %tl, %g1
cmp %g1, 1
ble,pt %icc, sun4v_bad_ra
or %g0, FAULT_CODE_BAD_RA | FAULT_CODE_ITLB, %g1
sethi %hi(sun4v_err_itlb_vaddr), %g1
stx %g4, [%g1 + %lo(sun4v_err_itlb_vaddr)]
sethi %hi(sun4v_err_itlb_ctx), %g1
@ -206,15 +211,10 @@ sun4v_itlb_error:
sethi %hi(sun4v_err_itlb_error), %g1
stx %o0, [%g1 + %lo(sun4v_err_itlb_error)]
sethi %hi(1f), %g7
rdpr %tl, %g4
cmp %g4, 1
ble,pt %icc, 1f
sethi %hi(2f), %g7
ba,pt %xcc, etraptl1
or %g7, %lo(2f), %g7
1: ba,pt %xcc, etrap
2: or %g7, %lo(2b), %g7
1: or %g7, %lo(1f), %g7
mov %l4, %o1
call sun4v_itlb_error_report
add %sp, PTREGS_OFF, %o0
@ -222,6 +222,11 @@ sun4v_itlb_error:
/* NOTREACHED */
sun4v_dtlb_error:
rdpr %tl, %g1
cmp %g1, 1
ble,pt %icc, sun4v_bad_ra
or %g0, FAULT_CODE_BAD_RA | FAULT_CODE_DTLB, %g1
sethi %hi(sun4v_err_dtlb_vaddr), %g1
stx %g4, [%g1 + %lo(sun4v_err_dtlb_vaddr)]
sethi %hi(sun4v_err_dtlb_ctx), %g1
@ -233,21 +238,23 @@ sun4v_dtlb_error:
sethi %hi(sun4v_err_dtlb_error), %g1
stx %o0, [%g1 + %lo(sun4v_err_dtlb_error)]
sethi %hi(1f), %g7
rdpr %tl, %g4
cmp %g4, 1
ble,pt %icc, 1f
sethi %hi(2f), %g7
ba,pt %xcc, etraptl1
or %g7, %lo(2f), %g7
1: ba,pt %xcc, etrap
2: or %g7, %lo(2b), %g7
1: or %g7, %lo(1f), %g7
mov %l4, %o1
call sun4v_dtlb_error_report
add %sp, PTREGS_OFF, %o0
/* NOTREACHED */
sun4v_bad_ra:
or %g0, %g4, %g5
ba,pt %xcc, sparc64_realfault_common
or %g1, %g0, %g4
/* NOTREACHED */
/* Instruction Access Exception, tl0. */
sun4v_iacc:
ldxa [%g0] ASI_SCRATCHPAD, %g2

View File

@ -109,10 +109,13 @@ startup_continue:
brnz,pn %g1, 1b
nop
sethi %hi(p1275buf), %g2
or %g2, %lo(p1275buf), %g2
ldx [%g2 + 0x10], %l2
add %l2, -(192 + 128), %sp
/* Get onto temporary stack which will be in the locked
* kernel image.
*/
sethi %hi(tramp_stack), %g1
or %g1, %lo(tramp_stack), %g1
add %g1, TRAMP_STACK_SIZE, %g1
sub %g1, STACKFRAME_SZ + STACK_BIAS + 256, %sp
flushw
/* Setup the loop variables:
@ -394,7 +397,6 @@ after_lock_tlb:
sllx %g5, THREAD_SHIFT, %g5
sub %g5, (STACKFRAME_SZ + STACK_BIAS), %g5
add %g6, %g5, %sp
mov 0, %fp
rdpr %pstate, %o1
or %o1, PSTATE_IE, %o1

View File

@ -2104,6 +2104,11 @@ void sun4v_nonresum_overflow(struct pt_regs *regs)
atomic_inc(&sun4v_nonresum_oflow_cnt);
}
static void sun4v_tlb_error(struct pt_regs *regs)
{
die_if_kernel("TLB/TSB error", regs);
}
unsigned long sun4v_err_itlb_vaddr;
unsigned long sun4v_err_itlb_ctx;
unsigned long sun4v_err_itlb_pte;
@ -2111,8 +2116,7 @@ unsigned long sun4v_err_itlb_error;
void sun4v_itlb_error_report(struct pt_regs *regs, int tl)
{
if (tl > 1)
dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
printk(KERN_EMERG "SUN4V-ITLB: Error at TPC[%lx], tl %d\n",
regs->tpc, tl);
@ -2125,7 +2129,7 @@ void sun4v_itlb_error_report(struct pt_regs *regs, int tl)
sun4v_err_itlb_vaddr, sun4v_err_itlb_ctx,
sun4v_err_itlb_pte, sun4v_err_itlb_error);
prom_halt();
sun4v_tlb_error(regs);
}
unsigned long sun4v_err_dtlb_vaddr;
@ -2135,8 +2139,7 @@ unsigned long sun4v_err_dtlb_error;
void sun4v_dtlb_error_report(struct pt_regs *regs, int tl)
{
if (tl > 1)
dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
dump_tl1_traplog((struct tl1_traplog *)(regs + 1));
printk(KERN_EMERG "SUN4V-DTLB: Error at TPC[%lx], tl %d\n",
regs->tpc, tl);
@ -2149,7 +2152,7 @@ void sun4v_dtlb_error_report(struct pt_regs *regs, int tl)
sun4v_err_dtlb_vaddr, sun4v_err_dtlb_ctx,
sun4v_err_dtlb_pte, sun4v_err_dtlb_error);
prom_halt();
sun4v_tlb_error(regs);
}
void hypervisor_tlbop_error(unsigned long err, unsigned long op)

View File

@ -162,10 +162,10 @@ tsb_miss_page_table_walk_sun4v_fastpath:
nop
.previous
rdpr %tl, %g3
cmp %g3, 1
rdpr %tl, %g7
cmp %g7, 1
bne,pn %xcc, winfix_trampoline
nop
mov %g3, %g4
ba,pt %xcc, etrap
rd %pc, %g7
call hugetlb_setup

View File

@ -714,7 +714,7 @@ int vio_ldc_alloc(struct vio_driver_state *vio,
cfg.tx_irq = vio->vdev->tx_irq;
cfg.rx_irq = vio->vdev->rx_irq;
lp = ldc_alloc(vio->vdev->channel_id, &cfg, event_arg);
lp = ldc_alloc(vio->vdev->channel_id, &cfg, event_arg, vio->name);
if (IS_ERR(lp))
return PTR_ERR(lp);
@ -746,7 +746,7 @@ void vio_port_up(struct vio_driver_state *vio)
err = 0;
if (state == LDC_STATE_INIT) {
err = ldc_bind(vio->lp, vio->name);
err = ldc_bind(vio->lp);
if (err)
printk(KERN_WARNING "%s: Port %lu bind failed, "
"err=%d\n",

View File

@ -35,8 +35,9 @@ jiffies = jiffies_64;
SECTIONS
{
/* swapper_low_pmd_dir is sparc64 only */
swapper_low_pmd_dir = 0x0000000000402000;
#ifdef CONFIG_SPARC64
swapper_pg_dir = 0x0000000000402000;
#endif
. = INITIAL_ADDRESS;
.text TEXTSTART :
{
@ -122,11 +123,6 @@ SECTIONS
*(.swapper_4m_tsb_phys_patch)
__swapper_4m_tsb_phys_patch_end = .;
}
.page_offset_shift_patch : {
__page_offset_shift_patch = .;
*(.page_offset_shift_patch)
__page_offset_shift_patch_end = .;
}
.popc_3insn_patch : {
__popc_3insn_patch = .;
*(.popc_3insn_patch)

View File

@ -41,6 +41,10 @@
#endif
#endif
#if !defined(EX_LD) && !defined(EX_ST)
#define NON_USER_COPY
#endif
#ifndef EX_LD
#define EX_LD(x) x
#endif
@ -197,9 +201,13 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
mov EX_RETVAL(%o3), %o0
.Llarge_src_unaligned:
#ifdef NON_USER_COPY
VISEntryHalfFast(.Lmedium_vis_entry_fail)
#else
VISEntryHalf
#endif
andn %o2, 0x3f, %o4
sub %o2, %o4, %o2
VISEntryHalf
alignaddr %o1, %g0, %g1
add %o1, %o4, %o1
EX_LD(LOAD(ldd, %g1 + 0x00, %f0))
@ -240,6 +248,10 @@ FUNC_NAME: /* %o0=dst, %o1=src, %o2=len */
nop
ba,a,pt %icc, .Lmedium_unaligned
#ifdef NON_USER_COPY
.Lmedium_vis_entry_fail:
or %o0, %o1, %g2
#endif
.Lmedium:
LOAD(prefetch, %o1 + 0x40, #n_reads_strong)
andcc %g2, 0x7, %g0

View File

@ -3,8 +3,9 @@
* Copyright (C) 1996,1997 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
* Copyright (C) 1996 David S. Miller (davem@caip.rutgers.edu)
*
* Returns 0, if ok, and number of bytes not yet set if exception
* occurs and we were called as clear_user.
* Calls to memset returns initial %o0. Calls to bzero returns 0, if ok, and
* number of bytes not yet set if exception occurs and we were called as
* clear_user.
*/
#include <asm/ptrace.h>
@ -65,6 +66,8 @@ __bzero_begin:
.globl __memset_start, __memset_end
__memset_start:
memset:
mov %o0, %g1
mov 1, %g4
and %o1, 0xff, %g3
sll %g3, 8, %g2
or %g3, %g2, %g3
@ -89,6 +92,7 @@ memset:
sub %o0, %o2, %o0
__bzero:
clr %g4
mov %g0, %g3
1:
cmp %o1, 7
@ -151,8 +155,8 @@ __bzero:
bne,a 8f
EX(stb %g3, [%o0], and %o1, 1)
8:
retl
clr %o0
b 0f
nop
7:
be 13b
orcc %o1, 0, %g0
@ -164,6 +168,12 @@ __bzero:
bne 8b
EX(stb %g3, [%o0 - 1], add %o1, 1)
0:
andcc %g4, 1, %g0
be 5f
nop
retl
mov %g1, %o0
5:
retl
clr %o0
__memset_end:

View File

@ -346,6 +346,9 @@ retry:
down_read(&mm->mmap_sem);
}
if (fault_code & FAULT_CODE_BAD_RA)
goto do_sigbus;
vma = find_vma(mm, address);
if (!vma)
goto bad_area;

View File

@ -160,6 +160,36 @@ static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
return 1;
}
int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages)
{
struct mm_struct *mm = current->mm;
unsigned long addr, len, end;
unsigned long next, flags;
pgd_t *pgdp;
int nr = 0;
start &= PAGE_MASK;
addr = start;
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
local_irq_save(flags);
pgdp = pgd_offset(mm, addr);
do {
pgd_t pgd = *pgdp;
next = pgd_addr_end(addr, end);
if (pgd_none(pgd))
break;
if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
break;
} while (pgdp++, addr = next, addr != end);
local_irq_restore(flags);
return nr;
}
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages)
{

View File

@ -75,7 +75,6 @@ unsigned long kern_linear_pte_xor[4] __read_mostly;
* 'cpu' properties, but we need to have this table setup before the
* MDESC is initialized.
*/
unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
#ifndef CONFIG_DEBUG_PAGEALLOC
/* A special kernel TSB for 4MB, 256MB, 2GB and 16GB linear mappings.
@ -84,10 +83,11 @@ unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
*/
extern struct tsb swapper_4m_tsb[KERNEL_TSB4M_NENTRIES];
#endif
extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];
static unsigned long cpu_pgsz_mask;
#define MAX_BANKS 32
#define MAX_BANKS 1024
static struct linux_prom64_registers pavail[MAX_BANKS];
static int pavail_ents;
@ -165,10 +165,6 @@ static void __init read_obp_memory(const char *property,
cmp_p64, NULL);
}
unsigned long sparc64_valid_addr_bitmap[VALID_ADDR_BITMAP_BYTES /
sizeof(unsigned long)];
EXPORT_SYMBOL(sparc64_valid_addr_bitmap);
/* Kernel physical address base and size in bytes. */
unsigned long kern_base __read_mostly;
unsigned long kern_size __read_mostly;
@ -840,7 +836,10 @@ static int find_node(unsigned long addr)
if ((addr & p->mask) == p->val)
return i;
}
return -1;
/* The following condition has been observed on LDOM guests.*/
WARN_ONCE(1, "find_node: A physical address doesn't match a NUMA node"
" rule. Some physical memory will be owned by node 0.");
return 0;
}
static u64 memblock_nid_range(u64 start, u64 end, int *nid)
@ -1366,9 +1365,144 @@ static unsigned long __init bootmem_init(unsigned long phys_base)
static struct linux_prom64_registers pall[MAX_BANKS] __initdata;
static int pall_ents __initdata;
#ifdef CONFIG_DEBUG_PAGEALLOC
static unsigned long max_phys_bits = 40;
bool kern_addr_valid(unsigned long addr)
{
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
if ((long)addr < 0L) {
unsigned long pa = __pa(addr);
if ((addr >> max_phys_bits) != 0UL)
return false;
return pfn_valid(pa >> PAGE_SHIFT);
}
if (addr >= (unsigned long) KERNBASE &&
addr < (unsigned long)&_end)
return true;
pgd = pgd_offset_k(addr);
if (pgd_none(*pgd))
return 0;
pud = pud_offset(pgd, addr);
if (pud_none(*pud))
return 0;
if (pud_large(*pud))
return pfn_valid(pud_pfn(*pud));
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
return 0;
if (pmd_large(*pmd))
return pfn_valid(pmd_pfn(*pmd));
pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte))
return 0;
return pfn_valid(pte_pfn(*pte));
}
EXPORT_SYMBOL(kern_addr_valid);
static unsigned long __ref kernel_map_hugepud(unsigned long vstart,
unsigned long vend,
pud_t *pud)
{
const unsigned long mask16gb = (1UL << 34) - 1UL;
u64 pte_val = vstart;
/* Each PUD is 8GB */
if ((vstart & mask16gb) ||
(vend - vstart <= mask16gb)) {
pte_val ^= kern_linear_pte_xor[2];
pud_val(*pud) = pte_val | _PAGE_PUD_HUGE;
return vstart + PUD_SIZE;
}
pte_val ^= kern_linear_pte_xor[3];
pte_val |= _PAGE_PUD_HUGE;
vend = vstart + mask16gb + 1UL;
while (vstart < vend) {
pud_val(*pud) = pte_val;
pte_val += PUD_SIZE;
vstart += PUD_SIZE;
pud++;
}
return vstart;
}
static bool kernel_can_map_hugepud(unsigned long vstart, unsigned long vend,
bool guard)
{
if (guard && !(vstart & ~PUD_MASK) && (vend - vstart) >= PUD_SIZE)
return true;
return false;
}
static unsigned long __ref kernel_map_hugepmd(unsigned long vstart,
unsigned long vend,
pmd_t *pmd)
{
const unsigned long mask256mb = (1UL << 28) - 1UL;
const unsigned long mask2gb = (1UL << 31) - 1UL;
u64 pte_val = vstart;
/* Each PMD is 8MB */
if ((vstart & mask256mb) ||
(vend - vstart <= mask256mb)) {
pte_val ^= kern_linear_pte_xor[0];
pmd_val(*pmd) = pte_val | _PAGE_PMD_HUGE;
return vstart + PMD_SIZE;
}
if ((vstart & mask2gb) ||
(vend - vstart <= mask2gb)) {
pte_val ^= kern_linear_pte_xor[1];
pte_val |= _PAGE_PMD_HUGE;
vend = vstart + mask256mb + 1UL;
} else {
pte_val ^= kern_linear_pte_xor[2];
pte_val |= _PAGE_PMD_HUGE;
vend = vstart + mask2gb + 1UL;
}
while (vstart < vend) {
pmd_val(*pmd) = pte_val;
pte_val += PMD_SIZE;
vstart += PMD_SIZE;
pmd++;
}
return vstart;
}
static bool kernel_can_map_hugepmd(unsigned long vstart, unsigned long vend,
bool guard)
{
if (guard && !(vstart & ~PMD_MASK) && (vend - vstart) >= PMD_SIZE)
return true;
return false;
}
static unsigned long __ref kernel_map_range(unsigned long pstart,
unsigned long pend, pgprot_t prot)
unsigned long pend, pgprot_t prot,
bool use_huge)
{
unsigned long vstart = PAGE_OFFSET + pstart;
unsigned long vend = PAGE_OFFSET + pend;
@ -1387,19 +1521,34 @@ static unsigned long __ref kernel_map_range(unsigned long pstart,
pmd_t *pmd;
pte_t *pte;
if (pgd_none(*pgd)) {
pud_t *new;
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
alloc_bytes += PAGE_SIZE;
pgd_populate(&init_mm, pgd, new);
}
pud = pud_offset(pgd, vstart);
if (pud_none(*pud)) {
pmd_t *new;
if (kernel_can_map_hugepud(vstart, vend, use_huge)) {
vstart = kernel_map_hugepud(vstart, vend, pud);
continue;
}
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
alloc_bytes += PAGE_SIZE;
pud_populate(&init_mm, pud, new);
}
pmd = pmd_offset(pud, vstart);
if (!pmd_present(*pmd)) {
if (pmd_none(*pmd)) {
pte_t *new;
if (kernel_can_map_hugepmd(vstart, vend, use_huge)) {
vstart = kernel_map_hugepmd(vstart, vend, pmd);
continue;
}
new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
alloc_bytes += PAGE_SIZE;
pmd_populate_kernel(&init_mm, pmd, new);
@ -1422,100 +1571,34 @@ static unsigned long __ref kernel_map_range(unsigned long pstart,
return alloc_bytes;
}
static void __init flush_all_kernel_tsbs(void)
{
int i;
for (i = 0; i < KERNEL_TSB_NENTRIES; i++) {
struct tsb *ent = &swapper_tsb[i];
ent->tag = (1UL << TSB_TAG_INVALID_BIT);
}
#ifndef CONFIG_DEBUG_PAGEALLOC
for (i = 0; i < KERNEL_TSB4M_NENTRIES; i++) {
struct tsb *ent = &swapper_4m_tsb[i];
ent->tag = (1UL << TSB_TAG_INVALID_BIT);
}
#endif
}
extern unsigned int kvmap_linear_patch[1];
#endif /* CONFIG_DEBUG_PAGEALLOC */
static void __init kpte_set_val(unsigned long index, unsigned long val)
{
unsigned long *ptr = kpte_linear_bitmap;
val <<= ((index % (BITS_PER_LONG / 2)) * 2);
ptr += (index / (BITS_PER_LONG / 2));
*ptr |= val;
}
static const unsigned long kpte_shift_min = 28; /* 256MB */
static const unsigned long kpte_shift_max = 34; /* 16GB */
static const unsigned long kpte_shift_incr = 3;
static unsigned long kpte_mark_using_shift(unsigned long start, unsigned long end,
unsigned long shift)
{
unsigned long size = (1UL << shift);
unsigned long mask = (size - 1UL);
unsigned long remains = end - start;
unsigned long val;
if (remains < size || (start & mask))
return start;
/* VAL maps:
*
* shift 28 --> kern_linear_pte_xor index 1
* shift 31 --> kern_linear_pte_xor index 2
* shift 34 --> kern_linear_pte_xor index 3
*/
val = ((shift - kpte_shift_min) / kpte_shift_incr) + 1;
remains &= ~mask;
if (shift != kpte_shift_max)
remains = size;
while (remains) {
unsigned long index = start >> kpte_shift_min;
kpte_set_val(index, val);
start += 1UL << kpte_shift_min;
remains -= 1UL << kpte_shift_min;
}
return start;
}
static void __init mark_kpte_bitmap(unsigned long start, unsigned long end)
{
unsigned long smallest_size, smallest_mask;
unsigned long s;
smallest_size = (1UL << kpte_shift_min);
smallest_mask = (smallest_size - 1UL);
while (start < end) {
unsigned long orig_start = start;
for (s = kpte_shift_max; s >= kpte_shift_min; s -= kpte_shift_incr) {
start = kpte_mark_using_shift(start, end, s);
if (start != orig_start)
break;
}
if (start == orig_start)
start = (start + smallest_size) & ~smallest_mask;
}
}
static void __init init_kpte_bitmap(void)
{
unsigned long i;
for (i = 0; i < pall_ents; i++) {
unsigned long phys_start, phys_end;
phys_start = pall[i].phys_addr;
phys_end = phys_start + pall[i].reg_size;
mark_kpte_bitmap(phys_start, phys_end);
}
}
static void __init kernel_physical_mapping_init(void)
{
#ifdef CONFIG_DEBUG_PAGEALLOC
unsigned long i, mem_alloced = 0UL;
bool use_huge = true;
#ifdef CONFIG_DEBUG_PAGEALLOC
use_huge = false;
#endif
for (i = 0; i < pall_ents; i++) {
unsigned long phys_start, phys_end;
@ -1523,7 +1606,7 @@ static void __init kernel_physical_mapping_init(void)
phys_end = phys_start + pall[i].reg_size;
mem_alloced += kernel_map_range(phys_start, phys_end,
PAGE_KERNEL);
PAGE_KERNEL, use_huge);
}
printk("Allocated %ld bytes for kernel page tables.\n",
@ -1532,8 +1615,9 @@ static void __init kernel_physical_mapping_init(void)
kvmap_linear_patch[0] = 0x01000000; /* nop */
flushi(&kvmap_linear_patch[0]);
flush_all_kernel_tsbs();
__flush_tlb_all();
#endif
}
#ifdef CONFIG_DEBUG_PAGEALLOC
@ -1543,7 +1627,7 @@ void kernel_map_pages(struct page *page, int numpages, int enable)
unsigned long phys_end = phys_start + (numpages * PAGE_SIZE);
kernel_map_range(phys_start, phys_end,
(enable ? PAGE_KERNEL : __pgprot(0)));
(enable ? PAGE_KERNEL : __pgprot(0)), false);
flush_tsb_kernel_range(PAGE_OFFSET + phys_start,
PAGE_OFFSET + phys_end);
@ -1571,76 +1655,56 @@ unsigned long __init find_ecache_flush_span(unsigned long size)
unsigned long PAGE_OFFSET;
EXPORT_SYMBOL(PAGE_OFFSET);
static void __init page_offset_shift_patch_one(unsigned int *insn, unsigned long phys_bits)
{
unsigned long final_shift;
unsigned int val = *insn;
unsigned int cnt;
unsigned long VMALLOC_END = 0x0000010000000000UL;
EXPORT_SYMBOL(VMALLOC_END);
/* We are patching in ilog2(max_supported_phys_address), and
* we are doing so in a manner similar to a relocation addend.
* That is, we are adding the shift value to whatever value
* is in the shift instruction count field already.
*/
cnt = (val & 0x3f);
val &= ~0x3f;
/* If we are trying to shift >= 64 bits, clear the destination
* register. This can happen when phys_bits ends up being equal
* to MAX_PHYS_ADDRESS_BITS.
*/
final_shift = (cnt + (64 - phys_bits));
if (final_shift >= 64) {
unsigned int rd = (val >> 25) & 0x1f;
val = 0x80100000 | (rd << 25);
} else {
val |= final_shift;
}
*insn = val;
__asm__ __volatile__("flush %0"
: /* no outputs */
: "r" (insn));
}
static void __init page_offset_shift_patch(unsigned long phys_bits)
{
extern unsigned int __page_offset_shift_patch;
extern unsigned int __page_offset_shift_patch_end;
unsigned int *p;
p = &__page_offset_shift_patch;
while (p < &__page_offset_shift_patch_end) {
unsigned int *insn = (unsigned int *)(unsigned long)*p;
page_offset_shift_patch_one(insn, phys_bits);
p++;
}
}
unsigned long sparc64_va_hole_top = 0xfffff80000000000UL;
unsigned long sparc64_va_hole_bottom = 0x0000080000000000UL;
static void __init setup_page_offset(void)
{
unsigned long max_phys_bits = 40;
if (tlb_type == cheetah || tlb_type == cheetah_plus) {
/* Cheetah/Panther support a full 64-bit virtual
* address, so we can use all that our page tables
* support.
*/
sparc64_va_hole_top = 0xfff0000000000000UL;
sparc64_va_hole_bottom = 0x0010000000000000UL;
max_phys_bits = 42;
} else if (tlb_type == hypervisor) {
switch (sun4v_chip_type) {
case SUN4V_CHIP_NIAGARA1:
case SUN4V_CHIP_NIAGARA2:
/* T1 and T2 support 48-bit virtual addresses. */
sparc64_va_hole_top = 0xffff800000000000UL;
sparc64_va_hole_bottom = 0x0000800000000000UL;
max_phys_bits = 39;
break;
case SUN4V_CHIP_NIAGARA3:
/* T3 supports 48-bit virtual addresses. */
sparc64_va_hole_top = 0xffff800000000000UL;
sparc64_va_hole_bottom = 0x0000800000000000UL;
max_phys_bits = 43;
break;
case SUN4V_CHIP_NIAGARA4:
case SUN4V_CHIP_NIAGARA5:
case SUN4V_CHIP_SPARC64X:
default:
case SUN4V_CHIP_SPARC_M6:
/* T4 and later support 52-bit virtual addresses. */
sparc64_va_hole_top = 0xfff8000000000000UL;
sparc64_va_hole_bottom = 0x0008000000000000UL;
max_phys_bits = 47;
break;
case SUN4V_CHIP_SPARC_M7:
default:
/* M7 and later support 52-bit virtual addresses. */
sparc64_va_hole_top = 0xfff8000000000000UL;
sparc64_va_hole_bottom = 0x0008000000000000UL;
max_phys_bits = 49;
break;
}
}
@ -1650,12 +1714,16 @@ static void __init setup_page_offset(void)
prom_halt();
}
PAGE_OFFSET = PAGE_OFFSET_BY_BITS(max_phys_bits);
PAGE_OFFSET = sparc64_va_hole_top;
VMALLOC_END = ((sparc64_va_hole_bottom >> 1) +
(sparc64_va_hole_bottom >> 2));
pr_info("PAGE_OFFSET is 0x%016lx (max_phys_bits == %lu)\n",
pr_info("MM: PAGE_OFFSET is 0x%016lx (max_phys_bits == %lu)\n",
PAGE_OFFSET, max_phys_bits);
page_offset_shift_patch(max_phys_bits);
pr_info("MM: VMALLOC [0x%016lx --> 0x%016lx]\n",
VMALLOC_START, VMALLOC_END);
pr_info("MM: VMEMMAP [0x%016lx --> 0x%016lx]\n",
VMEMMAP_BASE, VMEMMAP_BASE << 1);
}
static void __init tsb_phys_patch(void)
@ -1700,21 +1768,42 @@ static void __init tsb_phys_patch(void)
#define NUM_KTSB_DESCR 1
#endif
static struct hv_tsb_descr ktsb_descr[NUM_KTSB_DESCR];
extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];
/* The swapper TSBs are loaded with a base sequence of:
*
* sethi %uhi(SYMBOL), REG1
* sethi %hi(SYMBOL), REG2
* or REG1, %ulo(SYMBOL), REG1
* or REG2, %lo(SYMBOL), REG2
* sllx REG1, 32, REG1
* or REG1, REG2, REG1
*
* When we use physical addressing for the TSB accesses, we patch the
* first four instructions in the above sequence.
*/
static void patch_one_ktsb_phys(unsigned int *start, unsigned int *end, unsigned long pa)
{
pa >>= KTSB_PHYS_SHIFT;
unsigned long high_bits, low_bits;
high_bits = (pa >> 32) & 0xffffffff;
low_bits = (pa >> 0) & 0xffffffff;
while (start < end) {
unsigned int *ia = (unsigned int *)(unsigned long)*start;
ia[0] = (ia[0] & ~0x3fffff) | (pa >> 10);
ia[0] = (ia[0] & ~0x3fffff) | (high_bits >> 10);
__asm__ __volatile__("flush %0" : : "r" (ia));
ia[1] = (ia[1] & ~0x3ff) | (pa & 0x3ff);
ia[1] = (ia[1] & ~0x3fffff) | (low_bits >> 10);
__asm__ __volatile__("flush %0" : : "r" (ia + 1));
ia[2] = (ia[2] & ~0x1fff) | (high_bits & 0x3ff);
__asm__ __volatile__("flush %0" : : "r" (ia + 2));
ia[3] = (ia[3] & ~0x1fff) | (low_bits & 0x3ff);
__asm__ __volatile__("flush %0" : : "r" (ia + 3));
start++;
}
}
@ -1853,7 +1942,6 @@ static void __init sun4v_linear_pte_xor_finalize(void)
/* paging_init() sets up the page tables */
static unsigned long last_valid_pfn;
pgd_t swapper_pg_dir[PTRS_PER_PGD];
static void sun4u_pgprot_init(void);
static void sun4v_pgprot_init(void);
@ -1956,16 +2044,10 @@ void __init paging_init(void)
*/
init_mm.pgd += ((shift) / (sizeof(pgd_t)));
memset(swapper_low_pmd_dir, 0, sizeof(swapper_low_pmd_dir));
memset(swapper_pg_dir, 0, sizeof(swapper_pg_dir));
/* Now can init the kernel/bad page tables. */
pud_set(pud_offset(&swapper_pg_dir[0], 0),
swapper_low_pmd_dir + (shift / sizeof(pgd_t)));
inherit_prom_mappings();
init_kpte_bitmap();
/* Ok, we can use our TLB miss and window trap handlers safely. */
setup_tba();
@ -2072,70 +2154,6 @@ int page_in_phys_avail(unsigned long paddr)
return 0;
}
static struct linux_prom64_registers pavail_rescan[MAX_BANKS] __initdata;
static int pavail_rescan_ents __initdata;
/* Certain OBP calls, such as fetching "available" properties, can
* claim physical memory. So, along with initializing the valid
* address bitmap, what we do here is refetch the physical available
* memory list again, and make sure it provides at least as much
* memory as 'pavail' does.
*/
static void __init setup_valid_addr_bitmap_from_pavail(unsigned long *bitmap)
{
int i;
read_obp_memory("available", &pavail_rescan[0], &pavail_rescan_ents);
for (i = 0; i < pavail_ents; i++) {
unsigned long old_start, old_end;
old_start = pavail[i].phys_addr;
old_end = old_start + pavail[i].reg_size;
while (old_start < old_end) {
int n;
for (n = 0; n < pavail_rescan_ents; n++) {
unsigned long new_start, new_end;
new_start = pavail_rescan[n].phys_addr;
new_end = new_start +
pavail_rescan[n].reg_size;
if (new_start <= old_start &&
new_end >= (old_start + PAGE_SIZE)) {
set_bit(old_start >> ILOG2_4MB, bitmap);
goto do_next_page;
}
}
prom_printf("mem_init: Lost memory in pavail\n");
prom_printf("mem_init: OLD start[%lx] size[%lx]\n",
pavail[i].phys_addr,
pavail[i].reg_size);
prom_printf("mem_init: NEW start[%lx] size[%lx]\n",
pavail_rescan[i].phys_addr,
pavail_rescan[i].reg_size);
prom_printf("mem_init: Cannot continue, aborting.\n");
prom_halt();
do_next_page:
old_start += PAGE_SIZE;
}
}
}
static void __init patch_tlb_miss_handler_bitmap(void)
{
extern unsigned int valid_addr_bitmap_insn[];
extern unsigned int valid_addr_bitmap_patch[];
valid_addr_bitmap_insn[1] = valid_addr_bitmap_patch[1];
mb();
valid_addr_bitmap_insn[0] = valid_addr_bitmap_patch[0];
flushi(&valid_addr_bitmap_insn[0]);
}
static void __init register_page_bootmem_info(void)
{
#ifdef CONFIG_NEED_MULTIPLE_NODES
@ -2148,18 +2166,6 @@ static void __init register_page_bootmem_info(void)
}
void __init mem_init(void)
{
unsigned long addr, last;
addr = PAGE_OFFSET + kern_base;
last = PAGE_ALIGN(kern_size) + addr;
while (addr < last) {
set_bit(__pa(addr) >> ILOG2_4MB, sparc64_valid_addr_bitmap);
addr += PAGE_SIZE;
}
setup_valid_addr_bitmap_from_pavail(sparc64_valid_addr_bitmap);
patch_tlb_miss_handler_bitmap();
high_memory = __va(last_valid_pfn << PAGE_SHIFT);
register_page_bootmem_info();
@ -2249,18 +2255,9 @@ unsigned long _PAGE_CACHE __read_mostly;
EXPORT_SYMBOL(_PAGE_CACHE);
#ifdef CONFIG_SPARSEMEM_VMEMMAP
unsigned long vmemmap_table[VMEMMAP_SIZE];
static long __meminitdata addr_start, addr_end;
static int __meminitdata node_start;
int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
int node)
{
unsigned long phys_start = (vstart - VMEMMAP_BASE);
unsigned long phys_end = (vend - VMEMMAP_BASE);
unsigned long addr = phys_start & VMEMMAP_CHUNK_MASK;
unsigned long end = VMEMMAP_ALIGN(phys_end);
unsigned long pte_base;
pte_base = (_PAGE_VALID | _PAGE_SZ4MB_4U |
@ -2271,47 +2268,52 @@ int __meminit vmemmap_populate(unsigned long vstart, unsigned long vend,
_PAGE_CP_4V | _PAGE_CV_4V |
_PAGE_P_4V | _PAGE_W_4V);
for (; addr < end; addr += VMEMMAP_CHUNK) {
unsigned long *vmem_pp =
vmemmap_table + (addr >> VMEMMAP_CHUNK_SHIFT);
void *block;
pte_base |= _PAGE_PMD_HUGE;
vstart = vstart & PMD_MASK;
vend = ALIGN(vend, PMD_SIZE);
for (; vstart < vend; vstart += PMD_SIZE) {
pgd_t *pgd = pgd_offset_k(vstart);
unsigned long pte;
pud_t *pud;
pmd_t *pmd;
if (pgd_none(*pgd)) {
pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
if (!new)
return -ENOMEM;
pgd_populate(&init_mm, pgd, new);
}
pud = pud_offset(pgd, vstart);
if (pud_none(*pud)) {
pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
if (!new)
return -ENOMEM;
pud_populate(&init_mm, pud, new);
}
pmd = pmd_offset(pud, vstart);
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node);
if (!(*vmem_pp & _PAGE_VALID)) {
block = vmemmap_alloc_block(1UL << ILOG2_4MB, node);
if (!block)
return -ENOMEM;
*vmem_pp = pte_base | __pa(block);
/* check to see if we have contiguous blocks */
if (addr_end != addr || node_start != node) {
if (addr_start)
printk(KERN_DEBUG " [%lx-%lx] on node %d\n",
addr_start, addr_end-1, node_start);
addr_start = addr;
node_start = node;
}
addr_end = addr + VMEMMAP_CHUNK;
pmd_val(*pmd) = pte_base | __pa(block);
}
}
return 0;
}
void __meminit vmemmap_populate_print_last(void)
{
if (addr_start) {
printk(KERN_DEBUG " [%lx-%lx] on node %d\n",
addr_start, addr_end-1, node_start);
addr_start = 0;
addr_end = 0;
node_start = 0;
}
return 0;
}
void vmemmap_free(unsigned long start, unsigned long end)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
static void prot_init_common(unsigned long page_none,
@ -2787,8 +2789,8 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
do_flush_tlb_kernel_range(start, LOW_OBP_ADDRESS);
}
if (end > HI_OBP_ADDRESS) {
flush_tsb_kernel_range(end, HI_OBP_ADDRESS);
do_flush_tlb_kernel_range(end, HI_OBP_ADDRESS);
flush_tsb_kernel_range(HI_OBP_ADDRESS, end);
do_flush_tlb_kernel_range(HI_OBP_ADDRESS, end);
}
} else {
flush_tsb_kernel_range(start, end);

View File

@ -8,15 +8,8 @@
*/
#define MAX_PHYS_ADDRESS (1UL << MAX_PHYS_ADDRESS_BITS)
#define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
#define KPTE_BITMAP_BYTES \
((MAX_PHYS_ADDRESS / KPTE_BITMAP_CHUNK_SZ) / 4)
#define VALID_ADDR_BITMAP_CHUNK_SZ (4UL * 1024UL * 1024UL)
#define VALID_ADDR_BITMAP_BYTES \
((MAX_PHYS_ADDRESS / VALID_ADDR_BITMAP_CHUNK_SZ) / 8)
extern unsigned long kern_linear_pte_xor[4];
extern unsigned long kpte_linear_bitmap[KPTE_BITMAP_BYTES / sizeof(unsigned long)];
extern unsigned int sparc64_highest_unlocked_tlb_ent;
extern unsigned long sparc64_kern_pri_context;
extern unsigned long sparc64_kern_pri_nuc_bits;
@ -38,15 +31,4 @@ extern unsigned long kern_locked_tte_data;
void prom_world(int enter);
#ifdef CONFIG_SPARSEMEM_VMEMMAP
#define VMEMMAP_CHUNK_SHIFT 22
#define VMEMMAP_CHUNK (1UL << VMEMMAP_CHUNK_SHIFT)
#define VMEMMAP_CHUNK_MASK ~(VMEMMAP_CHUNK - 1UL)
#define VMEMMAP_ALIGN(x) (((x)+VMEMMAP_CHUNK-1UL)&VMEMMAP_CHUNK_MASK)
#define VMEMMAP_SIZE ((((1UL << MAX_PHYSADDR_BITS) >> PAGE_SHIFT) * \
sizeof(struct page)) >> VMEMMAP_CHUNK_SHIFT)
extern unsigned long vmemmap_table[VMEMMAP_SIZE];
#endif
#endif /* _SPARC64_MM_INIT_H */

View File

@ -54,8 +54,8 @@ ENTRY(swsusp_arch_resume)
nop
/* Write PAGE_OFFSET to %g7 */
sethi %uhi(PAGE_OFFSET), %g7
sllx %g7, 32, %g7
sethi %hi(PAGE_OFFSET), %g7
ldx [%g7 + %lo(PAGE_OFFSET)], %g7
setuw (PAGE_SIZE-8), %g3

View File

@ -14,7 +14,10 @@
* the .bss section or it will break things.
*/
#define BARG_LEN 256
/* We limit BARG_LEN to 1024 because this is the size of the
* 'barg_out' command line buffer in the SILO bootloader.
*/
#define BARG_LEN 1024
struct {
int bootstr_len;
int bootstr_valid;

View File

@ -11,11 +11,10 @@
.text
.globl prom_cif_direct
prom_cif_direct:
save %sp, -192, %sp
sethi %hi(p1275buf), %o1
or %o1, %lo(p1275buf), %o1
ldx [%o1 + 0x0010], %o2 ! prom_cif_stack
save %o2, -192, %sp
ldx [%i1 + 0x0008], %l2 ! prom_cif_handler
ldx [%o1 + 0x0008], %l2 ! prom_cif_handler
mov %g4, %l0
mov %g5, %l1
mov %g6, %l3

View File

@ -26,13 +26,13 @@ phandle prom_chosen_node;
* It gets passed the pointer to the PROM vector.
*/
extern void prom_cif_init(void *, void *);
extern void prom_cif_init(void *);
void __init prom_init(void *cif_handler, void *cif_stack)
void __init prom_init(void *cif_handler)
{
phandle node;
prom_cif_init(cif_handler, cif_stack);
prom_cif_init(cif_handler);
prom_chosen_node = prom_finddevice(prom_chosen_path);
if (!prom_chosen_node || (s32)prom_chosen_node == -1)

Some files were not shown because too many files have changed in this diff Show More