Linux 2.6.23.3

revert "x86_64: allocate sparsemem memmap above 4G"
Reverted upstream by commit 6a22c57b8d

Revert this commit:

    commit 2e1c49db4c
    Author: Zou Nan hai <nanhai.zou@intel.com>
    Date:   Fri Jun 1 00:46:28 2007 -0700

        x86_64: allocate sparsemem memmap above 4G

This reverts commit 2e1c49db4c.

First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jones.  So the
commit will likely be reverted in the 2.6.23 stable kernels.

Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Martin Ebourne <fedora@ebourne.me.uk>
Cc: Zou Nan hai <nanhai.zou@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16 08:24:58 -08:00 · 2007-11-16 08:22:59 -08:00 · 2007-11-16 08:22:58 -08:00
53 changed files with 450 additions and 220 deletions
--- a/Documentation/ja_JP/HOWTO
+++ b/Documentation/ja_JP/HOWTO
@ -1,4 +1,4 @@
-NOTE:
+NOTE:
 This is a version of Documentation/HOWTO translated into Japanese.
 This document is maintained by Tsugikazu Shibata <tshibata@ab.jp.nec.com>
 and the JF Project team <www.linux.or.jp/JF>.
@ -11,14 +11,14 @@ for non English (read: Japanese) speakers and is not intended as a
 fork. So if you have any comments or updates for this file, please try
 to update the original English file first.

-Last Updated: 2007/07/18
+Last Updated: 2007/09/23
 ==================================
 これは、
-linux-2.6.22/Documentation/HOWTO
+linux-2.6.23/Documentation/HOWTO
 の和訳です。

 翻訳団体： JF プロジェクト < http://www.linux.or.jp/JF/ >
-翻訳日： 2007/07/16
+翻訳日： 2007/09/19
 翻訳者： Tsugikazu Shibata <tshibata at ab dot jp dot nec dot com>
 校正者： 松倉さん <nbh--mats at nifty dot com>
         小林 雅典さん (Masanori Kobayasi) <zap03216 at nifty dot ne dot jp>
@ -27,6 +27,7 @@ linux-2.6.22/Documentation/HOWTO
         野口さん (Kenji Noguchi) <tokyo246 at gmail dot com>
         河内さん (Takayoshi Kochi) <t-kochi at bq dot jp dot nec dot com>
         岩本さん (iwamoto) <iwamoto.kn at ncos dot nec dot co dot jp>
+         内田さん (Satoshi Uchida) <s-uchida at ap dot jp dot nec dot com>
 ==================================

 Linux カーネル開発のやり方
@ -40,7 +41,7 @@ Linux カーネル開発コミュニティと共に活動するやり方を学
 手助けになります。

 もし、このドキュメントのどこかが古くなっていた場合には、このドキュメン
-トの最後にリストしたメンテナーにパッチを送ってください。
+トの最後にリストしたメンテナにパッチを送ってください。

 はじめに
 ---------
@ -59,7 +60,7 @@ Linux カーネル開発コミュニティと共に活動するやり方を学
 ネル開発者には必要です。アーキテクチャ向けの低レベル部分の開発をするの
 でなければ、(どんなアーキテクチャでも)アセンブリ(訳注: 言語)は必要あり
 ません。以下の本は、C 言語の十分な知識や何年もの経験に取って代わるもの
-ではありませんが、少なくともリファレンスとしてはいい本です。
+ではありませんが、少なくともリファレンスとしては良い本です。
 - "The C Programming Language" by Kernighan and Ritchie [Prentice Hall]
 -『プログラミング言語Ｃ第2版』(B.W. カーニハン/D.M. リッチー著 石田晴久訳) [共立出版]
 - "Practical C Programming" by Steve Oualline [O'Reilly]
@ -76,7 +77,7 @@ Linux カーネル開発コミュニティと共に活動するやり方を学
 ときどき、カーネルがツールチェインや C 言語拡張に置いている前提がどう
 なっているのかわかりにくいことがあり、また、残念なことに決定的なリファ
 レンスは存在しません。情報を得るには、gcc の info ページ( info gcc )を
-みてください。
+見てください。

 あなたは既存の開発コミュニティと一緒に作業する方法を学ぼうとしているこ
 とに留意してください。そのコミュニティは、コーディング、スタイル、
@ -92,7 +93,7 @@ Linux カーネル開発コミュニティと共に活動するやり方を学

 Linux カーネルのソースコードは GPL ライセンスの下でリリースされていま
 す。ライセンスの詳細については、ソースツリーのメインディレクトリに存在
-する、COPYING のファイルをみてください。もしライセンスについてさらに質
+する、COPYING のファイルを見てください。もしライセンスについてさらに質
 問があれば、Linux Kernel メーリングリストに質問するのではなく、どうぞ
 法律家に相談してください。メーリングリストの人達は法律家ではなく、法的
 問題については彼らの声明はあてにするべきではありません。
@ -109,7 +110,8 @@ Linux カーネルソースツリーは幅広い範囲のドキュメントを
 新しいドキュメントファイルも追加することを勧めます。
 カーネルの変更が、カーネルがユーザ空間に公開しているインターフェイスの
 変更を引き起こす場合、その変更を説明するマニュアルページのパッチや情報
-をマニュアルページのメンテナ mtk-manpages@gmx.net に送ることを勧めます。
+をマニュアルページのメンテナ mtk-manpages@gmx.net に送ることを勧めま
+す。

 以下はカーネルソースツリーに含まれている読んでおくべきファイルの一覧で
 す-
@ -117,7 +119,7 @@ Linux カーネルソースツリーは幅広い範囲のドキュメントを
  README
    このファイルは Linuxカーネルの簡単な背景とカーネルを設定(訳注
    configure )し、生成(訳注 build )するために必要なことは何かが書かれ
-    ています。カーネルに関して初めての人はここからスタートするとよいで
+    ています。カーネルに関して初めての人はここからスタートすると良いで
    しょう。

  Documentation/Changes
@ -128,7 +130,7 @@ Linux カーネルソースツリーは幅広い範囲のドキュメントを
  Documentation/CodingStyle
    これは Linux カーネルのコーディングスタイルと背景にある理由を記述
    しています。全ての新しいコードはこのドキュメントにあるガイドライン
-    に従っていることを期待されています。大部分のメンテナーはこれらのルー
+    に従っていることを期待されています。大部分のメンテナはこれらのルー
    ルに従っているものだけを受け付け、多くの人は正しいスタイルのコード
    だけをレビューします。

@ -168,16 +170,16 @@ Linux カーネルソースツリーは幅広い範囲のドキュメントを
    支援してください。

  Documentation/ManagementStyle
-    このドキュメントは Linux カーネルのメンテナー達がどう行動するか、
+    このドキュメントは Linux カーネルのメンテナ達がどう行動するか、
    彼らの手法の背景にある共有されている精神について記述しています。こ
    れはカーネル開発の初心者なら（もしくは、単に興味があるだけの人でも）
-    重要です。なぜならこのドキュメントは、カーネルメンテナー達の独特な
+    重要です。なぜならこのドキュメントは、カーネルメンテナ達の独特な
    行動についての多くの誤解や混乱を解消するからです。

  Documentation/stable_kernel_rules.txt
    このファイルはどのように stable カーネルのリリースが行われるかのルー
    ルが記述されています。そしてこれらのリリースの中のどこかで変更を取
-    り入れてもらいたい場合に何をすればいいかが示されています。
+    り入れてもらいたい場合に何をすれば良いかが示されています。

  Documentation/kernel-docs.txt
 　　カーネル開発に付随する外部ドキュメントのリストです。もしあなたが
@ -218,9 +220,9 @@ web サイトには、コードの構成、サブシステム、現在存在す
 ここには、また、カーネルのコンパイルのやり方やパッチの当て方などの間接
 的な基本情報も記述されています。

-あなたがどこからスタートしてよいかわからないが、Linux カーネル開発コミュ
+あなたがどこからスタートして良いかわからないが、Linux カーネル開発コミュ
 ニティに参加して何かすることをさがしている場合には、Linux kernel
-Janitor's プロジェクトにいけばよいでしょう -
+Janitor's プロジェクトにいけば良いでしょう -
 	http://janitor.kernelnewbies.org/
 ここはそのようなスタートをするのにうってつけの場所です。ここには、
 Linux カーネルソースツリーの中に含まれる、きれいにし、修正しなければな
@ -243,7 +245,7 @@ Linux カーネルソースツリーの中に含まれる、きれいにし、
 自己参照方式で、索引がついた web 形式で、ソースコードを参照することが
 できます。この最新の素晴しいカーネルコードのリポジトリは以下で見つかり
 ます-
-	http://sosdg.org/~coywolf/lxr/
+	http://sosdg.org/~qiyong/lxr/

 開発プロセス
 -----------------------
@ -265,9 +267,9 @@ Linux カーネルの開発プロセスは現在幾つかの異なるメイン
 以下のとおり-

  - 新しいカーネルがリリースされた直後に、2週間の特別期間が設けられ、
-    この期間中に、メンテナー達は Linus に大きな差分を送ることができま
-    す。このような差分は通常 -mm カーネルに数週間含まれてきたパッチで
-    す。 大きな変更は git(カーネルのソース管理ツール、詳細は
+    この期間中に、メンテナ達は Linus に大きな差分を送ることができます。
+    このような差分は通常 -mm カーネルに数週間含まれてきたパッチです。
+    大きな変更は git(カーネルのソース管理ツール、詳細は
    http://git.or.cz/  参照) を使って送るのが好ましいやり方ですが、パッ
    チファイルの形式のまま送るのでも十分です。

@ -285,6 +287,10 @@ Linux カーネルの開発プロセスは現在幾つかの異なるメイン
    に安定した状態にあると判断したときにリリースされます。目標は毎週新
    しい -rc カーネルをリリースすることです。

+   - 以下の URL で各 -rc リリースに存在する既知の後戻り問題のリスト
+     が追跡されます-
+     http://kernelnewbies.org/known_regressions
+
  - このプロセスはカーネルが 「準備ができた」と考えられるまで継続しま
    す。このプロセスはだいたい 6週間継続します。

@ -331,8 +337,8 @@ Andrew は個別のサブシステムカーネルツリーとパッチを全て
 linux-kernel メーリングリストで収集された多数のパッチと同時に一つにま
 とめます。
 このツリーは新機能とパッチが検証される場となります。ある期間の間パッチ
-が -mm に入って価値を証明されたら、Andrew やサブシステムメンテナが、メ
-インラインへ入れるように Linus にプッシュします。
+が -mm に入って価値を証明されたら、Andrew やサブシステムメンテナが、
+メインラインへ入れるように Linus にプッシュします。

 メインカーネルツリーに含めるために Linus に送る前に、すべての新しいパッ
 チが -mm ツリーでテストされることが強く推奨されます。
@ -460,7 +466,7 @@ MAINTAINERS ファイルにリストがありますので参照してくださ
 せん-
 彼らはあなたのパッチの行毎にコメントを入れたいので、そのためにはそうす
 るしかありません。あなたのメールプログラムが空白やタブを圧縮しないよう
-に確認した方がいいです。最初の良いテストとしては、自分にメールを送って
+に確認した方が良いです。最初の良いテストとしては、自分にメールを送って
 みて、そのパッチを自分で当ててみることです。もしそれがうまく行かないな
 ら、あなたのメールプログラムを直してもらうか、正しく動くように変えるべ
 きです。
@ -507,14 +513,14 @@ MAINTAINERS ファイルにリストがありますので参照してくださ
 とも普通のことです。これはあなたのパッチが受け入れられないということで
 は *ありません*、そしてあなた自身に反対することを意味するのでも *ありま
 せん*。単に自分のパッチに対して指摘された問題を全て修正して再送すれば
-いいのです。
+良いのです。


 カーネルコミュニティと企業組織のちがい
 -----------------------------------------------------------------

 カーネルコミュニティは大部分の伝統的な会社の開発環境とは異ったやり方で
-動いています。以下は問題を避けるためにできるとよいことののリストです-
+動いています。以下は問題を避けるためにできると良いことのリストです-

  あなたの提案する変更について言うときのうまい言い方：

@ -525,7 +531,7 @@ MAINTAINERS ファイルにリストがありますので参照してくださ
    - "以下は一連の小さなパッチ群ですが..."
    - "これは典型的なマシンでの性能を向上させます.."

-  やめた方がいい悪い言い方：
+  やめた方が良い悪い言い方：

    - このやり方で AIX/ptx/Solaris ではできたので、できるはずだ
    - 私はこれを20年もの間やってきた、だから
@ -575,10 +581,10 @@ Linux カーネルコミュニティは、一度に大量のコードの塊を

 1) 小さいパッチはあなたのパッチが適用される見込みを大きくします、カー
   ネルの人達はパッチが正しいかどうかを確認する時間や労力をかけないか
-   らです。5行のパッチはメンテナがたった1秒見るだけで適用できます。し
-   かし、500行のパッチは、正しいことをレビューするのに数時間かかるかも
-   しれません(時間はパッチのサイズなどにより指数関数に比例してかかりま
-   す)
+   らです。5行のパッチはメンテナがたった1秒見るだけで適用できます。
+   しかし、500行のパッチは、正しいことをレビューするのに数時間かかるか
+   もしれません(時間はパッチのサイズなどにより指数関数に比例してかかり
+   ます)

   小さいパッチは何かあったときにデバッグもとても簡単になります。パッ
   チを1個1個取り除くのは、とても大きなパッチを当てた後に(かつ、何かお
@ -587,23 +593,23 @@ Linux カーネルコミュニティは、一度に大量のコードの塊を
 2) 小さいパッチを送るだけでなく、送るまえに、書き直して、シンプルにす
   る(もしくは、単に順番を変えるだけでも)ことも、とても重要です。

-以下はカーネル開発者の Al Viro のたとえ話しです：
+以下はカーネル開発者の Al Viro のたとえ話です：

        "生徒の数学の宿題を採点する先生のことを考えてみてください、先
-        生は生徒が解に到達するまでの試行錯誤をみたいとは思わないでしょ
-        う。先生は簡潔な最高の解をみたいのです。良い生徒はこれを知って
+        生は生徒が解に到達するまでの試行錯誤を見たいとは思わないでしょ
+        う。先生は簡潔な最高の解を見たいのです。良い生徒はこれを知って
        おり、そして最終解の前の中間作業を提出することは決してないので
        す"

-        カーネル開発でもこれは同じです。メンテナー達とレビューア達は、
-        問題を解決する解の背後になる思考プロセスをみたいとは思いません。
-        彼らは単純であざやかな解決方法をみたいのです。
+        カーネル開発でもこれは同じです。メンテナ達とレビューア達は、
+        問題を解決する解の背後になる思考プロセスを見たいとは思いません。
+        彼らは単純であざやかな解決方法を見たいのです。

 あざやかな解を説明するのと、コミュニティと共に仕事をし、未解決の仕事を
 議論することのバランスをキープするのは難しいかもしれません。
 ですから、開発プロセスの早期段階で改善のためのフィードバックをもらうよ
-うにするのもいいですが、変更点を小さい部分に分割して全体ではまだ完成し
-ていない仕事を(部分的に)取り込んでもらえるようにすることもいいことです。
+うにするのも良いですが、変更点を小さい部分に分割して全体ではまだ完成し
+ていない仕事を(部分的に)取り込んでもらえるようにすることも良いことです。

 また、でき上がっていないものや、"将来直す" ようなパッチを、本流に含め
 てもらうように送っても、それは受け付けられないことを理解してください。
@ -629,7 +635,7 @@ Linux カーネルコミュニティは、一度に大量のコードの塊を
  - テスト結果

 これについて全てがどのようにあるべきかについての詳細は、以下のドキュメ
-ントの ChangeLog セクションをみてください-
+ントの ChangeLog セクションを見てください-
  "The Perfect Patch"
      http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt

--- a/Makefile
+++ b/Makefile
@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 23
-EXTRAVERSION =
+EXTRAVERSION = .3
 NAME = Arr Matey! A Hairy Bilge Rat!

 # *DOCUMENTATION*
--- a/arch/i386/boot/boot.h
+++ b/arch/i386/boot/boot.h
@ -17,6 +17,8 @@
 #ifndef BOOT_BOOT_H
 #define BOOT_BOOT_H

+#define STACK_SIZE	512	/* Minimum number of bytes for stack */
+
 #ifndef __ASSEMBLY__

 #include <stdarg.h>
@ -198,8 +200,6 @@ static inline int isdigit(int ch)
 }

 /* Heap -- available for dynamic lists. */
-#define STACK_SIZE	512	/* Minimum number of bytes for stack */
-
 extern char _end[];
 extern char *HEAP;
 extern char *heap_end;
@ -216,9 +216,9 @@ static inline char *__get_heap(size_t s, size_t a, size_t n)
 #define GET_HEAP(type, n) \
 	((type *)__get_heap(sizeof(type),__alignof__(type),(n)))

-static inline int heap_free(void)
+static inline bool heap_free(size_t n)
 {
-	return heap_end-HEAP;
+	return (int)(heap_end-HEAP) >= (int)n;
 }

 /* copy.S */
--- a/arch/i386/boot/header.S
+++ b/arch/i386/boot/header.S
@ -173,7 +173,8 @@ ramdisk_size:	.long	0		# its size in bytes
 bootsect_kludge:
 		.long	0		# obsolete

-heap_end_ptr:	.word	_end+1024	# (Header version 0x0201 or later)
+heap_end_ptr:	.word	_end+STACK_SIZE-512
+					# (Header version 0x0201 or later)
 					# space from here (exclusive) down to
 					# end of setup code can be used by setup
 					# for local heap purposes.
@ -225,28 +226,53 @@ start_of_setup:
 	int	$0x13
 #endif

-# We will have entered with %cs = %ds+0x20, normalize %cs so
-# it is on par with the other segments.
-	pushw	%ds
-	pushw	$setup2
-	lretw
-
-setup2:
 # Force %es = %ds
 	movw	%ds, %ax
 	movw	%ax, %es
 	cld

-# Stack paranoia: align the stack and make sure it is good
-# for both 16- and 32-bit references.  In particular, if we
-# were meant to have been using the full 16-bit segment, the
-# caller might have set %sp to zero, which breaks %esp-based
-# references.
-	andw	$~3, %sp	# dword align (might as well...)
-	jnz	1f
-	movw	$0xfffc, %sp	# Make sure we're not zero
-1:	movzwl	%sp, %esp	# Clear upper half of %esp
-	sti
+# Apparently some ancient versions of LILO invoked the kernel
+# with %ss != %ds, which happened to work by accident for the
+# old code.  If the CAN_USE_HEAP flag is set in loadflags, or
+# %ss != %ds, then adjust the stack pointer.
+
+	# Smallest possible stack we can tolerate
+	movw	$(_end+STACK_SIZE), %cx
+
+	movw	heap_end_ptr, %dx
+	addw	$512, %dx
+	jnc	1f
+	xorw	%dx, %dx	# Wraparound - whole segment available
+1:	testb	$CAN_USE_HEAP, loadflags
+	jnz	2f
+
+	# No CAN_USE_HEAP
+	movw	%ss, %dx
+	cmpw	%ax, %dx	# %ds == %ss?
+	movw	%sp, %dx
+	# If so, assume %sp is reasonably set, otherwise use
+	# the smallest possible stack.
+	jne	4f		# -> Smallest possible stack...
+
+	# Make sure the stack is at least minimum size.  Take a value
+	# of zero to mean "full segment."
+2:
+	andw	$~3, %dx	# dword align (might as well...)
+	jnz	3f
+	movw	$0xfffc, %dx	# Make sure we're not zero
+3:	cmpw	%cx, %dx
+	jnb	5f
+4:	movw	%cx, %dx	# Minimum value we can possibly use
+5:	movw	%ax, %ss
+	movzwl	%dx, %esp	# Clear upper half of %esp
+	sti			# Now we should have a working stack
+
+# We will have entered with %cs = %ds+0x20, normalize %cs so
+# it is on par with the other segments.
+	pushw	%ds
+	pushw	$6f
+	lretw
+6:

 # Check signature at end of setup
 	cmpl	$0x5a5aaa55, setup_sig
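The stack selection added to header.S in the hunk above relies on flag-based control flow that can be hard to follow. A C model of the same decisions may help. This is only a sketch: `pick_stack_pointer` and its parameter names are hypothetical, `end` stands in for the `_end` symbol, and the two booleans stand in for the `CAN_USE_HEAP` test and the `%ss == %ds` comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_SIZE 512	/* minimum number of bytes for stack, as in boot.h */

static_assert(STACK_SIZE % 4 == 0, "minimum stack should stay dword sized");

/*
 * Hypothetical C model of the 16-bit stack setup in header.S.
 * Returns the value the assembly would load into %sp.
 */
uint16_t pick_stack_pointer(uint16_t end, uint16_t heap_end_ptr,
			    bool can_use_heap, bool ss_equals_ds,
			    uint16_t sp)
{
	/* Smallest stack we can tolerate; wraps modulo 2^16 like %cx. */
	uint16_t min_sp = (uint16_t)(end + STACK_SIZE);
	uint16_t new_sp;

	if (can_use_heap) {
		/* Stack top is 512 bytes above heap_end_ptr; a carry out
		 * of 16 bits means the whole segment is available. */
		uint32_t top = (uint32_t)heap_end_ptr + 512;

		new_sp = (top > 0xffff) ? 0 : (uint16_t)top;
	} else if (ss_equals_ds) {
		/* Assume the boot loader set %sp reasonably. */
		new_sp = sp;
	} else {
		/* %ss != %ds: fall back to the smallest possible stack. */
		return min_sp;
	}

	new_sp &= (uint16_t)~3u;	/* dword align */
	if (new_sp == 0)
		new_sp = 0xfffc;	/* zero means "full segment" */
	if (new_sp < min_sp)
		new_sp = min_sp;	/* enforce the minimum */
	return new_sp;
}
```

With `end = 0x3000`, a loader that sets `CAN_USE_HEAP` and `heap_end_ptr = 0x8e00` gets a stack top of `0x9000`, while a loader that enters with `%ss != %ds` and no heap flag gets the minimum, `0x3200`.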
--- a/arch/i386/boot/video-bios.c
+++ b/arch/i386/boot/video-bios.c
@ -79,7 +79,7 @@ static int bios_probe(void)
 	video_bios.modes = GET_HEAP(struct mode_info, 0);

 	for (mode = 0x14; mode <= 0x7f; mode++) {
-		if (heap_free() < sizeof(struct mode_info))
+		if (!heap_free(sizeof(struct mode_info)))
 			break;

 		if (mode_defined(VIDEO_FIRST_BIOS+mode))
--- a/arch/i386/boot/video-vesa.c
+++ b/arch/i386/boot/video-vesa.c
@ -57,7 +57,7 @@ static int vesa_probe(void)
 	while ((mode = rdfs16(mode_ptr)) != 0xffff) {
 		mode_ptr += 2;

-		if (heap_free() < sizeof(struct mode_info))
+		if (!heap_free(sizeof(struct mode_info)))
 			break;	/* Heap full, can't save mode info */

 		if (mode & ~0x1ff)
--- a/arch/i386/boot/video.c
+++ b/arch/i386/boot/video.c
@ -371,7 +371,7 @@ static void save_screen(void)
 	saved.curx = boot_params.screen_info.orig_x;
 	saved.cury = boot_params.screen_info.orig_y;

-	if (heap_free() < saved.x*saved.y*sizeof(u16)+512)
+	if (!heap_free(saved.x*saved.y*sizeof(u16)+512))
 		return;		/* Not enough heap to save the screen */

 	saved.data = GET_HEAP(u16, saved.x*saved.y);
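The three call-site changes above replace `heap_free() < size` with `!heap_free(size)`. One likely motivation is C's integer conversion rules: comparing an `int` against a `size_t` converts the `int` to unsigned, so a negative amount of free space (which could arise if `heap_end` ever lies below `HEAP`) looks enormous. The sketch below illustrates that under stated assumptions. `old_heap_is_full` and `new_heap_has_room` are hypothetical names, and the free byte count is passed in directly rather than computed from `heap_end - HEAP`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Old-style check: true means "give up, not enough heap".
 * The cast spells out the conversion the compiler applies implicitly
 * when an int is compared with a size_t.
 */
bool old_heap_is_full(int free_bytes, size_t need)
{
	return (size_t)free_bytes < need;
}

/* New-style check, mirroring heap_free(size_t n) in the patch. */
bool new_heap_has_room(int free_bytes, size_t need)
{
	assert(need <= INT_MAX);	/* sketch only handles sane sizes */
	return free_bytes >= (int)need;
}
```

With `free_bytes = -16` and `need = 64`, the old check reports that the heap is not full, whereas the signed comparison correctly reports that there is no room.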
--- a/arch/i386/kernel/tsc.c
+++ b/arch/i386/kernel/tsc.c
@ -137,7 +137,7 @@ unsigned long native_calculate_cpu_khz(void)
 {
 	unsigned long long start, end;
 	unsigned long count;
-	u64 delta64;
+	u64 delta64 = (u64)ULLONG_MAX;
 	int i;
 	unsigned long flags;

@ -149,6 +149,7 @@ unsigned long native_calculate_cpu_khz(void)
 		rdtscll(start);
 		mach_countup(&count);
 		rdtscll(end);
+		delta64 = min(delta64, (end - start));
 	}
 	/*
 	 * Error: ECTCNEVERSET
@ -159,8 +160,6 @@ unsigned long native_calculate_cpu_khz(void)
 	if (count <= 1)
 		goto err;

-	delta64 = end - start;
-
 	/* cpu freq too fast: */
 	if (delta64 > (1ULL<<32))
 		goto err;
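The tsc.c hunks above seed `delta64` with `ULLONG_MAX` and keep the smallest `end - start` seen across the calibration loop, instead of using only the last iteration. The patch does not state the reason; presumably a single disturbed run should not inflate the result. A minimal sketch of the same reduction, with the hypothetical helper `min_delta` and plain arrays standing in for the `rdtscll()` readings:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: smallest end-start delta over n timed runs. */
unsigned long long min_delta(const unsigned long long *start,
			     const unsigned long long *end, size_t n)
{
	unsigned long long best = ULLONG_MAX;	/* same seed as delta64 */
	size_t i;

	assert((start != NULL && end != NULL) || n == 0);
	for (i = 0; i < n; i++) {
		unsigned long long d = end[i] - start[i];

		if (d < best)
			best = d;
	}
	return best;
}
```

If no run is recorded the helper returns `ULLONG_MAX`, which the existing "cpu freq too fast" check (`delta64 > (1ULL<<32)`) would reject.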
--- a/arch/i386/xen/enlighten.c
+++ b/arch/i386/xen/enlighten.c
@ -56,7 +56,23 @@ DEFINE_PER_CPU(enum paravirt_lazy_mode, xen_lazy_mode);

 DEFINE_PER_CPU(struct vcpu_info *, xen_vcpu);
 DEFINE_PER_CPU(struct vcpu_info, xen_vcpu_info);
-DEFINE_PER_CPU(unsigned long, xen_cr3);
+
+/*
+ * Note about cr3 (pagetable base) values:
+ *
+ * xen_cr3 contains the current logical cr3 value; it contains the
+ * last set cr3.  This may not be the current effective cr3, because
+ * its update may be being lazily deferred.  However, a vcpu looking
+ * at its own cr3 can use this value knowing that everything will
+ * be self-consistent.
+ *
+ * xen_current_cr3 contains the actual vcpu cr3; it is set once the
+ * hypercall to set the vcpu cr3 is complete (so it may be a little
+ * out of date, but it will never be set early).  If one vcpu is
+ * looking at another vcpu's cr3 value, it should use this variable.
+ */
+DEFINE_PER_CPU(unsigned long, xen_cr3);	 /* cr3 stored as physaddr */
+DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */

 struct start_info *xen_start_info;
 EXPORT_SYMBOL_GPL(xen_start_info);
@ -100,7 +116,7 @@ static void __init xen_vcpu_setup(int cpu)
 	info.mfn = virt_to_mfn(vcpup);
 	info.offset = offset_in_page(vcpup);

-	printk(KERN_DEBUG "trying to map vcpu_info %d at %p, mfn %x, offset %d\n",
+	printk(KERN_DEBUG "trying to map vcpu_info %d at %p, mfn %llx, offset %d\n",
 	       cpu, vcpup, info.mfn, info.offset);

 	/* Check to see if the hypervisor will put the vcpu_info
@ -632,32 +648,36 @@ static unsigned long xen_read_cr3(void)
 	return x86_read_percpu(xen_cr3);
 }

+static void set_current_cr3(void *v)
+{
+	x86_write_percpu(xen_current_cr3, (unsigned long)v);
+}
+
 static void xen_write_cr3(unsigned long cr3)
 {
+	struct mmuext_op *op;
+	struct multicall_space mcs;
+	unsigned long mfn = pfn_to_mfn(PFN_DOWN(cr3));
+
 	BUG_ON(preemptible());

-	if (cr3 == x86_read_percpu(xen_cr3)) {
-		/* just a simple tlb flush */
-		xen_flush_tlb();
-		return;
-	}
+	mcs = xen_mc_entry(sizeof(*op));  /* disables interrupts */

+	/* Update while interrupts are disabled, so it's atomic with
+	   respect to IPIs */
 	x86_write_percpu(xen_cr3, cr3);

+	op = mcs.args;
+	op->cmd = MMUEXT_NEW_BASEPTR;
+	op->arg1.mfn = mfn;

-	{
-		struct mmuext_op *op;
-		struct multicall_space mcs = xen_mc_entry(sizeof(*op));
-		unsigned long mfn = pfn_to_mfn(PFN_DOWN(cr3));
+	MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);

-		op = mcs.args;
-		op->cmd = MMUEXT_NEW_BASEPTR;
-		op->arg1.mfn = mfn;
+	/* Update xen_current_cr3 once the batch has actually
+	   been submitted. */
+	xen_mc_callback(set_current_cr3, (void *)cr3);

-		MULTI_mmuext_op(mcs.mc, op, 1, NULL, DOMID_SELF);
-
-		xen_mc_issue(PARAVIRT_LAZY_CPU);
-	}
+	xen_mc_issue(PARAVIRT_LAZY_CPU);  /* interrupts restored */
 }

 /* Early in boot, while setting up the initial pagetable, assume
@ -1113,6 +1133,7 @@ asmlinkage void __init xen_start_kernel(void)
 	/* keep using Xen gdt for now; no urgent need to change it */

 	x86_write_percpu(xen_cr3, __pa(pgd));
+	x86_write_percpu(xen_current_cr3, __pa(pgd));

 #ifdef CONFIG_SMP
 	/* Don't do the full vcpu_info placement stuff until we have a
--- a/arch/i386/xen/mmu.c
+++ b/arch/i386/xen/mmu.c
@ -515,20 +515,43 @@ static void drop_other_mm_ref(void *info)

 	if (__get_cpu_var(cpu_tlbstate).active_mm == mm)
 		leave_mm(smp_processor_id());
+
+	/* If this cpu still has a stale cr3 reference, then make sure
+	   it has been flushed. */
+	if (x86_read_percpu(xen_current_cr3) == __pa(mm->pgd)) {
+		load_cr3(swapper_pg_dir);
+		arch_flush_lazy_cpu_mode();
+	}
 }

 static void drop_mm_ref(struct mm_struct *mm)
 {
+	cpumask_t mask;
+	unsigned cpu;
+
 	if (current->active_mm == mm) {
 		if (current->mm == mm)
 			load_cr3(swapper_pg_dir);
 		else
 			leave_mm(smp_processor_id());
+		arch_flush_lazy_cpu_mode();
 	}

-	if (!cpus_empty(mm->cpu_vm_mask))
-		xen_smp_call_function_mask(mm->cpu_vm_mask, drop_other_mm_ref,
-					   mm, 1);
+	/* Get the "official" set of cpus referring to our pagetable. */
+	mask = mm->cpu_vm_mask;
+
+	/* It's possible that a vcpu may have a stale reference to our
+	   cr3, because it's in lazy mode and hasn't yet flushed its
+	   set of pending hypercalls.  In this case, we can look at
+	   its actual current cr3 value, and force it to flush if
+	   needed. */
+	for_each_online_cpu(cpu) {
+		if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
+			cpu_set(cpu, mask);
+	}
+
+	if (!cpus_empty(mask))
+		xen_smp_call_function_mask(mask, drop_other_mm_ref, mm, 1);
 }
 #else
 static void drop_mm_ref(struct mm_struct *mm)
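The `drop_mm_ref()` change above widens the "official" `cpu_vm_mask` with every CPU whose actual cr3 still points at the pagetable being dropped. The sketch below shows that mask construction in isolation. It assumes a plain 32-bit bitmask instead of `cpumask_t`, an array in place of the per-CPU `xen_current_cr3`, and a hypothetical `NR_CPUS` of 8. `cpus_to_flush` is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical small system */

static_assert(NR_CPUS <= 32, "mask is a 32-bit word in this sketch");

/*
 * Start from the official mask and add every CPU whose current cr3
 * still equals the physical address of the pagetable being dropped.
 */
uint32_t cpus_to_flush(uint32_t official_mask,
		       const unsigned long current_cr3[NR_CPUS],
		       unsigned long pgd_phys)
{
	uint32_t mask = official_mask;
	unsigned cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (current_cr3[cpu] == pgd_phys)
			mask |= UINT32_C(1) << cpu;
	}
	return mask;
}
```

A CPU that is missing from the official mask but still has the stale cr3 loaded ends up in the result, so it would receive the cross-call.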
--- a/arch/i386/xen/multicalls.c
+++ b/arch/i386/xen/multicalls.c
@ -32,7 +32,11 @@
 struct mc_buffer {
 	struct multicall_entry entries[MC_BATCH];
 	u64 args[MC_ARGS];
-	unsigned mcidx, argidx;
+	struct callback {
+		void (*fn)(void *);
+		void *data;
+	} callbacks[MC_BATCH];
+	unsigned mcidx, argidx, cbidx;
 };

 static DEFINE_PER_CPU(struct mc_buffer, mc_buffer);
@ -43,6 +47,7 @@ void xen_mc_flush(void)
 	struct mc_buffer *b = &__get_cpu_var(mc_buffer);
 	int ret = 0;
 	unsigned long flags;
+	int i;

 	BUG_ON(preemptible());

@ -51,8 +56,6 @@ void xen_mc_flush(void)
 	local_irq_save(flags);

 	if (b->mcidx) {
-		int i;
-
 		if (HYPERVISOR_multicall(b->entries, b->mcidx) != 0)
 			BUG();
 		for (i = 0; i < b->mcidx; i++)
@ -65,6 +68,13 @@ void xen_mc_flush(void)

 	local_irq_restore(flags);

+	for(i = 0; i < b->cbidx; i++) {
+		struct callback *cb = &b->callbacks[i];
+
+		(*cb->fn)(cb->data);
+	}
+	b->cbidx = 0;
+
 	BUG_ON(ret);
 }

@ -88,3 +98,16 @@ struct multicall_space __xen_mc_entry(size_t args)

 	return ret;
 }
+
+void xen_mc_callback(void (*fn)(void *), void *data)
+{
+	struct mc_buffer *b = &__get_cpu_var(mc_buffer);
+	struct callback *cb;
+
+	if (b->cbidx == MC_BATCH)
+		xen_mc_flush();
+
+	cb = &b->callbacks[b->cbidx++];
+	cb->fn = fn;
+	cb->data = data;
+}
--- a/arch/i386/xen/multicalls.h
+++ b/arch/i386/xen/multicalls.h
@ -42,4 +42,7 @@ static inline void xen_mc_issue(unsigned mode)
 	local_irq_restore(x86_read_percpu(xen_mc_irq_flags));
 }

+/* Set up a callback to be called when the current batch is flushed */
+void xen_mc_callback(void (*fn)(void *), void *data);
+
 #endif /* _XEN_MULTICALLS_H */
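The multicalls.c and multicalls.h changes above add a small deferred-callback queue: `xen_mc_callback()` records a function and its argument, and `xen_mc_flush()` runs the recorded callbacks after the batch has been issued. The following is a stand-alone sketch of that pattern only. `struct batch`, `batch_flush`, `batch_callback` and `bump` are hypothetical names, `MC_BATCH` is given an arbitrary small value, and the hypercall step is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MC_BATCH 4	/* hypothetical batch size for the sketch */

struct callback {
	void (*fn)(void *);
	void *data;
};

struct batch {
	struct callback callbacks[MC_BATCH];
	unsigned cbidx;
};

/* Run every queued callback in order, then empty the queue. */
void batch_flush(struct batch *b)
{
	unsigned i;

	/* (the real xen_mc_flush() issues the hypercalls first) */
	for (i = 0; i < b->cbidx; i++)
		b->callbacks[i].fn(b->callbacks[i].data);
	b->cbidx = 0;
}

/* Queue fn(data) to run at the next flush; flush first if full. */
void batch_callback(struct batch *b, void (*fn)(void *), void *data)
{
	assert(fn != NULL);
	if (b->cbidx == MC_BATCH)
		batch_flush(b);
	b->callbacks[b->cbidx].fn = fn;
	b->callbacks[b->cbidx].data = data;
	b->cbidx++;
}

/* Example callback: add one to the int that data points at. */
void bump(void *data)
{
	++*(int *)data;
}
```

Queued callbacks have no effect until the flush, which is the property `xen_write_cr3()` depends on when it defers the `xen_current_cr3` update.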
--- a/arch/i386/xen/xen-ops.h
+++ b/arch/i386/xen/xen-ops.h
@ -11,6 +11,7 @@ void xen_copy_trap_info(struct trap_info *traps);

 DECLARE_PER_CPU(struct vcpu_info *, xen_vcpu);
 DECLARE_PER_CPU(unsigned long, xen_cr3);
+DECLARE_PER_CPU(unsigned long, xen_current_cr3);

 extern struct start_info *xen_start_info;
 extern struct shared_info *HYPERVISOR_shared_info;
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@ -360,11 +360,26 @@ static void r4k___flush_cache_all(void)
 	r4k_on_each_cpu(local_r4k___flush_cache_all, NULL, 1, 1);
 }

+static inline int has_valid_asid(const struct mm_struct *mm)
+{
+#if defined(CONFIG_MIPS_MT_SMP) || defined(CONFIG_MIPS_MT_SMTC)
+	int i;
+
+	for_each_online_cpu(i)
+		if (cpu_context(i, mm))
+			return 1;
+
+	return 0;
+#else
+	return cpu_context(smp_processor_id(), mm);
+#endif
+}
+
 static inline void local_r4k_flush_cache_range(void * args)
 {
 	struct vm_area_struct *vma = args;

-	if (!(cpu_context(smp_processor_id(), vma->vm_mm)))
+	if (!(has_valid_asid(vma->vm_mm)))
 		return;

 	r4k_blast_dcache();
@ -383,7 +398,7 @@ static inline void local_r4k_flush_cache_mm(void * args)
 {
 	struct mm_struct *mm = args;

-	if (!cpu_context(smp_processor_id(), mm))
+	if (!has_valid_asid(mm))
 		return;

 	/*
@ -434,7 +449,7 @@ static inline void local_r4k_flush_cache_page(void *args)
 	 * If ownes no valid ASID yet, cannot possibly have gotten
 	 * this page into the cache.
 	 */
-	if (cpu_context(smp_processor_id(), mm) == 0)
+	if (!has_valid_asid(mm))
 		return;

 	addr &= PAGE_MASK;
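On the multithreaded MIPS configurations, the new `has_valid_asid()` above checks every online CPU instead of only the current one. The difference can be shown with a simplified model in which an array of per-CPU ASIDs replaces `cpu_context(cpu, mm)` and zero means "no valid ASID". `any_cpu_has_asid` is a hypothetical name, not the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the MT branch of has_valid_asid():
 * true if any of the first ncpus entries holds a non-zero ASID.
 */
bool any_cpu_has_asid(const unsigned long *asid, size_t ncpus)
{
	size_t cpu;

	assert(asid != NULL || ncpus == 0);
	for (cpu = 0; cpu < ncpus; cpu++)
		if (asid[cpu])
			return true;
	return false;
}
```

With ASIDs `{0, 0, 7, 0}`, a check of CPU 0 alone would skip the cache flush, while the scan over all four CPUs finds the valid ASID on CPU 2.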
--- a/arch/powerpc/math-emu/math.c
+++ b/arch/powerpc/math-emu/math.c
@ -407,11 +407,16 @@ do_mathemu(struct pt_regs *regs)

 	case XE:
 		idx = (insn >> 16) & 0x1f;
-		if (!idx)
-			goto illegal;
-
 		op0 = (void *)&current->thread.fpr[(insn >> 21) & 0x1f];
-		op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
+		if (!idx) {
+			if (((insn >> 1) & 0x3ff) == STFIWX)
+				op1 = (void *)(regs->gpr[(insn >> 11) & 0x1f]);
+			else
+				goto illegal;
+		} else {
+			op1 = (void *)(regs->gpr[idx] + regs->gpr[(insn >> 11) & 0x1f]);
+		}
+
 		break;

 	case XEU:
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@ -126,7 +126,7 @@ static struct axon_msic *find_msi_translator(struct pci_dev *dev)
 	const phandle *ph;
 	struct axon_msic *msic = NULL;

-	dn = pci_device_to_OF_node(dev);
+	dn = of_node_get(pci_device_to_OF_node(dev));
 	if (!dn) {
 		dev_dbg(&dev->dev, "axon_msi: no pci_dn found\n");
 		return NULL;
@ -183,7 +183,7 @@ static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg)
 	int len;
 	const u32 *prop;

-	dn = pci_device_to_OF_node(dev);
+	dn = of_node_get(pci_device_to_OF_node(dev));
 	if (!dn) {
 		dev_dbg(&dev->dev, "axon_msi: no pci_dn found\n");
 		return -ENODEV;
--- a/arch/sparc64/kernel/sys_sparc.c
+++ b/arch/sparc64/kernel/sys_sparc.c
@ -319,7 +319,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u

 	if (flags & MAP_FIXED) {
 		/* Ok, don't mess with it. */
-		return get_unmapped_area(NULL, addr, len, pgoff, flags);
+		return get_unmapped_area(NULL, orig_addr, len, pgoff, flags);
 	}
 	flags &= ~MAP_SHARED;

--- a/arch/sparc64/lib/xor.S
+++ b/arch/sparc64/lib/xor.S
@ -491,12 +491,12 @@ xor_niagara_4:		/* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */
 	ldda		[%i1 + 0x10] %asi, %i2	/* %i2/%i3 = src1 + 0x10 */
 	xor		%g2, %i4, %g2
 	xor		%g3, %i5, %g3
-	ldda		[%i7 + 0x10] %asi, %i4	/* %i4/%i5 = src2 + 0x10 */
+	ldda		[%l7 + 0x10] %asi, %i4	/* %i4/%i5 = src2 + 0x10 */
 	xor		%l0, %g2, %l0
 	xor		%l1, %g3, %l1
 	stxa		%l0, [%i0 + 0x00] %asi
 	stxa		%l1, [%i0 + 0x08] %asi
-	ldda		[%i6 + 0x10] %asi, %g2	/* %g2/%g3 = src3 + 0x10 */
+	ldda		[%l6 + 0x10] %asi, %g2	/* %g2/%g3 = src3 + 0x10 */
 	ldda		[%i0 + 0x10] %asi, %l0	/* %l0/%l1 = dest + 0x10 */

 	xor		%i4, %i2, %i4
@ -504,12 +504,12 @@ xor_niagara_4:		/* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */
 	ldda		[%i1 + 0x20] %asi, %i2	/* %i2/%i3 = src1 + 0x20 */
 	xor		%g2, %i4, %g2
 	xor		%g3, %i5, %g3
-	ldda		[%i7 + 0x20] %asi, %i4	/* %i4/%i5 = src2 + 0x20 */
+	ldda		[%l7 + 0x20] %asi, %i4	/* %i4/%i5 = src2 + 0x20 */
 	xor		%l0, %g2, %l0
 	xor		%l1, %g3, %l1
 	stxa		%l0, [%i0 + 0x10] %asi
 	stxa		%l1, [%i0 + 0x18] %asi
-	ldda		[%i6 + 0x20] %asi, %g2	/* %g2/%g3 = src3 + 0x20 */
+	ldda		[%l6 + 0x20] %asi, %g2	/* %g2/%g3 = src3 + 0x20 */
 	ldda		[%i0 + 0x20] %asi, %l0	/* %l0/%l1 = dest + 0x20 */

 	xor		%i4, %i2, %i4
@ -517,12 +517,12 @@ xor_niagara_4:		/* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */
 	ldda		[%i1 + 0x30] %asi, %i2	/* %i2/%i3 = src1 + 0x30 */
 	xor		%g2, %i4, %g2
 	xor		%g3, %i5, %g3
-	ldda		[%i7 + 0x30] %asi, %i4	/* %i4/%i5 = src2 + 0x30 */
+	ldda		[%l7 + 0x30] %asi, %i4	/* %i4/%i5 = src2 + 0x30 */
 	xor		%l0, %g2, %l0
 	xor		%l1, %g3, %l1
 	stxa		%l0, [%i0 + 0x20] %asi
 	stxa		%l1, [%i0 + 0x28] %asi
-	ldda		[%i6 + 0x30] %asi, %g2	/* %g2/%g3 = src3 + 0x30 */
+	ldda		[%l6 + 0x30] %asi, %g2	/* %g2/%g3 = src3 + 0x30 */
 	ldda		[%i0 + 0x30] %asi, %l0	/* %l0/%l1 = dest + 0x30 */

 	prefetch	[%i1 + 0x40], #one_read
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@ -60,7 +60,8 @@ SYS_DIR		:= $(ARCH_DIR)/include/sysdep-$(SUBARCH)

 CFLAGS += $(CFLAGS-y) -D__arch_um__ -DSUBARCH=\"$(SUBARCH)\"	\
 	$(ARCH_INCLUDE) $(MODE_INCLUDE) -Dvmap=kernel_vmap	\
-	-Din6addr_loopback=kernel_in6addr_loopback
+	-Din6addr_loopback=kernel_in6addr_loopback \
+	-Din6addr_any=kernel_in6addr_any

 AFLAGS += $(ARCH_INCLUDE)

--- a/arch/um/include/common-offsets.h
+++ b/arch/um/include/common-offsets.h
@ -10,6 +10,7 @@ OFFSET(HOST_TASK_PID, task_struct, pid);

 DEFINE(UM_KERN_PAGE_SIZE, PAGE_SIZE);
 DEFINE(UM_KERN_PAGE_MASK, PAGE_MASK);
+DEFINE(UM_KERN_PAGE_SHIFT, PAGE_SHIFT);
 DEFINE(UM_NSEC_PER_SEC, NSEC_PER_SEC);

 DEFINE_STR(UM_KERN_EMERG, KERN_EMERG);
--- a/arch/um/include/sysdep-i386/stub.h
+++ b/arch/um/include/sysdep-i386/stub.h
@ -9,7 +9,6 @@
 #include <sys/mman.h>
 #include <asm/ptrace.h>
 #include <asm/unistd.h>
-#include <asm/page.h>
 #include "stub-data.h"
 #include "kern_constants.h"
 #include "uml-config.h"
@ -19,7 +18,7 @@ extern void stub_clone_handler(void);

 #define STUB_SYSCALL_RET EAX
 #define STUB_MMAP_NR __NR_mmap2
-#define MMAP_OFFSET(o) ((o) >> PAGE_SHIFT)
+#define MMAP_OFFSET(o) ((o) >> UM_KERN_PAGE_SHIFT)

 static inline long stub_syscall0(long syscall)
 {
--- a/arch/um/kernel/skas/clone.c
+++ b/arch/um/kernel/skas/clone.c
@ -3,7 +3,6 @@
 #include <sys/mman.h>
 #include <sys/time.h>
 #include <asm/unistd.h>
-#include <asm/page.h>
 #include "ptrace_user.h"
 #include "skas.h"
 #include "stub-data.h"
--- a/arch/um/os-Linux/main.c
+++ b/arch/um/os-Linux/main.c
@ -12,7 +12,6 @@
 #include <sys/resource.h>
 #include <sys/mman.h>
 #include <sys/user.h>
-#include <asm/page.h>
 #include "kern_util.h"
 #include "as-layout.h"
 #include "mem_user.h"
--- a/arch/um/os-Linux/skas/mem.c
+++ b/arch/um/os-Linux/skas/mem.c
@ -9,7 +9,6 @@
 #include <unistd.h>
 #include <sys/mman.h>
 #include <sys/wait.h>
-#include <asm/page.h>
 #include <asm/unistd.h>
 #include "mem_user.h"
 #include "mem.h"
--- a/arch/um/os-Linux/skas/process.c
+++ b/arch/um/os-Linux/skas/process.c
@ -182,7 +182,7 @@ static int userspace_tramp(void *stack)

 	ptrace(PTRACE_TRACEME, 0, 0, 0);

-	init_new_thread_signals();
+	signal(SIGTERM, SIG_DFL);
 	err = set_interval(1);
 	if(err)
 		panic("userspace_tramp - setting timer failed, errno = %d\n",
--- a/arch/um/os-Linux/start_up.c
+++ b/arch/um/os-Linux/start_up.c
@ -19,7 +19,6 @@
 #include <sys/mman.h>
 #include <sys/resource.h>
 #include <asm/unistd.h>
-#include <asm/page.h>
 #include <sys/types.h>
 #include "kern_util.h"
 #include "user.h"
--- a/arch/um/os-Linux/tt.c
+++ b/arch/um/os-Linux/tt.c
@ -17,7 +17,6 @@
 #include <sys/mman.h>
 #include <asm/ptrace.h>
 #include <asm/unistd.h>
-#include <asm/page.h>
 #include "kern_util.h"
 #include "user.h"
 #include "signal_kern.h"
--- a/arch/um/os-Linux/util.c
+++ b/arch/um/os-Linux/util.c
@ -105,6 +105,44 @@ int setjmp_wrapper(void (*proc)(void *, void *), ...)

 void os_dump_core(void)
 {
+	int pid;
+
 	signal(SIGSEGV, SIG_DFL);
+
+	/*
+	 * We are about to SIGTERM this entire process group to ensure that
+	 * nothing is around to run after the kernel exits.  The
+	 * kernel wants to abort, not die through SIGTERM, so we
+	 * ignore it here.
+	 */
+
+	signal(SIGTERM, SIG_IGN);
+	kill(0, SIGTERM);
+	/*
+	 * Most of the other processes associated with this UML are
+	 * likely stopped, so give them a SIGCONT so they see the
+	 * SIGTERM.
+	 */
+	kill(0, SIGCONT);
+
+	/*
+	 * Now, having sent signals to everyone but us, make sure they
+	 * die by ptrace.  Processes can survive what's been done to
+	 * them so far - the mechanism I understand is receiving a
+	 * SIGSEGV and segfaulting immediately upon return.  There is
+	 * always a SIGSEGV pending, and (I'm guessing) signals are
+	 * processed in numeric order so the SIGTERM (signal 15 vs
+	 * SIGSEGV being signal 11) is never handled.
+	 *
+	 * Run a waitpid loop until we get some kind of error.
+	 * Hopefully, it's ECHILD, but there's not a lot we can do if
+	 * it's something else.  Tell os_kill_ptraced_process not to
+	 * wait for the child to report its death because there's
+	 * nothing reasonable to do if that fails.
+	 */
+
+	while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
+		os_kill_ptraced_process(pid, 0);
+
 	abort();
 }
--- a/arch/um/sys-i386/user-offsets.c
+++ b/arch/um/sys-i386/user-offsets.c
@ -2,9 +2,9 @@
 #include <stddef.h>
 #include <signal.h>
 #include <sys/poll.h>
+#include <sys/user.h>
 #include <sys/mman.h>
 #include <asm/ptrace.h>
-#include <asm/user.h>

 #define DEFINE(sym, val) \
 	asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@ -48,8 +48,8 @@ void foo(void)
 	OFFSET(HOST_SC_FP_ST, _fpstate, _st);
 	OFFSET(HOST_SC_FXSR_ENV, _fpstate, _fxsr_env);

-	DEFINE_LONGS(HOST_FP_SIZE, sizeof(struct user_i387_struct));
-	DEFINE_LONGS(HOST_XFP_SIZE, sizeof(struct user_fxsr_struct));
+	DEFINE_LONGS(HOST_FP_SIZE, sizeof(struct user_fpregs_struct));
+	DEFINE_LONGS(HOST_XFP_SIZE, sizeof(struct user_fpxregs_struct));

 	DEFINE(HOST_IP, EIP);
 	DEFINE(HOST_SP, UESP);
--- a/arch/um/sys-x86_64/user-offsets.c
+++ b/arch/um/sys-x86_64/user-offsets.c
@ -3,17 +3,10 @@
 #include <signal.h>
 #include <sys/poll.h>
 #include <sys/mman.h>
+#include <sys/user.h>
 #define __FRAME_OFFSETS
 #include <asm/ptrace.h>
 #include <asm/types.h>
-/* For some reason, x86_64 defines u64 and u32 only in <pci/types.h>, which I
- * refuse to include here, even though they're used throughout the headers.
- * These are used in asm/user.h, and that include can't be avoided because of
- * the sizeof(struct user_regs_struct) below.
- */
-typedef __u64 u64;
-typedef __u32 u32;
-#include <asm/user.h>

 #define DEFINE(sym, val) \
        asm volatile("\n->" #sym " %0 " #val : : "i" (val))
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@ -734,12 +734,6 @@ int in_gate_area_no_task(unsigned long addr)
 	return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
 }

-void * __init alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
-{
-	return __alloc_bootmem_core(pgdat->bdata, size,
-			SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
-}
-
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
 	if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@ -229,9 +229,14 @@ void global_flush_tlb(void)
 	struct page *pg, *next;
 	struct list_head l;

-	down_read(&init_mm.mmap_sem);
+	/*
+	 * Write-protect the semaphore, to exclude two contexts
+	 * doing a list_replace_init() call in parallel and to
+	 * exclude new additions to the deferred_pages list:
+	 */
+	down_write(&init_mm.mmap_sem);
 	list_replace_init(&deferred_pages, &l);
-	up_read(&init_mm.mmap_sem);
+	up_write(&init_mm.mmap_sem);

 	flush_map(&l);

--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@ -819,7 +819,6 @@ static int __blk_free_tags(struct blk_queue_tag *bqt)
 	retval = atomic_dec_and_test(&bqt->refcnt);
 	if (retval) {
 		BUG_ON(bqt->busy);
-		BUG_ON(!list_empty(&bqt->busy_list));

 		kfree(bqt->tag_index);
 		bqt->tag_index = NULL;
@ -931,7 +930,6 @@ static struct blk_queue_tag *__blk_queue_init_tags(struct request_queue *q,
 	if (init_tag_map(q, tags, depth))
 		goto fail;

-	INIT_LIST_HEAD(&tags->busy_list);
 	tags->busy = 0;
 	atomic_set(&tags->refcnt, 1);
 	return tags;
@ -982,6 +980,7 @@ int blk_queue_init_tags(struct request_queue *q, int depth,
 	 */
 	q->queue_tags = tags;
 	q->queue_flags |= (1 << QUEUE_FLAG_QUEUED);
+	INIT_LIST_HEAD(&q->tag_busy_list);
 	return 0;
 fail:
 	kfree(tags);
@ -1152,7 +1151,7 @@ int blk_queue_start_tag(struct request_queue *q, struct request *rq)
 	rq->tag = tag;
 	bqt->tag_index[tag] = rq;
 	blkdev_dequeue_request(rq);
-	list_add(&rq->queuelist, &bqt->busy_list);
+	list_add(&rq->queuelist, &q->tag_busy_list);
 	bqt->busy++;
 	return 0;
 }
@ -1173,11 +1172,10 @@ EXPORT_SYMBOL(blk_queue_start_tag);
 **/
 void blk_queue_invalidate_tags(struct request_queue *q)
 {
-	struct blk_queue_tag *bqt = q->queue_tags;
 	struct list_head *tmp, *n;
 	struct request *rq;

-	list_for_each_safe(tmp, n, &bqt->busy_list) {
+	list_for_each_safe(tmp, n, &q->tag_busy_list) {
 		rq = list_entry_rq(tmp);

 		if (rq->tag == -1) {
--- a/drivers/ata/sata_mv.c
+++ b/drivers/ata/sata_mv.c
@ -69,10 +69,11 @@
 #include <linux/device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_cmnd.h>
+#include <scsi/scsi_device.h>
 #include <linux/libata.h>

 #define DRV_NAME	"sata_mv"
-#define DRV_VERSION	"1.0"
+#define DRV_VERSION	"1.01"

 enum {
 	/* BAR's are enumerated in terms of pci_resource_start() terms */
@ -420,6 +421,7 @@ static void mv_error_handler(struct ata_port *ap);
 static void mv_post_int_cmd(struct ata_queued_cmd *qc);
 static void mv_eh_freeze(struct ata_port *ap);
 static void mv_eh_thaw(struct ata_port *ap);
+static int mv_slave_config(struct scsi_device *sdev);
 static int mv_init_one(struct pci_dev *pdev, const struct pci_device_id *ent);

 static void mv5_phy_errata(struct mv_host_priv *hpriv, void __iomem *mmio,
@ -457,7 +459,7 @@ static struct scsi_host_template mv5_sht = {
 	.use_clustering		= 1,
 	.proc_name		= DRV_NAME,
 	.dma_boundary		= MV_DMA_BOUNDARY,
-	.slave_configure	= ata_scsi_slave_config,
+	.slave_configure	= mv_slave_config,
 	.slave_destroy		= ata_scsi_slave_destroy,
 	.bios_param		= ata_std_bios_param,
 };
@ -475,7 +477,7 @@ static struct scsi_host_template mv6_sht = {
 	.use_clustering		= 1,
 	.proc_name		= DRV_NAME,
 	.dma_boundary		= MV_DMA_BOUNDARY,
-	.slave_configure	= ata_scsi_slave_config,
+	.slave_configure	= mv_slave_config,
 	.slave_destroy		= ata_scsi_slave_destroy,
 	.bios_param		= ata_std_bios_param,
 };
@ -763,6 +765,17 @@ static void mv_irq_clear(struct ata_port *ap)
 {
 }

+static int mv_slave_config(struct scsi_device *sdev)
+{
+	int rc = ata_scsi_slave_config(sdev);
+	if (rc)
+		return rc;
+
+	blk_queue_max_phys_segments(sdev->request_queue, MV_MAX_SG_CT / 2);
+
+	return 0;	/* scsi layer doesn't check return value, sigh */
+}
+
 static void mv_set_edma_ptrs(void __iomem *port_mmio,
 			     struct mv_host_priv *hpriv,
 			     struct mv_port_priv *pp)
@ -1130,10 +1143,9 @@ static void mv_port_stop(struct ata_port *ap)
 *      LOCKING:
 *      Inherited from caller.
 */
-static unsigned int mv_fill_sg(struct ata_queued_cmd *qc)
+static void mv_fill_sg(struct ata_queued_cmd *qc)
 {
 	struct mv_port_priv *pp = qc->ap->private_data;
-	unsigned int n_sg = 0;
 	struct scatterlist *sg;
 	struct mv_sg *mv_sg;

@ -1151,7 +1163,7 @@ static unsigned int mv_fill_sg(struct ata_queued_cmd *qc)

 			mv_sg->addr = cpu_to_le32(addr & 0xffffffff);
 			mv_sg->addr_hi = cpu_to_le32((addr >> 16) >> 16);
-			mv_sg->flags_size = cpu_to_le32(len);
+			mv_sg->flags_size = cpu_to_le32(len & 0xffff);

 			sg_len -= len;
 			addr += len;
@ -1160,12 +1172,9 @@ static unsigned int mv_fill_sg(struct ata_queued_cmd *qc)
 				mv_sg->flags_size |= cpu_to_le32(EPRD_FLAG_END_OF_TBL);

 			mv_sg++;
-			n_sg++;
 		}

 	}
-
-	return n_sg;
 }

 static inline void mv_crqb_pack_cmd(__le16 *cmdw, u8 data, u8 addr, unsigned last)
--- a/fs/locks.c
+++ b/fs/locks.c
@ -694,11 +694,20 @@ EXPORT_SYMBOL(posix_test_lock);
 * Note: the above assumption may not be true when handling lock requests
 * from a broken NFS client. But broken NFS clients have a lot more to
 * worry about than proper deadlock detection anyway... --okir
+ *
+ * However, the failure of this assumption (also possible in the case of
+ * multiple tasks sharing the same open file table) also means there's no
+ * guarantee that the loop below will terminate.  As a hack, we give up
+ * after a few iterations.
 */
+
+#define MAX_DEADLK_ITERATIONS 10
+
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
 	struct list_head *tmp;
+	int i = 0;

 next_task:
 	if (posix_same_owner(caller_fl, block_fl))
@ -706,6 +715,8 @@ next_task:
 	list_for_each(tmp, &blocked_list) {
 		struct file_lock *fl = list_entry(tmp, struct file_lock, fl_link);
 		if (posix_same_owner(fl, block_fl)) {
+			if (i++ > MAX_DEADLK_ITERATIONS)
+				return 0;
 			fl = fl->fl_next;
 			block_fl = fl;
 			goto next_task;
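
Note on the fs/locks.c hunk above: the idea of capping a wait-for walk can be sketched outside the kernel. This is a simplified model, not the kernel code. `struct owner`, `would_deadlock` and `MAX_WALK` are hypothetical names, and each owner is assumed to block on at most one other owner.

```c
#include <stddef.h>

#define MAX_WALK 10	/* hypothetical cap, in the spirit of MAX_DEADLK_ITERATIONS */

/* Hypothetical model: each owner blocks on at most one other owner. */
struct owner {
	struct owner *waits_for;	/* NULL if not blocked */
};

/*
 * Return 1 if letting `caller` wait on `blocker` would close a cycle
 * through `caller`.  Return 0 if the chain ends, or if the walk gives
 * up after MAX_WALK steps (a cycle that does not include `caller`
 * would otherwise loop forever).
 */
static int would_deadlock(const struct owner *caller,
			  const struct owner *blocker)
{
	int steps = 0;

	while (blocker != NULL) {
		if (blocker == caller)
			return 1;
		if (steps++ > MAX_WALK)
			return 0;	/* give up and report "no deadlock" */
		blocker = blocker->waits_for;
	}
	return 0;
}
```

The trade-off is the same as in the patch: giving up early can miss a real deadlock on a long chain, but the walk is guaranteed to terminate.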
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@ -351,7 +351,8 @@ static cputime_t task_utime(struct task_struct *p)
 	}
 	utime = (clock_t)temp;

-	return clock_t_to_cputime(utime);
+	p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
+	return p->prev_utime;
 }

 static cputime_t task_stime(struct task_struct *p)
@ -366,7 +367,8 @@ static cputime_t task_stime(struct task_struct *p)
 	stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
 			cputime_to_clock_t(task_utime(p));

-	return clock_t_to_cputime(stime);
+	p->prev_stime = max(p->prev_stime, clock_t_to_cputime(stime));
+	return p->prev_stime;
 }
 #endif
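
Note on the fs/proc/array.c hunks above: both changes clamp the reported value so that it never decreases between reads. A minimal user-space sketch of that pattern follows. `struct task_times` and `report_utime` are hypothetical names, and plain 64-bit integers stand in for cputime_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-task state added by the patch. */
struct task_times {
	uint64_t prev_utime;	/* largest value reported so far */
};

/*
 * Return a value that never goes backwards: remember the largest
 * sample seen and report that instead of a smaller new sample.
 */
static uint64_t report_utime(struct task_times *t, uint64_t sample)
{
	if (sample > t->prev_utime)
		t->prev_utime = sample;
	return t->prev_utime;
}
```

With a zero-initialised state, feeding the samples 5, 3, 9 yields 5, 5, 9. The stored maximum has to start at zero for a new task, which is what the kernel/fork.c hunk of this patch does for prev_utime and prev_stime.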

--- a/fs/splice.c
+++ b/fs/splice.c
@ -1390,10 +1390,10 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
 	if (copy_to_user(sd->u.userptr, src + buf->offset, sd->len))
 		ret = -EFAULT;

+	buf->ops->unmap(pipe, buf, src);
 out:
 	if (ret > 0)
 		sd->u.userptr += ret;
-	buf->ops->unmap(pipe, buf, src);
 	return ret;
 }

--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@ -187,6 +187,19 @@ free_address(
 {
 	a_list_t	*aentry;

+#ifdef CONFIG_XEN
+	/*
+	 * Xen needs to be able to make sure it can get an exclusive
+	 * RO mapping of pages it wants to turn into a pagetable.  If
+	 * a newly allocated page is also still being vmap()ed by xfs,
+	 * it will cause pagetable construction to fail.  This is a
+	 * quick workaround to always eagerly unmap pages so that Xen
+	 * is happy.
+	 */
+	vunmap(addr);
+	return;
+#endif
+
 	aentry = kmalloc(sizeof(a_list_t), GFP_NOWAIT);
 	if (likely(aentry)) {
 		spin_lock(&as_lock);
--- a/include/asm-mips/hazards.h
+++ b/include/asm-mips/hazards.h
@ -10,11 +10,12 @@
 #ifndef _ASM_HAZARDS_H
 #define _ASM_HAZARDS_H

-
 #ifdef __ASSEMBLY__
 #define ASMMACRO(name, code...) .macro name; code; .endm
 #else

+#include <asm/cpu-features.h>
+
 #define ASMMACRO(name, code...)						\
 __asm__(".macro " #name "; " #code "; .endm");				\
 									\
@ -86,6 +87,57 @@ do {									\
 	: "=r" (tmp));							\
 } while (0)

+#elif defined(CONFIG_CPU_MIPSR1)
+
+/*
+ * These are slightly complicated by the fact that we guarantee R1 kernels to
+ * run fine on R2 processors.
+ */
+ASMMACRO(mtc0_tlbw_hazard,
+	_ssnop; _ssnop; _ehb
+	)
+ASMMACRO(tlbw_use_hazard,
+	_ssnop; _ssnop; _ssnop; _ehb
+	)
+ASMMACRO(tlb_probe_hazard,
+	 _ssnop; _ssnop; _ssnop; _ehb
+	)
+ASMMACRO(irq_enable_hazard,
+	 _ssnop; _ssnop; _ssnop; _ehb
+	)
+ASMMACRO(irq_disable_hazard,
+	_ssnop; _ssnop; _ssnop; _ehb
+	)
+ASMMACRO(back_to_back_c0_hazard,
+	 _ssnop; _ssnop; _ssnop; _ehb
+	)
+/*
+ * gcc has a tradition of miscompiling the previous construct using the
+ * address of a label as argument to inline assembler.  Gas otoh has the
+ * annoying difference between la and dla which are only usable for 32-bit
+ * resp. 64-bit code, so can't be used without conditional compilation.
+ * The alternative is switching the assembler to 64-bit code which happens
+ * to work right even for 32-bit code ...
+ */
+#define __instruction_hazard()						\
+do {									\
+	unsigned long tmp;						\
+									\
+	__asm__ __volatile__(						\
+	"	.set	mips64r2				\n"	\
+	"	dla	%0, 1f					\n"	\
+	"	jr.hb	%0					\n"	\
+	"	.set	mips0					\n"	\
+	"1:							\n"	\
+	: "=r" (tmp));							\
+} while (0)
+
+#define instruction_hazard()						\
+do {									\
+	if (cpu_has_mips_r2)						\
+		__instruction_hazard();					\
+} while (0)
+
 #elif defined(CONFIG_CPU_R10000)

 /*
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@ -356,7 +356,6 @@ enum blk_queue_state {
 struct blk_queue_tag {
 	struct request **tag_index;	/* map of busy tags */
 	unsigned long *tag_map;		/* bit map of free/busy tags */
-	struct list_head busy_list;	/* fifo list of busy tags */
 	int busy;			/* current depth */
 	int max_depth;			/* what we will send to device */
 	int real_max_depth;		/* what the array can hold */
@ -451,6 +450,7 @@ struct request_queue
 	unsigned int		dma_alignment;

 	struct blk_queue_tag	*queue_tags;
+	struct list_head	tag_busy_list;

 	unsigned int		nr_sorted;
 	unsigned int		in_flight;
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@ -59,7 +59,6 @@ extern void *__alloc_bootmem_core(struct bootmem_data *bdata,
 				  unsigned long align,
 				  unsigned long goal,
 				  unsigned long limit);
-extern void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size);

 #ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
 extern void reserve_bootmem(unsigned long addr, unsigned long size);
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@ -1022,6 +1022,7 @@ struct task_struct {

 	unsigned int rt_priority;
 	cputime_t utime, stime;
+	cputime_t prev_utime, prev_stime;
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	struct timespec start_time; 		/* monotonic time */
 	struct timespec real_start_time;	/* boot based time */
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@ -160,8 +160,9 @@ struct vcpu_set_singleshot_timer {
 */
 #define VCPUOP_register_vcpu_info   10  /* arg == struct vcpu_info */
 struct vcpu_register_vcpu_info {
-    uint32_t mfn;               /* mfn of page to place vcpu_info */
-    uint32_t offset;            /* offset within page */
+    uint64_t mfn;    /* mfn of page to place vcpu_info */
+    uint32_t offset; /* offset within page */
+    uint32_t rsvd;   /* unused */
 };

 #endif /* __XEN_PUBLIC_VCPU_H__ */
--- a/kernel/fork.c
+++ b/kernel/fork.c
@ -1045,6 +1045,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,

 	p->utime = cputime_zero;
 	p->stime = cputime_zero;
+	p->prev_utime = cputime_zero;
+	p->prev_stime = cputime_zero;

 #ifdef CONFIG_TASK_XACCT
 	p->rchar = 0;		/* I/O counter: bytes read */
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@ -29,6 +29,15 @@ fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
 	return 0;
 }

+static void __user *futex_uaddr(struct robust_list *entry,
+				compat_long_t futex_offset)
+{
+	compat_uptr_t base = ptr_to_compat(entry);
+	void __user *uaddr = compat_ptr(base + futex_offset);
+
+	return uaddr;
+}
+
 /*
 * Walk curr->robust_list (very carefully, it's a userspace list!)
 * and mark any locks found there dead, and notify any waiters.
@ -75,11 +84,13 @@ void compat_exit_robust_list(struct task_struct *curr)
 		 * A pending lock might already be on the list, so
 		 * dont process it twice:
 		 */
-		if (entry != pending)
-			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi))
-				return;
+		if (entry != pending) {
+			void __user *uaddr = futex_uaddr(entry,
+							 futex_offset);

+			if (handle_futex_death(uaddr, curr, pi))
+				return;
+		}
 		if (rc)
 			return;
 		uentry = next_uentry;
@ -93,9 +104,11 @@ void compat_exit_robust_list(struct task_struct *curr)

 		cond_resched();
 	}
-	if (pending)
-		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip);
+	if (pending) {
+		void __user *uaddr = futex_uaddr(pending, futex_offset);
+
+		handle_futex_death(uaddr, curr, pip);
+	}
 }

 asmlinkage long
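
Note on the kernel/futex_compat.c hunks above: the new helper adds the offset to a compat_uptr_t and only then converts back to a pointer. Assuming compat_uptr_t is an unsigned 32-bit type, that makes the sum wrap at 32 bits, as it would for a native 32-bit task. The sketch below only illustrates the arithmetic. `compat_add` and `wide_add` are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* 32-bit addition: the result wraps modulo 2^32. */
static uint32_t compat_add(uint32_t base, int32_t offset)
{
	return base + (uint32_t)offset;
}

/* 64-bit addition of the same operands: no wrap at 4 GiB. */
static uint64_t wide_add(uint64_t base, int32_t offset)
{
	return base + (uint64_t)(int64_t)offset;
}
```

For base 0xfffffff0 and offset 0x20 the two functions disagree: the 32-bit sum is 0x10, the 64-bit sum is 0x100000010. For small negative offsets such as -16 from base 0x1000, both give 0xff0.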
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@ -1521,7 +1521,7 @@ cache_hit:
 }

 static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
-	       	struct held_lock *hlock, int chain_head)
+	       	struct held_lock *hlock, int chain_head, u64 chain_key)
 {
 	/*
 	 * Trylock needs to maintain the stack of held locks, but it
@ -1534,7 +1534,7 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 	 * graph_lock for us)
 	 */
 	if (!hlock->trylock && (hlock->check == 2) &&
-			lookup_chain_cache(curr->curr_chain_key, hlock->class)) {
+			lookup_chain_cache(chain_key, hlock->class)) {
 		/*
 		 * Check whether last held lock:
 		 *
@ -1576,7 +1576,7 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
 #else
 static inline int validate_chain(struct task_struct *curr,
 	       	struct lockdep_map *lock, struct held_lock *hlock,
-		int chain_head)
+		int chain_head, u64 chain_key)
 {
 	return 1;
 }
@ -2450,11 +2450,11 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 		chain_head = 1;
 	}
 	chain_key = iterate_chain_key(chain_key, id);
-	curr->curr_chain_key = chain_key;

-	if (!validate_chain(curr, lock, hlock, chain_head))
+	if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
 		return 0;

+	curr->curr_chain_key = chain_key;
 	curr->lockdep_depth++;
 	check_chain_key(curr);
 #ifdef CONFIG_DEBUG_LOCKDEP
--- a/kernel/params.c
+++ b/kernel/params.c
@ -595,13 +595,16 @@ static void __init param_sysfs_builtin(void)

 	for (i=0; i < __stop___param - __start___param; i++) {
 		char *dot;
+		size_t max_name_len;

 		kp = &__start___param[i];
+		max_name_len =
+			min_t(size_t, MAX_KBUILD_MODNAME, strlen(kp->name));

-		/* We do not handle args without periods. */
-		dot = memchr(kp->name, '.', MAX_KBUILD_MODNAME);
+		dot = memchr(kp->name, '.', max_name_len);
 		if (!dot) {
-			DEBUGP("couldn't find period in %s\n", kp->name);
+			DEBUGP("couldn't find period in first %d characters "
+			       "of %s\n", MAX_KBUILD_MODNAME, kp->name);
 			continue;
 		}
 		name_len = dot - kp->name;
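
Note on the kernel/params.c hunk above: memchr() scans exactly the number of bytes it is given, so a fixed bound can read past the terminating NUL of a short name. Bounding the scan by the string length avoids that. A minimal sketch, where `find_dot` and `MAX_MODNAME` are hypothetical names and the value 32 is an assumption made for illustration:

```c
#include <string.h>

#define MAX_MODNAME 32	/* hypothetical limit, stands in for MAX_KBUILD_MODNAME */

/*
 * Look for the first '.' in `name`, scanning at most MAX_MODNAME
 * bytes and never past the end of the string.  Returns NULL if no
 * period is found within that range.
 */
static const char *find_dot(const char *name)
{
	size_t len = strlen(name);
	size_t limit = len < MAX_MODNAME ? len : MAX_MODNAME;

	return memchr(name, '.', limit);
}
```

For "usbcore.autosuspend" this returns a pointer seven bytes into the string. For a name without a period, or one whose first period lies beyond the limit, it returns NULL.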
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@ -80,10 +80,11 @@ void softlockup_tick(void)
 	print_timestamp = per_cpu(print_timestamp, this_cpu);

 	/* report at most once a second */
-	if (print_timestamp < (touch_timestamp + 1) ||
-		did_panic ||
-			!per_cpu(watchdog_task, this_cpu))
+	if ((print_timestamp >= touch_timestamp &&
+			print_timestamp < (touch_timestamp + 1)) ||
+			did_panic || !per_cpu(watchdog_task, this_cpu)) {
 		return;
+	}

 	/* do not print during early bootup: */
 	if (unlikely(system_state != SYSTEM_RUNNING)) {
--- a/mm/filemap.c
+++ b/mm/filemap.c
@ -1312,7 +1312,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)

 	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (vmf->pgoff >= size)
-		goto outside_data_content;
+		return VM_FAULT_SIGBUS;

 	/* If we don't want any read-ahead, don't bother */
 	if (VM_RandomReadHint(vma))
@ -1389,7 +1389,7 @@ retry_find:
 	if (unlikely(vmf->pgoff >= size)) {
 		unlock_page(page);
 		page_cache_release(page);
-		goto outside_data_content;
+		return VM_FAULT_SIGBUS;
 	}

 	/*
@ -1400,15 +1400,6 @@ retry_find:
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;

-outside_data_content:
-	/*
-	 * An external ptracer can access pages that normally aren't
-	 * accessible..
-	 */
-	if (vma->vm_mm == current->mm)
-		return VM_FAULT_SIGBUS;
-
-	/* Fall through to the non-read-ahead case */
 no_cached_page:
 	/*
 	 * We're only likely to ever get here if MADV_RANDOM is in
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@ -672,8 +672,10 @@ retry:

 			ret = (*writepage)(page, wbc, data);

-			if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE))
+			if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE)) {
 				unlock_page(page);
+				ret = 0;
+			}
 			if (ret || (--(wbc->nr_to_write) <= 0))
 				done = 1;
 			if (wbc->nonblocking && bdi_write_congested(bdi)) {
--- a/mm/shmem.c
+++ b/mm/shmem.c
@ -916,6 +916,21 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	struct inode *inode;

 	BUG_ON(!PageLocked(page));
+	/*
+	 * shmem_backing_dev_info's capabilities prevent regular writeback or
+	 * sync from ever calling shmem_writepage; but a stacking filesystem
+	 * may use the ->writepage of its underlying filesystem, in which case
+	 * we want to do nothing when that underlying filesystem is tmpfs
+	 * (writing out to swap is useful as a response to memory pressure, but
+	 * of no use to stabilize the data) - just redirty the page, unlock it
+	 * and claim success in this case.  AOP_WRITEPAGE_ACTIVATE, and the
+	 * page_mapped check below, must be avoided unless we're in reclaim.
+	 */
+	if (!wbc->for_reclaim) {
+		set_page_dirty(page);
+		unlock_page(page);
+		return 0;
+	}
 	BUG_ON(page_mapped(page));

 	mapping = page->mapping;
--- a/mm/slub.c
+++ b/mm/slub.c
@ -1501,28 +1501,8 @@ new_slab:
 	page = new_slab(s, gfpflags, node);
 	if (page) {
 		cpu = smp_processor_id();
-		if (s->cpu_slab[cpu]) {
-			/*
-			 * Someone else populated the cpu_slab while we
-			 * enabled interrupts, or we have gotten scheduled
-			 * on another cpu. The page may not be on the
-			 * requested node even if __GFP_THISNODE was
-			 * specified. So we need to recheck.
-			 */
-			if (node == -1 ||
-				page_to_nid(s->cpu_slab[cpu]) == node) {
-				/*
-				 * Current cpuslab is acceptable and we
-				 * want the current one since its cache hot
-				 */
-				discard_slab(s, page);
-				page = s->cpu_slab[cpu];
-				slab_lock(page);
-				goto load_freelist;
-			}
-			/* New slab does not fit our expectations */
+		if (s->cpu_slab[cpu])
 			flush_slab(s, s->cpu_slab[cpu], cpu);
-		}
 		slab_lock(page);
 		SetSlabFrozen(page);
 		s->cpu_slab[cpu] = page;
--- a/mm/sparse.c
+++ b/mm/sparse.c
@ -215,12 +215,6 @@ static int __meminit sparse_init_one_section(struct mem_section *ms,
 	return 1;
 }

-__attribute__((weak)) __init
-void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
-{
-	return NULL;
-}
-
 static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
@ -231,11 +225,6 @@ static struct page __init *sparse_early_mem_map_alloc(unsigned long pnum)
 	if (map)
 		return map;

-  	map = alloc_bootmem_high_node(NODE_DATA(nid),
-                       sizeof(struct page) * PAGES_PER_SECTION);
-	if (map)
-		return map;
-
 	map = alloc_bootmem_node(NODE_DATA(nid),
 			sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)