linux

Author	SHA1	Message	Date
Gerald Schaefer	335d7afbfb	mutexes, sched: Introduce arch_mutex_cpu_relax() The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in this specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex. This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect other architectures than System z for now. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290437256.7455.4.camel@thinkpad> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:05:34 +01:00
Ingo Molnar	22a867d817	Merge commit 'v2.6.37-rc3' into sched/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:05:21 +01:00
Peter Zijlstra	5bb6b1ea67	sched: Add some clock info to sched_debug Add more clock information to /proc/sched_debug, Thomas wanted to see the sched_clock_stable state. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-23 10:29:08 +01:00
Peter Zijlstra	51a96c7781	cpu: Remove incorrect BUG_ON Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-23 10:29:08 +01:00
Dhaval Giani	2e01f4740a	cpu: Remove unused variable GCC warns us about: kernel/cpu.c: In function ‘take_cpu_down’: kernel/cpu.c:200:15: warning: unused variable ‘cpu’ This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani <dhaval_giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290091494.1145.5.camel@gondor.retis> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-23 10:29:07 +01:00
Peter Zijlstra	70caf8a6c1	sched: Fix UP build breakage The recent cgroup-scheduling rework caused a UP build problem. Cc: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-23 10:29:07 +01:00
Erik Gilling	28d0686cf7	sched: Make task dump print all 15 chars of proc comm Signed-off-by: Erik Gilling <konkers@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-23 10:29:07 +01:00
Linus Torvalds	33e0d57f5d	Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" This reverts commit `59365d136d`. It turns out that this can break certain existing user land setups. Quoth Sarah Sharp: "On Wednesday, I updated my branch to commit `460781b` from linus' tree, and my box would not boot. klogd segfaulted, which stalled the whole system. At first I thought it actually hung the box, but it continued booting after 5 minutes, and I was able to log in. It dropped back to the text console instead of the graphical bootup display for that period of time. dmesg surprisingly still works. I've bisected the problem down to this commit (commit `59365d136d`) The box is running klogd 1.5.5ubuntu3 (from Jaunty). Yes, I know that's old. I read the bit in the commit about changing the permissions of kallsyms after boot, but if I can't boot that doesn't help." So let's just keep the old default, and encourage distributions to do the "chmod -r /proc/kallsyms" in their bootup scripts. This is not worth a kernel option to change default behavior, since it's so easily done in user space. Reported-and-bisected-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Marcus Meissner <meissner@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-19 11:54:40 -08:00
Linus Torvalds	2d42dc3feb	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Fix regression in evr register handling kgdb,x86: fix regression in detach handling kdb: fix crash when KDB_BASE_CMD_MAX is exceeded kdb: fix memory leak in kdb_main.c	2010-11-18 08:24:58 -08:00
Paul Turner	9437178f62	sched: Update tg->shares after cpu.shares write Formerly sched_group_set_shares would force a rebalance by overflowing domain share sums. Now that per-cpu averages are maintained we can set the true value by issuing an update_cfs_shares() following a tg->shares update. Also initialize tg se->load to 0 for consistency since we'll now set correct weights on enqueue. Signed-off-by: Paul Turner <pjt@google.com?> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.465521344@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:50 +01:00
Paul Turner	d6b5591829	sched: Allow update_cfs_load() to update global load Refactor the global load updates from update_shares_cpu() so that update_cfs_load() can update global load when it is more than ~10% out of sync. The new global_load parameter allows us to force an update, regardless of the error factor so that we can synchronize w/ update_shares(). Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.377473595@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:50 +01:00
Paul Turner	3b3d190ec3	sched: Implement demand based update_cfs_load() When the system is busy, dilation of rq->next_balance makes lb->update_shares() insufficiently frequent for threads which don't sleep (no dequeue/enqueue updates). Adjust for this by making demand based updates based on the accumulation of execution time sufficient to wrap our averaging window. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.291159744@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:49 +01:00
Paul Turner	c66eaf619c	sched: Update shares on idle_balance Since shares updates are no longer expensive and effectively local, update them at idle_balance(). This allows us to more quickly redistribute shares to another cpu when our load becomes idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.204191702@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:49 +01:00
Paul Turner	a7a4f8a752	sched: Add sysctl_sched_shares_window Introduce a new sysctl for the shares window and disambiguate it from sched_time_avg. A 10ms window appears to be a good compromise between accuracy and performance. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.112173964@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:49 +01:00
Paul Turner	67e86250f8	sched: Introduce hierarchal order on shares update list Avoid duplicate shares update calls by ensuring children always appear before parents in rq->leaf_cfs_rq_list. This allows us to do a single in-order traversal for update_shares(). Since we always enqueue in bottom-up order this reduces to 2 cases: 1) Our parent is already in the list, e.g. root \ b /\ c d* (root->b->c already enqueued) Since d's parent is enqueued we push it to the head of the list, implicitly ahead of b. 2) Our parent does not appear in the list (or we have no parent) In this case we enqueue to the tail of the list, if our parent is subsequently enqueued (bottom-up) it will appear to our right by the same rule. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.022488865@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:48 +01:00
Paul Turner	e33078baa4	sched: Fix update_cfs_load() synchronization Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with the put path since nr_running accounting occurs at deactivation. It's also not safe to make the removal decision based on load_avg as this fails with both high periods and low shares. Resolve this by clipping history after 4 periods without activity. Note: the above will always occur from update_shares() since in the last-task-sleep-case that task will still be cfs_rq->curr when update_cfs_load is called. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.933428187@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:48 +01:00
Paul Turner	f0d7442a59	sched: Fix load corruption from update_cfs_shares() As part of enqueue_entity both a new entity weight and its contribution to the queuing cfs_rq / rq are updated. Since update_cfs_shares will only update the queueing weights when the entity is on_rq (which in this case it is not yet), there's a dependency loop here: update_cfs_shares needs account_entity_enqueue to update cfs_rq->load.weight account_entity_enqueue needs the updated weight for the queuing cfs_rq load[] Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as if we had accounted the enqueue already. This was also resulting in rq->load corruption previously. []: this dependency also exists when using the group cfs_rq w/ update_cfs_shares as the weight of the enqueued entity changes without the load being updated. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.844900206@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:47 +01:00
Peter Zijlstra	9e3081ca61	sched: Make tg_shares_up() walk on-demand Make tg_shares_up() use the active cgroup list, this means we cannot do a strict bottom-up walk of the hierarchy, but assuming its a very wide tree with a small number of active groups it should be a win. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.754159484@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:47 +01:00
Peter Zijlstra	3d4b47b4b0	sched: Implement on-demand (active) cfs_rq list Make certain load-balance actions scale per number of active cgroups instead of the number of existing cgroups. This makes wakeup/sleep paths more expensive, but is a win for systems where the vast majority of existing cgroups are idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.666535048@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:47 +01:00
Peter Zijlstra	2069dd75c7	sched: Rewrite tg_shares_up) By tracking a per-cpu load-avg for each cfs_rq and folding it into a global task_group load on each tick we can rework tg_shares_up to be strictly per-cpu. This should improve cpu-cgroup performance for smp systems significantly. [ Paul: changed to use queueing cfs_rq + bug fixes ] Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.580480400@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:46 +01:00
Peter Zijlstra	48c5ccae88	sched: Simplify cpu-hot-unplug task migration While discussing the need for sched_idle_next(), Oleg remarked that since try_to_wake_up() ensures sleeping tasks will end up running on a sane cpu, we can do away with migrate_live_tasks(). If we then extend the existing hack of migrating current from CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the need for the sched_idle_next() abomination disappears as well, since idle will be the only possible thread left after the migration thread stops. This greatly simplifies the hot-unplug task migration path, as can be seen from the resulting code reduction (and about half the new lines are comments). Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289851597.2109.547.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:27:46 +01:00
Ingo Molnar	92fd4d4d67	Merge commit 'v2.6.37-rc2' into sched/core Merge reason: Move to a .37-rc base. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:22:26 +01:00
Jovi Zhang	5450d90405	kdb: fix crash when KDB_BASE_CMD_MAX is exceeded When the number of dyanmic kdb commands exceeds KDB_BASE_CMD_MAX, the kernel will fault. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-11-17 13:54:57 -06:00
Jovi Zhang	85e76ab50a	kdb: fix memory leak in kdb_main.c Call kfree in the error path as well as the success path in kdb_ll(). Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-11-17 13:54:57 -06:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Marcus Meissner	59365d136d	kernel: make /proc/kallsyms mode 400 to reduce ease of attacking Making /proc/kallsyms readable only for root by default makes it slightly harder for attackers to write generic kernel exploits by removing one source of knowledge where things are in the kernel. This is the second submit, discussion happened on this on first submit and mostly concerned that this is just one hole of the sieve ... but one of the bigger ones. Changing the permissions of at least System.map and vmlinux is also required to fix the same set, but a packaging issue. Target of this starter patch and follow ups is removing any kind of kernel space address information leak from the kernel. [ Side note: the default of root-only reading is the "safe" value, and it's easy enough to then override at any time after boot. The /proc filesystem allows root to change the permissions with a regular chmod, so you can "revert" this at run-time by simply doing chmod og+r /proc/kallsyms as root if you really want regular users to see the kernel symbols. It does help some tools like "perf" figure them out without any setup, so it may well make sense in some situations. - Linus ] Signed-off-by: Marcus Meissner <meissner@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Eugene Teo <eugeneteo@kernel.org> Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-16 19:06:01 -08:00
Linus Torvalds	d33fdee4d0	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix cross-sched-class wakeup preemption sched: Fix runnable condition for stoptask sched: Use group weight, idle cpu metrics to fix imbalances during idle	2010-11-16 15:20:05 -08:00
Linus Torvalds	1e8703b2e6	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / PM QoS: Fix reversed min and max PM / OPP: Hide OPP configuration when SoCs do not provide an implementation PM: Allow devices to be removed during late suspend and early resume	2010-11-16 15:18:17 -08:00
Linus Torvalds	45314915ed	Merge branch 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Address compiler warnings in exit_robust_list	2010-11-16 14:31:03 -08:00
Linus Torvalds	2ebc8ec86f	Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] kprobes: Fix the return address of multiple kretprobes [S390] kprobes: disable interrupts throughout [S390] ftrace: build without frame pointers on s390 [S390] mm: add devmem_is_allowed() for STRICT_DEVMEM checking [S390] vmlogrdr: purge after recording is switched off [S390] cio: fix incorrect ccw_device_init_count [S390] tape: add medium state notifications [S390] fix get_user_pages_fast	2010-11-16 09:27:13 -08:00
Joe Perches	df6e61d4ca	kernel/sysctl.c: Fix build failure with !CONFIG_PRINTK Sigh... Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-16 07:56:09 -08:00
Eric Paris	12b3052c3e	capabilities/syslog: open code cap_syslog logic to fix build failure The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build failure when CONFIG_PRINTK=n. This is because the capabilities code which used the new option was built even though the variable in question didn't exist. The patch here fixes this by moving the capabilities checks out of the LSM and into the caller. All (known) LSMs should have been calling the capabilities hook already so it actually makes the code organization better to eliminate the hook altogether. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-15 15:40:01 -08:00
Colin Cross	00fafcda17	PM / PM QoS: Fix reversed min and max pm_qos_get_value had min and max reversed, causing all pm_qos requests to have no effect. Signed-off-by: Colin Cross <ccross@android.com> Acked-by: mark <markgross@thegnar.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org	2010-11-15 22:45:22 +01:00
Linus Torvalds	8a9f772c14	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: (27 commits) block: remove unused copy_io_context() Documentation: remove anticipatory scheduler info block: remove REQ_HARDBARRIER ioprio: rcu_read_lock/unlock protect find_task_by_vpid call (V2) ioprio: fix RCU locking around task dereference block: ioctl: fix information leak to userland block: read i_size with i_size_read() cciss: fix proc warning on attempt to remove non-existant directory bio: take care not overflow page count when mapping/copying user data block: limit vec count in bio_kmalloc() and bio_alloc_map_data() block: take care not to overflow when calculating total iov length block: check for proper length of iov entries in blk_rq_map_user_iov() cciss: remove controllers supported by hpsa cciss: use usleep_range not msleep for small sleeps cciss: limit commands allocated on reset_devices cciss: Use kernel provided PCI state save and restore functions cciss: fix board status waiting code drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs drbd: REQ_HARDBARRIER -> REQ_FUA transition for meta data accesses drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch code ...	2010-11-12 08:52:47 -08:00
Linus Torvalds	28397babba	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation perf_events: Fix time tracking in samples perf trace: update usage perf trace: update Documentation with new perf trace variants perf trace: live-mode command-line cleanup perf trace record: handle commands correctly perf record: make the record options available outside perf record perf trace scripting: remove system-wide param from shell scripts perf trace scripting: fix some small memory leaks and missing error checks perf: Fix usages of profile_cpu in builtin-top.c to use cpu_list perf, ui: Eliminate stack-smashing protection compiler complaint	2010-11-12 08:39:52 -08:00
Dan Rosenberg	eaf06b241b	Restrict unprivileged access to kernel syslog The kernel syslog contains debugging information that is often useful during exploitation of other vulnerabilities, such as kernel heap addresses. Rather than futilely attempt to sanitize hundreds (or thousands) of printk statements and simultaneously cripple useful debugging functionality, it is far simpler to create an option that prevents unprivileged users from reading the syslog. This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the dmesg_restrict sysctl. When set to "0", the default, no restrictions are enforced. When set to "1", only users with CAP_SYS_ADMIN can read the kernel syslog via dmesg(8) or other mechanisms. [akpm@linux-foundation.org: explain the config option in kernel.txt] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Eugene Teo <eugeneteo@kernel.org> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-12 07:55:32 -08:00
Ken Chen	38715258aa	latencytop: fix per task accumulator Per task latencytop accumulator prematurely terminates due to erroneous placement of latency_record_count. It should be incremented whenever a new record is allocated instead of increment on every latencytop event. Also fix search iterator to only search known record events instead of blindly searching all pre-allocated space. Signed-off-by: Ken Chen <kenchen@google.com> Reviewed-by: Arjan van de Ven <arjan@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-12 07:55:31 -08:00
Alexey Khoroshilov	834b40380e	kernel/range.c: fix clean_sort_range() for the case of full array clean_sort_range() should return a number of nonempty elements of range array, but if the array is full clean_sort_range() returns 0. The problem is that the number of nonempty elements is evaluated by finding the first empty element of the array. If there is no such element it returns an initial value of local variable nr_range that is zero. The fix is trivial: it changes initial value of nr_range to size of the array. The bug can lead to loss of information regarding all ranges, since typically returned value of clean_sort_range() is considered as an actual number of ranges in the array after a series of add/subtract operations. Found by Analytical Verification project of Linux Verification Center (linuxtesting.org), thanks to Alexander Kolosov. Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-12 07:55:31 -08:00
Peter Zijlstra	1e5a74059f	sched: Fix cross-sched-class wakeup preemption Instead of dealing with sched classes inside each check_preempt_curr() implementation, pull out this logic into the generic wakeup preemption path. This fixes a hang in KVM (and others) where we are waiting for the stop machine thread to run ... Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1288891946.2039.31.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-11 14:37:23 +01:00
Mark Brown	43e60861fe	PM / OPP: Hide OPP configuration when SoCs do not provide an implementation Since the OPP API is only useful with an appropraite SoC-specific implementation there is no point in offering the ability to enable the API on general systems. Provide an ARCH_HAS OPP Kconfig symbol which masks out the option unless selected by an implementation. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Nishanth Menon <nm@ti.com> Acked-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-11-11 01:51:26 +01:00
Peter Zijlstra	2d46709082	sched: Fix runnable condition for stoptask Heiko reported that the TASK_RUNNING check is not sufficient for CONFIG_PREEMPT=y since we can get preempted with !TASK_RUNNING. He suggested adding a ->se.on_rq test to the existing TASK_RUNNING one, however TASK_RUNNING will always have ->se.on_rq, so we might as well reduce that to a single test. [ stop tasks should never get preempted, but its good to handle this case correctly should this ever happen ] Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 23:13:58 +01:00
Suresh Siddha	aae6d3ddd8	sched: Use group weight, idle cpu metrics to fix imbalances during idle Currently we consider a sched domain to be well balanced when the imbalance is less than the domain's imablance_pct. As the number of cores and threads are increasing, current values of imbalance_pct (for example 25% for a NUMA domain) are not enough to detect imbalances like: a) On a WSM-EP system (two sockets, each having 6 cores and 12 logical threads), 24 cpu-hogging tasks get scheduled as 13 on one socket and 11 on another socket. Leading to an idle HT cpu. b) On a hypothetial 2 socket NHM-EX system (each socket having 8 cores and 16 logical threads), 16 cpu-hogging tasks can get scheduled as 9 on one socket and 7 on another socket. Leaving one core in a socket idle whereas in another socket we have a core having both its HT siblings busy. While this issue can be fixed by decreasing the domain's imbalance_pct (by making it a function of number of logical cpus in the domain), it can potentially cause more task migrations across sched groups in an overloaded case. Fix this by using imbalance_pct only during newly_idle and busy load balancing. And during idle load balancing, check if there is an imbalance in number of idle cpu's across the busiest and this sched_group or if the busiest group has more tasks than its weight that the idle cpu in this_group can pull. Reported-by: Nikhil Rao <ncrao@google.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1284760952.2676.11.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 23:13:56 +01:00
Stephane Eranian	eed01528a4	perf_events: Fix time tracking in samples This patch corrects time tracking in samples. Without this patch both time_enabled and time_running are bogus when user asks for PERF_SAMPLE_READ. One uses PERF_SAMPLE_READ to sample the values of other counters in each sample. Because of multiplexing, it is necessary to know both time_enabled, time_running to be able to scale counts correctly. In this second version of the patch, we maintain a shadow copy of ctx->time which allows us to compute ctx->time without calling update_context_time() from NMI context. We avoid the issue that update_context_time() must always be called with ctx->lock held. We do not keep shadow copies of the other event timings because if the lead event is overflowing then it is active and thus it's been scheduled in via event_sched_in() in which case neither tstamp_stopped, tstamp_running can be modified. This timing logic only applies to samples when PERF_SAMPLE_READ is used. Note that this patch does not address timing issues related to sampling inheritance between tasks. This will be addressed in a future patch. With this patch, the libpfm4 example task_smpl now reports correct counts (shown on 2.4GHz Core 2): $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 noploop for 5 seconds IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 2,400,000,254 unhalted_core_cycles:u (33) 2,399,273,744 instructions_retired:u (34) 53,340 baclears (35) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 22:58:39 +01:00
Christoph Hellwig	02e031cbc8	block: remove REQ_HARDBARRIER REQ_HARDBARRIER is dead now, so remove the leftovers. What's left at this point is: - various checks inside the block layer. - sanity checks in bio based drivers. - now unused bio_empty_barrier helper. - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while, but Xen really needs to sort out it's barrier situaton. - setting of ordered tags in uas - dead code copied from old scsi drivers. - scsi different retry for barriers - it's dead and should have been removed when flushes were converted to FS requests. - blktrace handling of barriers - removed. Someone who knows blktrace better should add support for REQ_FLUSH and REQ_FUA, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:54:09 +01:00
Darren Hart	4c115e951d	futex: Address compiler warnings in exit_robust_list Since commit `1dcc41bb` (futex: Change 3rd arg of fetch_robust_entry() to unsigned int*) some gcc versions decided to emit the following warning: kernel/futex.c: In function ‘exit_robust_list’: kernel/futex.c:2492: warning: ‘next_pi’ may be used uninitialized in this function The commit did not introduce the warning as gcc should have warned before that commit as well. It's just gcc being silly. The code path really can't result in next_pi being unitialized (or should not), but let's keep the build clean. Annotate next_pi as an uninitialized_var. [ tglx: Addressed the same issue in futex_compat.c and massaged the changelog ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Tested-by: Matt Fleming <matt@console-pimps.org> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1288897200-13008-1-git-send-email-dvhart@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-10 13:27:50 +01:00
Heiko Carstens	becf91f187	[S390] ftrace: build without frame pointers on s390 s390 doesn't need FRAME_POINTERS in order to have a working function tracer. We don't need frame pointers in order to get strack traces since we always have valid backchains by using the -mkernel-backchain gcc option. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-11-10 10:05:54 +01:00
David Daney	433039e97f	watchdog: Fix section mismatch and potential undefined behavior. Commit `d9ca07a05c` ("watchdog: Avoid kernel crash when disabling watchdog") introduces a section mismatch. Now that we reference no_watchdog from non-__init code it can no longer be __initdata. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-05 17:45:35 -07:00
Oleg Nesterov	e0a7021710	posix-cpu-timers: workaround to suppress the problems with mt exec posix-cpu-timers.c correctly assumes that the dying process does posix_cpu_timers_exit_group() and removes all !CPUCLOCK_PERTHREAD timers from signal->cpu_timers list. But, it also assumes that timer->it.cpu.task is always the group leader, and thus the dead ->task means the dead thread group. This is obviously not true after de_thread() changes the leader. After that almost every posix_cpu_timer_ method has problems. It is not simple to fix this bug correctly. First of all, I think that timer->it.cpu should use struct pid instead of task_struct. Also, the locking should be reworked completely. In particular, tasklist_lock should not be used at all. This all needs a lot of nontrivial and hard-to-test changes. Change __exit_signal() to do posix_cpu_timers_exit_group() when the old leader dies during exec. This is not the fix, just the temporary hack to hide the problem for 2.6.37 and stable. IOW, this is obviously wrong but this is what we currently have anyway: cpu timers do not work after mt exec. In theory this change adds another race. The exiting leader can detach the timers which were attached to the new leader. However, the window between de_thread() and release_task() is small, we can pretend that sys_timer_create() was called before de_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-05 14:16:03 -07:00
Jesper Juhl	408af87a39	Clean up relay_alloc_page_array() slightly by using vzalloc rather than vmalloc and memset We can optimize kernel/relay.c::relay_alloc_page_array() slightly by using vzalloc. The patch makes these changes: - use vzalloc instead of vmalloc+memset. - remove redundant local variable 'array'. - declare local 'pa_size' as const. Cuts down nicely on both source and object-code size. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-05 08:21:34 -07:00
Linus Torvalds	82279e6bd7	Merge branches 'irq-core-for-linus' and 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix up irq_node() for irq_data changes. genirq: Add single IRQ reservation helper genirq: Warn if enable_irq is called before irq is set up * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: semaphore: Remove mutex emulation staging: Final semaphore cleanup jbd2: Convert jbd2_slab_create_sem to mutex hpfs: Convert sbi->hpfs_creation_de to mutex Fix up trivial change/delete conflicts with deleted 'dream' drivers (drivers/staging/dream/camera/{mt9d112.c,mt9p012_fox.c,mt9t013.c,s5k3e2fx.c})	2010-10-31 20:40:24 -04:00

1 2 3 4 5 ...

10673 Commits