Vincent Tech Blog: 2009

Saturday, December 5, 2009

My asterisk patch got accepted

This blog is mostly about kernel work, but I feel some non-kernel work occasionally belongs here too :-). After a long wait, my first Asterisk patch finally got accepted:

https://issues.asterisk.org/file_download.php?file_id=24698&type=bug

lmadsen (administrator)
2009-04-09 10:44
License accepted! This looks like it could be useful :)

lmadsen (administrator)
2009-04-09 10:46

Don't forget to update the documentation either. If it is not already in XML format, that would also be an ideal change, but for now, you will need to update the docs in the code so that people at least know you can either pass it a file, or a playlist filename.

Thanks!
(0103048)

macli (reporter)
2009-04-09 15:58

I added some documentation in XML format, second patch app_mp3.diff uploaded
I will add note here describing how to play m3u playlist file :-)

#############
dialplan will look like this:
exten => 1234,1,Answer()
exten => 1234,n,MP3Player(/var/lib/asterisk/mp3/playlist.m3u)

and the playlist.m3u will be like this:

/var/lib/asterisk/mp3/1.mp3
/var/lib/asterisk/mp3/2.mp3
/var/lib/asterisk/mp3/3.mp3
..........
/var/lib/asterisk/mp3/n.mp3

It is easy to generate an .m3u playlist file; do the following:

find /path/to/your-mp3-files -type f -name "*.mp3" > playlist.m3u
#############
(0103049)
macli (reporter)
2009-04-09 16:11

The second patch seems to have one single long line, so I uploaded a third patch, app_mp3.diff1, to conform to the coding guidelines :-)
(0106495)
lmadsen (administrator)
2009-06-16 14:01

This is a new feature, so it can only go into trunk, but I'm marking this as targeted for 1.6.3.0 in the hopes we can get it merged in sooner rather than later. I will also set the status to Ready for Testing in the hopes we can get some other testers. Thanks!
(0114777)

dvossel (administrator)
2009-12-04 12:20

macli, I don't understand some of your changes. I uploaded a patch that strips out some of the stuff that didn't seem necessary to me. Let me know what you think. The main thing I didn't understand was why you used the argument parser for a single argument and modified the http mpg321 code. It's still your work of course and will be documented as such when it is committed.
(0114784)

macli (reporter)
2009-12-04 13:40
Hi dvossel, I added the argument parser only for learning purposes. I do not remember exactly why I modified the mpg321 code; maybe it was related to some problem while playing a live online music list. Please go ahead with your simpler solution, thanks!
(0114787)
svnbot (reporter)
2009-12-04 14:27

Repository: asterisk
Revision: 233234

U trunk/apps/app_mp3.c

------------------------------------------------------------------------
r233234 | dvossel | 2009-12-04 14:27:38 -0600 (Fri, 04 Dec 2009) | 9 lines

.m3u support for Mp3Player app

(closes issue 0014823)
Reported by: macli
Patches:
app_mp3.diff1 uploaded by macli (license )
Tested by: macli, dvossel

Friday, August 14, 2009

My patch to pagemap clear_refs

I posted a trivial patch to fix the user input check, and it got accepted into Andrew Morton's mm test tree. Oh yeah, this is my first kernel patch and it got accepted! That is encouraging and makes me want to contribute more :-)

http://marc.info/?t=125011994400002&r=1&w=2


    fs/proc/task_mmu.c v1: fix clear_refs_write() input sanity check

    v1 fixes the compile errors and keeps the type variable name.

    Andrew Morton pointed out similar string hacking and an obfuscated check for zero-length input
    at the end of the function. David Rientjes suggested using strict_strtol to replace
    simple_strtol. This patch covers the above suggestions and adds removal of leading and trailing
    whitespace from user input. It does not change the function's behaviour.

    This patch is rebased on mmotm-2009-08-04-14-22.

    Signed-off-by: Vincent Li 

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f884ad4..2a1bef9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -492,21 +492,20 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
                                size_t count, loff_t *ppos)
 {
        struct task_struct *task;
-       char buffer[PROC_NUMBUF], *end;
+       char buffer[PROC_NUMBUF];
        struct mm_struct *mm;
        struct vm_area_struct *vma;
-       int type;
+       long type;

        memset(buffer, 0, sizeof(buffer));
        if (count > sizeof(buffer) - 1)
                count = sizeof(buffer) - 1;
        if (copy_from_user(buffer, buf, count))
                return -EFAULT;
-       type = simple_strtol(buffer, &end, 0);
+       if (strict_strtol(strstrip(buffer), 10, &type))
+               return -EINVAL;
        if (type < CLEAR_REFS_ALL || type > CLEAR_REFS_MAPPED)
                return -EINVAL;
-       if (*end == '\n')
-               end++;
        task = get_proc_task(file->f_path.dentry->d_inode);
        if (!task)
                return -ESRCH;
@@ -542,9 +541,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
                mmput(mm);
        }
        put_task_struct(task);
-       if (end - buffer == 0)
-               return -EIO;
-       return end - buffer;
+
+       return count;
 }

 const struct file_operations proc_clear_refs_operations = {
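
The effect of the new check can be sketched outside the kernel. The snippet below is a rough Python model, not kernel code: parse_clear_refs is a hypothetical name, the value 13 for PROC_NUMBUF and the values 1 to 3 for CLEAR_REFS_ALL through CLEAR_REFS_MAPPED are assumptions about the tree the patch was based on, and Python's strip() only approximates strstrip().

```python
import re

# Assumed values; they are not shown in the diff above.
PROC_NUMBUF = 13
CLEAR_REFS_ALL, CLEAR_REFS_ANON, CLEAR_REFS_MAPPED = 1, 2, 3


def parse_clear_refs(user_input):
    """Return the validated type (1..3), or None where the kernel would return -EINVAL."""
    # Like the kernel, keep at most PROC_NUMBUF - 1 bytes of the write.
    text = user_input[:PROC_NUMBUF - 1].decode("ascii", errors="replace")
    # strstrip(): drop leading and trailing whitespace, including the newline from echo.
    text = text.strip()
    # strict_strtol(..., 10, ...): the whole string must be a decimal number.
    if not re.fullmatch(r"-?[0-9]+", text):
        return None
    value = int(text)
    if value < CLEAR_REFS_ALL or value > CLEAR_REFS_MAPPED:
        return None
    return value


for raw in (b"1\n", b"  3 \n", b"1abc\n", b"4\n", b"\n"):
    print(repr(raw), "->", parse_clear_refs(raw))
```

The interesting case is "1abc": a lenient parser in the style of simple_strtol would stop at the first non-digit and accept it as 1, while the strict parse rejects the whole input.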

pagemap clear_refs: specify anon or mapped pages to be cleared

Moussa A. Ba posted a patch to specify whether anon or mapped page references should be cleared

http://marc.info/?l=linux-kernel&m=124882208428609&w=2

and the script to test out this patch:

http://marc.info/?l=linux-kernel&m=124898170201564&w=2

Monday, August 10, 2009

Add some trace events for the page allocator

Mel Gorman has posted a series of patches to add trace events for the page allocator:

http://marc.info/?l=linux-kernel&m=124991900725530&w=2

Here is how I tested those patches:

1, make menuconfig as below:

kernel hacking -> Tracers

.config - Linux Kernel v2.6.31-rc5 Configuration
─────────────────────────────────────────────────────────────────────────────────────────────────────
┌─────────────────────────────────────────── Tracers ────────────────────────────────────────────┐
│ Arrow keys navigate the menu. <Enter> selects submenus --->. Highlighted letters are │
│ hotkeys. Pressing <Y> includes, <N> excludes, <M> modularizes features. Press <Esc><Esc> to │
│ exit, <?> for Help, </> for Search. Legend: [*] built-in [ ] excluded <M> module < > │
│ module capable │
│ ┌────────────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ --- Tracers │ │
│ │ [*] Kernel Function Tracer │ │
│ │ [*] Kernel Function Graph Tracer │ │
│ │ [ ] Interrupts-off Latency Tracer │ │
│ │ [ ] Sysprof Tracer │ │
│ │ [ ] Scheduling Latency Tracer │ │
│ │ [*] Trace syscalls │ │
│ │ [ ] Trace boot initcalls │ │
│ │ Branch Profiling (No branch profiling) ---> │ │
│ │ [ ] Trace power consumption behavior │ │
│ │ [ ] Trace max stack │ │
│ │ [*] Trace SLAB allocations │ │
│ │ [ ] Trace workqueues │ │
│ │ [*] Support for tracing block io actions │ │
│ │ [*] enable/disable ftrace tracepoints dynamically │ │
│ │ [*] Kernel function profiler │ │
│ │ [ ] Perform a startup test on ftrace │ │
│ │ [*] Memory mapped IO tracing │ │
│ │ < > Test module for mmiotrace │ │
│ │ < > Ring buffer benchmark stress tester │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────────────────────────────┘ │

2, read Documentation/trace/ftrace.txt and add the line below to /etc/fstab:

debugfs /sys/kernel/debug debugfs defaults 0 0

3, apply the series of patches in order with git am

4, enable page allocator tracing events with:

#for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done

5, run the post-processing script to collect the page allocator event tracing data:

trace-pagealloc-postprocess.pl < /sys/kernel/debug/tracing/trace_pipe

The perf user tools under tools/perf/ can also be used to enable the page allocator trace events and collect the event data. Read more in Documentation/trace/tracepoint-analysis.txt
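
The shell loop in step 4 can also be written as a small script. This is only a sketch of the same idea: enable_events is a hypothetical helper name, and it assumes debugfs is mounted at /sys/kernel/debug and that the script runs as root.

```python
from pathlib import Path


def enable_events(events_root, pattern="mm_"):
    """Write "1" to every 'enable' file whose path below events_root contains pattern."""
    root = Path(events_root)
    enabled = []
    for enable_file in sorted(root.rglob("enable")):
        # Match on the path below the root, much like `find ... | grep mm_`.
        if pattern in str(enable_file.relative_to(root)):
            enable_file.write_text("1")
            enabled.append(enable_file)
    return enabled


# Usage (as root, with debugfs mounted):
#   enable_events("/sys/kernel/debug/tracing/events")
```

Returning the list of touched files makes it easy to see which trace events were actually switched on.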

Wednesday, August 5, 2009

Move oom_adj to signal_struct discussion

I copy and paste the content here for future reference:

Full discussion thread is here:
http://marc.info/?l=linux-mm&m=124938152105260&w=2

http://marc.info/?l=linux-mm&m=124938156605311&w=2


Minchan Kim  wrote:
> Hmm. I can't understand why it is troublesome.
> I think it's related to moving oom_adj to signal_struct.
> Unfortunately, I can't understand why we have to put oom_adj
> in signal_struct?
>
> That's why I have a question to Kosaki a while ago.

> I can't understand it still. :-(
>
> Could you elaborate it ?
>

Current code is as following
==
  do_each_thread(g,p) {
        ......
        p = badness();

        record p of highest badness.
  }
  p = highest badness thread.

  Scan all threads which shares mm_struct of p. and check oom_adj

==
Assume a process which has 20000 threads. And 1 of thread has OOM_DISABLE.

Then, at worst, this scan will need
        (1+2+3+....+20000) * (20000-1) scans. (when ignoring other processes)
even with your patch.

This means the kernel wastes enough time that Cluster-Management-Software can
detect this as a livelock, and do reboot/cluster-fail-over.

Fixing the livelock is not the final goal. I (we) would like to reduce stall time
to a reasonable level. If we move oom_adj to signal_struct or mm_struct, scan-cost
will be only 20000. No retry at all.

And, if we can use for_each_process() rather than do_each_thread(),
scan-cost will be 1.

(BTW, "signal" struct is bad name I think, it should be "process" struct ;)


Thanks,
-Kame



> > > > > > > > Hi, Kosaki.
> > > > > > > >
> > > > > > > > I am so late to get involved in this thread.
> > > > > > > > But let me have a question.
> > > > > > > >
> > > > > > > > What's the advantage of placing oom_adj in signal rather than task ?
> > > > > > > > I mean task->oom_adj and task->signal->oom_adj ?
> > > > > > > >
> > > > > > > > I am sorry if you already discussed it at last threads.
> > > > > > >
> > > > > > > Not sorry. that's very good question.
> > > > > > >
> > > > > > > I'm trying to explain the detailed intention of commit 2ff05b2b4eac
> > > > > > > (move oom_adj to mm_struct).
> > > > > > >
> > > > > > > In 2.6.30, OOM logic callflow is here.
> > > > > > >
> > > > > > > __out_of_memory
> > > > > > >   select_bad_process              for each task
> > > > > > >           badness                 calculate badness of one task
> > > > > > >   oom_kill_process                search child
> > > > > > >           oom_kill_task           kill target task and mm shared tasks with it
> > > > > > >
> > > > > > > example, process-A have two thread, thread-A and thread-B and it
> > > > > > > have very fat memory.
> > > > > > > And, each thread have following likes oom property.
> > > > > > >
> > > > > > >   thread-A: oom_adj = OOM_DISABLE, oom_score = 0
> > > > > > >   thread-B: oom_adj = 0,           oom_score = very-high
> > > > > > >
> > > > > > > Then, select_bad_process() select thread-B, but oom_kill_task refuse
> > > > > > > kill the task because thread-A have OOM_DISABLE.
> > > > > > > __out_of_memory() call select_bad_process() again. but select_bad_process()
> > > > > > > select the same task. It mean kernel fall in the livelock.
> > > > > > >
> > > > > > > The fact is, select_bad_process() must select killable task. otherwise
> > > > > > > OOM logic go into livelock.
> > > > > > >
> > > > > > > Is this enough explanation? thanks.
> > > > > > >
> > > >
> > > > The problem resulted from David patch.
> > > > It can solve live lock problem but make a new problem like vfork problem.
> > > > I think both can be solved by different approach.
> > > >
> > > > It's just RFC.
> > > >
> > > > If some process is selected by OOM killer but it have a child of OOM immune,
> > > > We just decrease point of process. It can affect selection of bad process.
> > > > After some trial, at last bad score is drastically low and another process is
> > > > selected by OOM killer. So I think Live lock don't happen.
> > > >
> > > > New variable adding in task struct is rather high cost.
> > > > But i think we can union it with oomkilladj
> > > > since oomkilladj is used to present just -17 ~ 15.
> > > >
> > > > What do you think about this approach ?
> > > >
> > > keeping this in "task" struct is troublesome.
> > > It may not livelock but near-to-livelock state, in bad case.
> >
> > Hmm. I can't understand why it is troublesome.
> > I think it's related to moving oom_adj to signal_struct.
> > Unfortunately, I can't understand why we have to put oom_adj
> > in signal_struct?
> >
> > That's why I have a question to Kosaki a while ago.
> > I can't understand it still. :-(
> >
> > Could you elaborate it ?
>
> Maybe, It's because my explanation is still poor. sorry.
> Please give me one more chance.
>
> In my previous mail, I explained select_bad_process() must not
> unkillable task, is this ok?
> IOW, if all thread have the same oom_adj, the issue gone.
>
> signal_struct is shared all thread in the process. then, the issue gone.
>
>

Your and Kame's good explanation opens my eyes. :)
I realized your approach's benefit.

Yes. Let's wait to hear others' opinions.
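
The livelock Kosaki describes can be reproduced with a toy model. This is a simplified sketch, not the kernel's OOM code: the dictionaries standing in for threads, the score numbers and the max_rounds cut-off are all made up for illustration, and OOM_DISABLE = -17 is taken from the "-17 ~ 15" range mentioned in the thread.

```python
OOM_DISABLE = -17


def select_bad_process(threads):
    """Pick the killable-looking thread with the highest badness score."""
    candidates = [t for t in threads if t["oom_adj"] != OOM_DISABLE]
    return max(candidates, key=lambda t: t["score"]) if candidates else None


def oom_kill_task(victim, threads):
    """Refuse the kill if any thread sharing the victim's mm is OOM_DISABLE."""
    return not any(
        t["mm"] == victim["mm"] and t["oom_adj"] == OOM_DISABLE for t in threads
    )


def out_of_memory(threads, max_rounds=5):
    """Retry selection until a kill succeeds; report 'livelock' if it never does."""
    for round_no in range(1, max_rounds + 1):
        victim = select_bad_process(threads)
        if victim is None:
            return None, round_no
        if oom_kill_task(victim, threads):
            return victim["name"], round_no
    return "livelock", max_rounds


# oom_adj kept per thread: thread-B is selected, but the kill is always refused.
per_thread = [
    {"name": "thread-A", "mm": 1, "oom_adj": OOM_DISABLE, "score": 0},
    {"name": "thread-B", "mm": 1, "oom_adj": 0, "score": 1000},
    {"name": "other", "mm": 2, "oom_adj": 0, "score": 10},
]

# oom_adj shared by the whole process: both threads carry the same value.
per_process = [dict(t, oom_adj=OOM_DISABLE) if t["mm"] == 1 else t for t in per_thread]

print(out_of_memory(per_thread))
print(out_of_memory(per_process))
```

When every thread of a process carries the same oom_adj, the selection step can never pick a task that the kill step will refuse, which is the point of moving the value to a per-process structure.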

Friday, July 24, 2009

gfp_zone analysis

I have three zones on my system: DMA, NORMAL, and HIGHMEM. Let's figure out how gfp_zone works.

Assume the allocation flags are 0x421, which translates to:

__GFP_DMA | __GFP_HIGH | __GFP_REPEAT

which means allocate memory from ZONE_DMA. Will gfp_zone be able to get ZONE_DMA from the gfp flags?

Let's continue:

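#define GFP_ZONEMASK (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE)

GFP_ZONEMASK covers the four zone modifier bits:

GFP_ZONEMASK = 0x01 | 0x02 | 0x04 | 0x08 = 0x0f

We have three zones in total, so each zone number fits in two bits:

ZONES_SHIFT = 0x02

With ZONE_DMA = 0, ZONE_NORMAL = 1 and ZONE_HIGHMEM = 2, the three lowest entries of GFP_ZONE_TABLE (no zone flag, __GFP_DMA, __GFP_HIGHMEM) would be:
(ZONE_NORMAL << 0) | (ZONE_DMA << 2) | (ZONE_HIGHMEM << 4) = 33 = 100001 in binary

bit = (__GFP_DMA | __GFP_HIGH | __GFP_REPEAT) & 0x0f = 0x01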


static inline enum zone_type gfp_zone(gfp_t flags)
{
       enum zone_type z;
       int bit = flags & GFP_ZONEMASK;

       z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) &
                                        ((1 << ZONES_SHIFT) - 1);

       if (__builtin_constant_p(bit))
               MAYBE_BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
       else {
#ifdef CONFIG_DEBUG_VM
               BUG_ON((GFP_ZONE_BAD >> bit) & 1);
#endif
       }
       return z;
}

z = (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) & ((1 << ZONES_SHIFT) - 1)
= (100001 >> (0x01 * 0x02)) & ((1 << 0x02) - 1)
= 1000 & 0011
= 0
= ZONE_DMA
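
The table lookup can be replayed in a few lines. This is a sketch under stated assumptions rather than the kernel source: the zone numbering (DMA = 0, NORMAL = 1, HIGHMEM = 2, MOVABLE = 3) is what I would expect on 32-bit x86 with HIGHMEM, the DMA32 fallback to ZONE_NORMAL is assumed for a system without ZONE_DMA32, and the table layout follows my reading of include/linux/gfp.h in 2.6.31.

```python
# Assumed zone numbering for 32-bit x86 with HIGHMEM (no ZONE_DMA32).
ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE = 0, 1, 2, 3
ZONES_SHIFT = 2

GFP_DMA, GFP_HIGHMEM, GFP_DMA32, GFP_MOVABLE = 0x01, 0x02, 0x04, 0x08
GFP_ZONEMASK = GFP_DMA | GFP_HIGHMEM | GFP_DMA32 | GFP_MOVABLE

# Without ZONE_DMA32, requests for it fall back to ZONE_NORMAL (assumption).
OPT_ZONE_DMA32 = ZONE_NORMAL

# One ZONES_SHIFT-wide slot per valid combination of zone modifier bits.
GFP_ZONE_TABLE = (
    (ZONE_NORMAL << (0 * ZONES_SHIFT))
    | (ZONE_DMA << (GFP_DMA * ZONES_SHIFT))
    | (ZONE_HIGHMEM << (GFP_HIGHMEM * ZONES_SHIFT))
    | (OPT_ZONE_DMA32 << (GFP_DMA32 * ZONES_SHIFT))
    | (ZONE_NORMAL << (GFP_MOVABLE * ZONES_SHIFT))
    | (ZONE_DMA << ((GFP_MOVABLE | GFP_DMA) * ZONES_SHIFT))
    | (ZONE_MOVABLE << ((GFP_MOVABLE | GFP_HIGHMEM) * ZONES_SHIFT))
    | (OPT_ZONE_DMA32 << ((GFP_MOVABLE | GFP_DMA32) * ZONES_SHIFT))
)


def gfp_zone(flags):
    """Select the slot indexed by the zone modifier bits and mask out one zone number."""
    bit = flags & GFP_ZONEMASK
    return (GFP_ZONE_TABLE >> (bit * ZONES_SHIFT)) & ((1 << ZONES_SHIFT) - 1)


print(bin(GFP_ZONE_TABLE & 0x3F))  # the three lowest slots of the table
print(gfp_zone(0x421))             # __GFP_DMA | __GFP_HIGH | __GFP_REPEAT
```

Packing the answers into one integer lets the kernel resolve the zone with a shift and a mask instead of a chain of conditionals.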

Thursday, July 23, 2009

Chat about git on how to apply local custom patch on top of mainline master branch

(09:35:49) vincentinsz: hm, question about git, now my kernel git repo is 2.6.31-rc3, and i git branched a test branch and committed custom patch, so the test branch is 2.6.31-rc3 + custom patch
(09:36:47) vincentinsz: now the mainline kernel is 2.6.31-rc4, i git checkout master and git pull to sync to the mainline kernel
(09:37:26) vincentinsz: but how I sync my test branch so it would be 2.6.31-rc4 + custom branch?
(09:38:05) qunying: is your branch a direct checkout from the branch point or from the mainline
(09:43:23) vincentinsz: say linus is the public git repo, here is my working step: 1, git clone linus, 2, git pull (from time to time), 3, git branch test, 4, git commit custom patch, 5 git checkout master, 6 git pull ( new kernel tag released say -rc4), now how I let branch test sync to -rc4 + custom patch?
(09:44:44) qunying: normally you don't you git pull, as it will automatically merge
(09:45:12) qunying: use git fetch then git rebase origin
(09:45:48) qunying: will move your local commits to the top
(09:45:50) vincentinsz: but the merge only touch master branch, not test branch, I care about the custom patch in test branch, not master branch
(09:46:42) qunying: then in test branch, you rebase against master
(09:46:52) qunying: git rebase master
(09:47:45) vincentinsz: then the custom patch would be on top of master?
(09:47:57) qunying: ya
(09:48:45) qunying: it will bring you branch to the latest master + your own commit on top
(09:49:21) vincentinsz: ah, that is it
(09:52:17) vincentinsz: My thought is that I would never touch my local master branch except git pull to sync to linus public git repo, I only test custom patch on a local test branch and also would like to have the test branch sync to master with custom patch on top of it
(09:52:53) vincentinsz: so I would never ruin my local master branch
(09:53:46) vincentinsz: based on the idea that i would never run git clone again :-)
(09:53:54) vincentinsz: reasonable?
(09:55:13) qunying: ya
(09:57:49) vincentinsz: there is git stash, but it seems only save non-committed custom patch and reapply the patch on top
(09:58:04) qunying: ya
(10:06:24) vincentinsz: hm, interesting, it seems I can not save the gaim chat log any more to other text file
(10:06:34) vincentinsz: like copy and paste
(10:06:53) qunying: that is strange
(10:07:39) vincentinsz: I could highlight all the text, right click, there is copy option, but it wont be saved in memory
(10:08:14) vincentinsz: there is save as option in conversation menu, but it only save as html format, annoying
(10:08:40) qunying: ya
(10:09:04) vincentinsz: same to you?
(10:09:32) qunying: never try it, i always let gaim save its own
(10:09:58) qunying: mime is working fine
(10:10:18) vincentinsz: I would like to have the technical discussion posted on my personal blog, so I can always looked it up when I need it :)
(10:11:17) qunying: it works for me, probably i am using a newer version
(10:11:52) qunying: try just highlight the text, and use middle-key to paste on other program
(10:13:24) vincentinsz: I have no middle key on mouse, it is scroll key
(10:14:56) qunying: it is the same, press it like the other will do

Wednesday, July 15, 2009

include/linux/gfp.h


0x00u   0           ->   (no zone flag, ZONE_NORMAL)
0x01u   1           ->   __GFP_DMA
0x02u   10          ->   __GFP_HIGHMEM
0x04u   100         ->   __GFP_DMA32
0x08u   1000        ->   __GFP_MOVABLE
0x0fu   1111        ->   GFP_ZONEMASK
0x10u   10000       ->   __GFP_WAIT
0x20u   100000      ->   __GFP_HIGH
0x40u   1000000     ->   __GFP_IO
0x80u   10000000    ->   __GFP_FS
0x100u  100000000   ->   __GFP_COLD

Use ./scripts/gfp-translate to translate the GFP flag hex code from a VM oops. For example, 0x4020 would be
__GFP_COMP | __GFP_HIGH
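
The decoding itself is plain bit testing. The sketch below is not the gfp-translate script; translate is a hypothetical helper that only knows the flags listed in this post, plus __GFP_REPEAT (0x400) and __GFP_COMP (0x4000) as implied by the 0x421 and 0x4020 examples.

```python
# Flag values from the table above; 0x400 and 0x4000 are inferred from the examples.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x400: "__GFP_REPEAT",
    0x4000: "__GFP_COMP",
}


def translate(mask):
    """Return (names of the known flags set in mask, leftover unknown bits)."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]
    known = 0
    for bit in GFP_FLAGS:
        known |= bit
    return names, mask & ~known


print(translate(0x4020))
print(translate(0x421))
```

Any bits that the table does not cover come back in the second element, so an incomplete table is visible instead of silently dropping flags.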

Friday, July 10, 2009

The heart of zoned buddy allocator

__alloc_pages_nodemask
-> get_page_from_freelist (first attempt)
   ->__alloc_pages_slowpath (enter slow path allocation)
      ->wake_all_kswapd (wake up background page reclaiming to free pages)
         ->get_page_from_freelist (try again, got no page? continue)
           -> __alloc_pages_high_priority ( if ALLOC_NO_WATERMARKS, try this one)
             ->__alloc_pages_direct_reclaim (enter direct page reclaim )
               -> get_page_from_freelist (still got no page, and direct reclaim made no progress)
                 ->__alloc_pages_may_oom (enter OOM to kill some task to free some pages)

Wednesday, July 8, 2009

Analysis of shrink_slab function in mm/vmscan.c

The code snippet is taken from 2.6.31-rc2


184 #define SHRINK_BATCH 128
185 /*
186  * Call the shrink functions to age shrinkable caches
187  *
188  * Here we assume it costs one seek to replace a lru page and that it also
189  * takes a seek to recreate a cache object.  With this in mind we age equal
190  * percentages of the lru and ageable caches.  This should balance the seeks
191  * generated by these structures.
192  *
193  * If the vm encountered mapped pages on the LRU it increase the pressure on
194  * slab to avoid swapping.
195  *
196  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
197  *
198  * `lru_pages' represents the number of on-LRU pages in all the zones which
199  * are eligible for the caller's allocation attempt.  It is used for balancing
200  * slab reclaim versus page reclaim.
201  *
202  * Returns the number of slab objects which we shrunk.
203  */
204 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
205                         unsigned long lru_pages)
206 {
207         struct shrinker *shrinker;
208         unsigned long ret = 0;
209
210         if (scanned == 0)
211                 scanned = SWAP_CLUSTER_MAX;
212
213         if (!down_read_trylock(&shrinker_rwsem))
214                 return 1;       /* Assume we'll be able to shrink next time */
215
216         list_for_each_entry(shrinker, &shrinker_list, list) {
217                 unsigned long long delta;
218                 unsigned long total_scan;
219                 unsigned long max_pass = (*shrinker->shrink)(0, gfp_mask);
220
221                 delta = (4 * scanned) / shrinker->seeks;
222                 delta *= max_pass;
223                 do_div(delta, lru_pages + 1);
224                 shrinker->nr += delta;
225                 if (shrinker->nr < 0) {
226                         printk(KERN_ERR "shrink_slab: %pF negative objects to "
227                                "delete nr=%ld\n",
228                                shrinker->shrink, shrinker->nr);
229                         shrinker->nr = max_pass;
230                 }
231
232                 /*
233                  * Avoid risking looping forever due to too large nr value:
234                  * never try to free more than twice the estimate number of
235                  * freeable entries.
236                  */
237                 if (shrinker->nr > max_pass * 2)
238                         shrinker->nr = max_pass * 2;
239
240                 total_scan = shrinker->nr;
241                 shrinker->nr = 0;
242
243                 while (total_scan >= SHRINK_BATCH) {
244                         long this_scan = SHRINK_BATCH;
245                         int shrink_ret;
246                         int nr_before;
247
248                         nr_before = (*shrinker->shrink)(0, gfp_mask);
249                         shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
250                         if (shrink_ret == -1)
251                                 break;
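252                         if (shrink_ret < nr_before)
253                                 ret += nr_before - shrink_ret;
254                         count_vm_events(SLABS_SCANNED, this_scan);
255                         total_scan -= this_scan;
256
257                         cond_resched();
258                 }
259
260                 shrinker->nr += total_scan;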
261         }
262         up_read(&shrinker_rwsem);
263         return ret;
264 }



Line 204: shrink_slab gets called from multiple places. With cscope (ctrl+\ then c), we get:


1     61  fs/drop_caches.c <<drop_slab>>
         nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
2   1697  mm/vmscan.c <<do_try_to_free_pages>>
         shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages);
3   1937  mm/vmscan.c <<balance_pgdat>>
         nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
4   2193  mm/vmscan.c <<shrink_all_memory>>
         shrink_slab(nr_pages, sc.gfp_mask, lru_pages);
5   2229  mm/vmscan.c <<shrink_all_memory>>
         shrink_slab(sc.nr_scanned, sc.gfp_mask,
6   2247  mm/vmscan.c <<shrink_all_memory>>
         shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
7   2454  mm/vmscan.c <<__zone_reclaim>>
         while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&

Tracing back to the calling functions, we see that the scanned parameter refers to the number of scanned LRU pages
(sc->nr_scanned), and lru_pages refers to the total number of LRU pages in the zones.

Line 216 - 261 loop through the shrinker list to shrink slab caches
Line 219 queries the current cache size (max_pass) by calling the shrinker with nr_to_scan == 0; see include/linux/mm.h

/*
 * A callback you can register to apply pressure to ageable caches.
 *
 * 'shrink' is passed a count 'nr_to_scan' and a 'gfpmask'.  It should
 * look through the least-recently-used 'nr_to_scan' entries and
 * attempt to free them up.  It should return the number of objects
 * which remain in the cache.  If it returns -1, it means it cannot do
 * any scanning at this time (eg. there is a risk of deadlock).
 *
 * The 'gfpmask' refers to the allocation we are currently trying to
 * fulfil.
 *
 * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is
 * querying the cache size, so a fastpath for that case is appropriate.
 */
struct shrinker {
        int (*shrink)(int nr_to_scan, gfp_t gfp_mask);
        int seeks;      /* seeks to recreate an obj */

        /* These are for internal use */
        struct list_head list;
        long nr;        /* objs pending delete */
};


Line 221 - 224 compute the number of pending objects to scan. It matches the code comment above about
"age equal percentages of the lru and ageable caches"

Line 243 - 258 scan in batches of SHRINK_BATCH, accumulating the number of freed objects in the ret variable

Line 263 returns the number of freed slab cache objects
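
To get a feel for the numbers, lines 221 - 243 can be replayed with plain integers. This is a sketch of the arithmetic only: scan_budget is a hypothetical helper, the cache size of 10000 objects is invented, and seeks = 2 assumes the usual default seek cost.

```python
SHRINK_BATCH = 128


def scan_budget(scanned, lru_pages, max_pass, seeks, pending=0):
    """Return (full batches to scan now, objects carried over to the next call)."""
    # Lines 221 - 223: scale the LRU scan ratio onto this cache's size.
    delta = (4 * scanned) // seeks
    delta *= max_pass
    delta //= lru_pages + 1
    nr = pending + delta
    # Lines 237 - 238: never try to scan more than twice the cache size.
    nr = min(nr, max_pass * 2)
    # Lines 243 - 260: scan whole batches, keep the remainder for later.
    return divmod(nr, SHRINK_BATCH)


# The drop_slab call shrink_slab(1000, GFP_KERNEL, 1000) against a cache of 10000 objects.
print(scan_budget(scanned=1000, lru_pages=1000, max_pass=10000, seeks=2))
```

The leftover is what the real code stores back in shrinker->nr, so small caches that never fill a whole batch still get scanned eventually.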

A sample git work flow to send/receive patch by email

I googled and tried a couple of git workflows to send/receive trivial kernel patches. Here is my summary:




###################################################################

# References

http://linux.yyz.us/git-howto.html

http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html

http://www.kernel.org/pub/software/scm/git/docs/git-send-email.html





# One time commands



> apt-get install git-email

> cd /usr/src

> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

> cd /usr/src/linux-2.6

> git config --global user.name "Vincent Li"

> git config --global user.email "username@example.com"

> git config --global sendemail.smtpserver smtp.example.com

> git config --global sendemail.smtpserverport 587

> git config --global sendemail.smtpuser  username

> git config --global sendemail.smtppass userpass



update: (this is what I did for my gmail account)



> apt-get install git-email

> git config --global sendemail.smtpserver smtp.gmail.com

> git config --global sendemail.smtpserverport 587

> git config --global sendemail.smtpencryption tls

> git config --global sendemail.smtpuser myusername@gmail.com

> git config --global sendemail.smtppass xxxxx

> git config --global user.name "Vincent Li"

> git config --global user.email "myusername@gmail.com"







# Make a new branch for the patch you're working on. In this case, I'll replace BUG_ON with VM_BUG_ON in mm/vmscan.c



> git checkout -b experimental



# Now edit the file



> perl -pi -e 's/^(\t+)BUG_ON/$1VM_BUG_ON/g' mm/vmscan.c

> git commit -a



# Put in a simple message of a line or two.



Trivial: Replace BUG_ON with VM_BUG_ON for consistency



VM subsystem use VM_BUG_ON to test likely bug situation,mm/vmscan.c still have three BUG_ON left, Replacing it with VM_BUG_ON for code consistency.



# Now exit the editor



# Check the commit, which is the most recent one by default



> git log -1



commit 64ea153753811970563ecf5938a8a87c54336495

Author: Vincent Li 

Date:   Wed Jul 8 10:17:37 2009 -0700



   Trivial: Replace BUG_ON with VM_BUG_ON for consistency



   VM subsystem use VM_BUG_ON to test likely bug situation,mm/vmscan.c still have three BUG_ON left, Replacing it with VM_BUG_ON for code consistency.



# See the actual patch with:

> git diff master..HEAD





# If you want to format a single commit with a Signed-off-by line, you can do this with "git format-patch -1 -s <commit>".



> git format-patch -1 -s 64ea153

0001-Trivial-Replace-BUG_ON-with-VM_BUG_ON-for-consisten.patch



# Now look the patch over and see if you need to edit the subject or anything





# Now do a dry run to send the email

> git send-email --dry-run --to=username@example.com 0001-Trivial-Replace-BUG_ON-with-VM_BUG_ON-for-consisten.patch



# Looks good, send for real

> git send-email --to=linux-kernel@vger.kernel.org 0001-Trivial-Replace-BUG_ON-with-VM_BUG_ON-for-consisten.patch



# I use alpine as my email client. Save the email as a single mbox file, for example /tmp/trivial.patch, then use git am to apply the patch



> git checkout master



# git am refuses to process new mailboxes while the .git/rebase-apply directory exists, so if you decide to start over from scratch, run rm -f -r .git/rebase-apply before running the command with mailbox names.

> rm -f -r .git/rebase-apply

> git am  /tmp/trivial.patch

Applying: Trivial: Replace BUG_ON with VM_BUG_ON for consistency



> git log -1



commit 9ba28a665d0a642f9bfda54a6ffedb8c0e8dbd8b

Author: Vincent Li 

Date:   Wed Jul 8 10:35:08 2009 -0700



   Trivial: Replace BUG_ON with VM_BUG_ON for consistency



   VM subsystem use VM_BUG_ON to test likely bug situation,mm/vmscan.c

   still have three BUG_ON left, Replacing it with VM_BUG_ON for code consistency.



   Signed-off-by: Vincent Li 



# Now if you are not happy with the patch, and don't want it in history, reset master branch with



> git reset --hard HEAD^

That is my sample git workflow. Of course, you can merge your experimental branch into master with git merge; I just showed the way to format/send/receive/apply a patch by email, since from time to time you may need to send out a trivial patch or test someone else's patch as a system administrator rather than a full-time programmer.

More info on how to submit multiple patches is at the link below:

http://www.spinics.net/lists/newbies/msg44250.html

For example:

Create a local branch for a tree:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/sfr/linux-next.git
$ cd linux-next
$ git checkout -b devel origin/master

Do some change and commit:

$ emacs drivers/staging/pohmelfs/dir.c
$ git add drivers/staging/pohmelfs/dir.c
$ git commit -m "Staging: pohmelfs/dir.c: Fix something"

Do another change and commit:

$ emacs drivers/staging/pohmelfs/dir.c
$ git commit -m "Staging: pohmelfs/dir.c: Fix another thing"

Generate your patchset with your last two commits

$ git format-patch -s -2

This will create one file for each patch generated.

So to send your patchset you can use the command:

$ git send-email --compose --to='Zac Storer ' \
    --cc='kernelnewbies@xxxxxxxxxxxxxxxxx' *.patch

The command will extract the commit message and use it as the mail
subject, with the --compose flag you can create a prelude mail
explaining your patchset.

So this command will create 3 mails with these subjects

[PATCH 0/2] Staging: pohmelfs/dir.c: Fixes
[PATCH 1/2] Staging: pohmelfs/dir.c: Fix something
[PATCH 2/2] Staging: pohmelfs/dir.c: Fix another thing

This way you can also be sure that your email client did not wrap lines and that the
messages are encoded in ASCII.

Remember always to use scripts/checkpatch.pl to check your patches and
scripts/get_maintainer.pl to check who are the developers that have to
be cc'ed.

Friday, July 3, 2009

Direct Page reclaim and Background Page reclaim call path

Direct Page Reclaim call path

__get_free_pages ->
alloc_pages ->
  __alloc_pages_nodemask ->
     __alloc_pages_slowpath ->
         __alloc_pages_direct_reclaim ->
            try_to_free_pages ->
                do_try_to_free_pages ->
                  shrink_slab/shrink_zones ->
                        shrink_zone ->
                            shrink_list ->
                               shrink_inactive_list/shrink_active_list ->
                                   shrink_page_list ->
                                        pageout ->
                                            mapping->a_ops->writepage


Background Page Reclaim call path


wakeup_kswapd ->
kswapd ->
  balance_pgdat ->
    shrink_slab/shrink_zone ->
       shrink_list ->
          shrink_inactive_list/shrink_active_list ->
             shrink_page_list ->
                pageout ->
                   mapping->a_ops->writepage



Note: Pages are moved from the active list to the inactive list before they are finally freed.
shrink_active_list moves pages to the inactive list; while being moved, pages are
isolated from the lru list onto a private list (page_list or l_hold).

Thursday, July 2, 2009

VM_BUG_ON(PageLRU(page)) and VM_BUG_ON(!PageLRU(page)) in mm/vmscan.c

My confusion about VM_BUG_ON(!PageLRU(page)) vs VM_BUG_ON(PageLRU(page))

http://zh-kernel.org/pipermail/linux-kernel/2009-June/011552.html

kernel virtual address calculation


I had an interesting chat with my friend qunying about how to convert the hex representation of an address to a size:

(14:29:56) vincentinsz: on x86 32bit the kernel image located at physical address 1MiB, which translate to 0x00100000, but how 0x00100000 equals 1M, how to caculate it?
(14:32:00) qunying: 1024*1024 = 0x100000
(14:32:11) qunying: that is 1MiB
(14:33:56) vincentinsz: is there easy way to see 0x100000 as 1024 * 1024?
(14:34:18) qunying: just count the zeros
(14:34:36) qunying: one 0 in hex is 2^^2
(14:34:49) qunying: there is 5 zero, that is 2^^10
(14:35:00) qunying: that is 1M
(14:35:54) qunying: one 0 is 2^^4
(14:35:58) qunying: not 2,
(14:36:09) qunying: 5 zero is 2^^20
(14:37:59) vincentinsz: how do you get one 0 is 2^^4?
(14:38:27) qunying: for one number in hex represents 4 bits in binary
(14:38:55) qunying: 0x10 = 2^4, 0x100 = 2^8, etc
(14:40:01) vincentinsz: 0x10 = 1000 0000
(14:40:13) qunying: ya
(14:40:23) qunying: no
(14:40:29) qunying: 001 000
(14:40:34) qunying: 0001 0000
(14:40:47) vincentinsz: i see
(14:42:20) vincentinsz: what about some other hex address like 0xC0000000 which is about 3G, how to calculate it?
(14:45:10) vincentinsz: so there is 7 0s which is 2^^28?
(14:45:24) qunying: ya
(14:45:35) qunying: C is 1100
(14:46:00) qunying: so times 2^12
(14:46:35) vincentinsz: ah, so 2^^30 * 3?
(14:48:33) qunying: ya
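
The zero-counting trick from the chat can be checked with a few lines of C. This is only a sketch: the helper names pow2_from_hex_zeros and trailing_hex_zeros are made up for illustration and are not kernel functions. Each hex digit carries 4 bits, so n trailing hex zeros mean a factor of 2^(4*n), and the leading digit is just a multiplier (0xC is 12, so 0xC0000000 is 12 * 2^28, which is 3 * 2^30).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n trailing hex zeros correspond to 2^(4*n),
 * because every hex digit encodes exactly 4 bits. */
static uint64_t pow2_from_hex_zeros(unsigned zeros)
{
    return (uint64_t)1 << (4 * zeros);
}

/* Hypothetical helper: count the trailing hex zeros of an address.
 * Returns 0 for the address 0 so the loop always terminates. */
static unsigned trailing_hex_zeros(uint64_t addr)
{
    unsigned n = 0;

    while (addr != 0 && (addr & 0xF) == 0) {
        addr >>= 4;   /* drop one hex digit */
        n++;
    }
    return n;
}
```

For 0x100000 there are 5 zeros, so the size is 2^20 = 1024 * 1024. For 0xC0000000 there are 7 zeros, so the size is 0xC * 2^28.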


---

Return page_count(page) - !!page_has_private(page) == 2 discussion


(15:34:44) vincentinsz: static inline int is_page_cache_freeable(struct page *page)
{
        return page_count(page) - !!page_has_private(page) == 2;
}

(15:35:21) vincentinsz: this function eventually returns 0 or 1, right?
(15:36:35) qunying: not understand it fully, looks strange to me
(15:38:20) qunying: as !!page_has_private(page) should return 0 or 1, and  !!page_has_private(page) == 2 should always fail, then that is the result of page_count(page)
(15:40:02) vincentinsz: I thought  it is like return 3 - 1 == 2? 1 : 0 ?
(15:40:45) qunying: ah right, forgot the '-'
(15:41:11) qunying: it is always return 0 or 1
(15:42:52) vincentinsz: not sure why number 2 is special in this case ==2
(15:43:13) vincentinsz: why not == 1, or == 3 ?
(15:44:00) qunying: that is beyond my understanding, you may dig into how page_count is working
(15:53:00) vincentinsz: what is the !! for, like !!func(a), always get the opposite of the function's return value?
(15:53:47) qunying: no, it normalizes the return code to 0 or 1
(15:54:02) qunying: some func(a) may return > 1 or < 0; !! makes > 1 into 1, < 0 into 1, and keeps 0 as 0
(15:57:25) qunying: ya
(15:57:35) vincentinsz: f**k :-) so <> 0 to make it 1
(16:58:12) qunying logged out.
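
To convince myself of what !! does and how the expression groups, here is a small userspace sketch. The function names to_bool and count_minus_flag_is_two are hypothetical and exist only for this example. In C, unary ! binds tighter than binary -, which binds tighter than ==, so a - !!b == 2 is evaluated as (a - (!!b)) == 2.

```c
#include <assert.h>

/* Hypothetical helper: collapse any nonzero value (positive or negative)
 * to 1 and leave 0 as 0. The first ! yields 0 or 1, the second flips it
 * back, so the result is a normalized truth value. */
static int to_bool(long value)
{
    return !!value;
}

/* Hypothetical stand-in for the shape of the kernel expression:
 * subtract a normalized 0-or-1 from a count, then compare with 2.
 * The comparison itself yields 0 or 1. */
static int count_minus_flag_is_two(int count, unsigned long flag_bits)
{
    return count - !!flag_bits == 2;
}
```

So the function in the chat always returns 0 or 1, and a masked flag word such as 0x800 counts as exactly one, not as 0x800.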


---


(09:31:33) vincentinsz: Hi, still on the strange !!((page)->flags & ((1 << [...] (page)->flags & ((1 << [...] (page)->flags to something like 00010000, assuming the 1 bit value represents the PG_private, am I right?
(09:34:46) qunying: yes
(09:35:33) vincentinsz: then !!(0000100000) makes it the value 1, right?
(09:35:59) qunying: yes
(09:38:03) vincentinsz: someone else had this explanation: http://zh-kernel.org/pipermail/linux-kernel/2009-June/011228.html
(09:38:32) vincentinsz: is that the same thing as you said?
(09:39:27) qunying: ya
(09:40:13) vincentinsz: is that to say that !! will always get 1?
(09:40:36) qunying: no, it says none 0 value to 1
(09:40:49) qunying: 0 will always get 0
(09:51:05) vincentinsz: ok. Oh and the page_count(page) - !!((page)->flags & ((1 << [...] flag PG_private is set, then there should be another two bits set to 1 in (page)->flags so that this page can be freeable
(09:51:53) qunying: i see
(09:53:06) vincentinsz: the other two bit could mean a page is in user mapped address space and LRU (Least recently used) list which are most likely for page reclaim candidate

(10:03:46) vincentinsz: There are many details, I could be wrong :-), the devil is the detail
(10:04:19) qunying: ^_6
(10:48:29) vincentinsz: ok, more, page_count(page) count the reference count of page, if page flag PG_private bit flag is set, the page is  pagecache page backed by inode or swap , so the pagecache itself would have 1 reference to the page, that is at least 2 ref count. Then the page has to be referenced in LRU list so it can be freed, that is 3.
(10:51:11) qunying: hmm, that is why it minors the 1 reference from private bit reference
(10:51:16) qunying: minus

And my question on the zh-kernel mailing list:


http://zh-kernel.org/pipermail/linux-kernel/2009-June/011426.html


Johannes Weiner has since patched this function, adding comments to make it clear:


http://marc.info/?l=linux-mm&m=124830074212169&w=2
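
Here is a userspace model of the check. It assumes one particular reading of the count: the two baseline references belong to the caller that isolated the page and to the page cache, and private data (such as buffer heads) adds one more. That reading is my assumption for this sketch, and struct demo_page, DEMO_PG_PRIVATE and the demo_ functions are all hypothetical, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical bit position, chosen for this sketch only. */
enum { DEMO_PG_PRIVATE = 11 };

/* Hypothetical, heavily simplified stand-in for struct page. */
struct demo_page {
    unsigned long flags;   /* bit field of page flags */
    int refcount;          /* what page_count() would report */
};

/* Returns the raw masked bits (0 or 1UL << DEMO_PG_PRIVATE),
 * which is why the caller normalizes the result with !!. */
static unsigned long demo_page_has_private(const struct demo_page *page)
{
    return page->flags & (1UL << DEMO_PG_PRIVATE);
}

/* Freeable when, after discounting the optional private reference,
 * exactly two references remain. */
static int demo_is_page_cache_freeable(const struct demo_page *page)
{
    return page->refcount - !!demo_page_has_private(page) == 2;
}
```

With no private flag a count of 2 is freeable; with the flag set the count has to be 3; any extra reference makes the check fail.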

I had a chat on #mm channel with hnaz about task_struct's member children and sibling list head:



* Now talking on #mm
* Topic for #mm is:  Memory Management - http://linux-mm.org/
* Topic for #mm set by ChanServ!services@services.oftc.net at Fri May 22 01:44:02 2009
<macli> I am a newbie, and reading mm/oom_kill.c, wondering why list_for_each_entry(child, &p->children, sibling) in badness(), not list_for_each_entry(child, &p->children, children)?

<hnaz> macli: it's the linkname.  task->children is the head of a list that is linked by task->sibling

<macli> hnaz: I see #define list_for_each_entry(pos, head, member) in list.h, where can I find the code where task->children, the list head, is linked by task->sibling? I see struct task_struct has two list_heads, children and sibling

<macli> struct list_head children;      /* list of my children */

<macli> struct list_head sibling;       /* linkage in my parent's children list */

<macli> I am assuming that children is the list head of a task's children list

<macli> sibling is the list head of a task's parent's children list, which is different from the children list head, that is my understanding from reading the comment

<hnaz> macli: you have to understand that a 'list_head' is at the same time a node.  it represents one link in the list

<hnaz> macli: children is the link to other task structs that represent the children

<hnaz> macli: while sibling is the link to chain up a task as part of another task's children list

<macli> hnaz: I see, thanks for the explanation
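
What hnaz describes can be reproduced in userspace with a stripped-down list_head. This is a sketch under assumptions: struct demo_task and the demo_ helpers are hypothetical, and the list helpers are simplified re-implementations in the spirit of the kernel's list.h, not copies of it. The point is that the parent's children field is the list head, while each child hangs on that list through its own sibling field, so the walk has to convert each node back to a task using sibling.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on the kernel idea
 * that a list_head is both a head and a link. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, tiny stand-in for task_struct. */
struct demo_task {
    int pid;
    struct list_head children;  /* head of this task's list of children */
    struct list_head sibling;   /* this task's link in its parent's list */
};

static void demo_task_init(struct demo_task *task, int pid)
{
    task->pid = pid;
    init_list_head(&task->children);
    init_list_head(&task->sibling);
}

/* The child is linked through its sibling node onto parent->children. */
static void demo_add_child(struct demo_task *parent, struct demo_task *child)
{
    list_add_tail(&child->sibling, &parent->children);
}

/* Walk parent->children; every node on it is some child's sibling field,
 * so container_of must name sibling, not children. */
static int demo_sum_child_pids(struct demo_task *parent)
{
    struct list_head *pos;
    int sum = 0;

    for (pos = parent->children.next; pos != &parent->children; pos = pos->next) {
        struct demo_task *child = container_of(pos, struct demo_task, sibling);
        sum += child->pid;
    }
    return sum;
}
```

Naming children in container_of would subtract the wrong offset and yield a pointer that is not the start of any task, which is why badness() passes sibling as the member.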
